linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
@ 2022-05-02 10:25 Dharmendra Singh
  2022-05-02 10:25 ` [PATCH v4 1/3] FUSE: Implement atomic lookup + create Dharmendra Singh
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Dharmendra Singh @ 2022-05-02 10:25 UTC (permalink / raw)
  To: miklos
  Cc: Dharmendra Singh, linux-fsdevel, fuse-devel, linux-kernel, bschubert

In FUSE, as of now, uncached lookups are expensive over the wire.
E.g additional latencies and stressing (meta data) servers from
thousands of clients. These lookup calls possibly can be avoided
in some cases. Incoming three patches address this issue.


Fist patch handles the case where we are creating a file with O_CREAT.
Before we go for file creation, we do a lookup on the file which is most
likely non-existent. After this lookup is done, we again go into libfuse
to create file. Such lookups where file is most likely non-existent, can
be avoided.

Second patch handles the case where we open first time a file/dir
but do a lookup first on it. After lookup is performed we make another
call into libfuse to open the file. Now these two separate calls into
libfuse can be combined and performed as a single call into libfuse.

Third patch handles the case when we are opening an already existing file
(positive dentry). Before this open call, we re-validate the inode and
this re-validation does a lookup on the file and verify the inode.
This separate lookup also can be avoided (for non-dir) and combined
with open call into libfuse. After open returns we can revalidate the inode.
This optimisation is performed only when we do not have default permissions
enabled.

Here is the link to performance numbers
https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/


Dharmendra Singh (3):
  FUSE: Implement atomic lookup + create
  Implement atomic lookup + open
  Avoid lookup in d_revalidate()

 fs/fuse/dir.c             | 211 +++++++++++++++++++++++++++++++++++---
 fs/fuse/file.c            |  30 +++++-
 fs/fuse/fuse_i.h          |  16 ++-
 fs/fuse/inode.c           |   4 +-
 fs/fuse/ioctl.c           |   2 +-
 include/uapi/linux/fuse.h |   5 +
 6 files changed, 246 insertions(+), 22 deletions(-)

---
v4: Addressed all comments and refactored the code into 3 separate patches
    respectively for Atomic create, Atomic open, optimizing lookup in
    d_revalidate().
---


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-02 10:25 [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Dharmendra Singh
@ 2022-05-02 10:25 ` Dharmendra Singh
  2022-05-03 12:43   ` Vivek Goyal
                     ` (2 more replies)
  2022-05-02 10:25 ` [PATCH v4 2/3] FUSE: Implement atomic lookup + open Dharmendra Singh
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 37+ messages in thread
From: Dharmendra Singh @ 2022-05-02 10:25 UTC (permalink / raw)
  To: miklos
  Cc: Dharmendra Singh, linux-fsdevel, fuse-devel, linux-kernel,
	bschubert, Dharmendra Singh

From: Dharmendra Singh <dsingh@ddn.com>

When we go for creating a file (O_CREAT), we trigger
a lookup to FUSE USER SPACE. It is very  much likely
that file does not exist yet as O_CREAT is passed to
open(). This lookup can be avoided and can be performed
as part of create call into libfuse.

This lookup + create in single call to libfuse and finally
to USER SPACE has been named as atomic create. It is expected
that USER SPACE create the file, open it and fills in the
attributes which are then used to make inode stand/revalidate
in the kernel cache. Also if file was newly created(does not
exist yet by this time) in USER SPACE then it should be indicated
in `struct fuse_file_info` by setting a bit which is again used by
libfuse to send some flags back to fuse kernel to indicate that
that file was newly created. These flags are used by kernel to
indicate changes in parent directory.

Fuse kernel automatically detects if atomic create is implemented
by libfuse/USER SPACE or not. And depending upon the outcome of
this check all further creates are decided to be atomic or non-atomic
creates.

If libfuse/USER SPACE has not implemented the atomic create operation
then by default behaviour remains same i.e we do not optimize lookup
calls which are triggered before create calls into libfuse.

Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
---
 fs/fuse/dir.c             | 82 +++++++++++++++++++++++++++++++++++----
 fs/fuse/fuse_i.h          |  3 ++
 include/uapi/linux/fuse.h |  3 ++
 3 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 656e921f3506..cad3322a007f 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -523,7 +523,7 @@ static int get_security_context(struct dentry *entry, umode_t mode,
  */
 static int fuse_create_open(struct inode *dir, struct dentry *entry,
 			    struct file *file, unsigned int flags,
-			    umode_t mode)
+			    umode_t mode, uint32_t opcode)
 {
 	int err;
 	struct inode *inode;
@@ -535,8 +535,10 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	struct fuse_entry_out outentry;
 	struct fuse_inode *fi;
 	struct fuse_file *ff;
+	struct dentry *res = NULL;
 	void *security_ctx = NULL;
 	u32 security_ctxlen;
+	bool atomic_create = (opcode == FUSE_ATOMIC_CREATE ? true : false);
 
 	/* Userspace expects S_IFREG in create mode */
 	BUG_ON((mode & S_IFMT) != S_IFREG);
@@ -566,7 +568,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 		inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
 	}
 
-	args.opcode = FUSE_CREATE;
+	args.opcode = opcode;
 	args.nodeid = get_node_id(dir);
 	args.in_numargs = 2;
 	args.in_args[0].size = sizeof(inarg);
@@ -613,9 +615,44 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 		goto out_err;
 	}
 	kfree(forget);
-	d_instantiate(entry, inode);
+	/*
+	 * In atomic create, we skipped lookup and it is very much likely that
+	 * dentry has DCACHE_PAR_LOOKUP flag set on it so call d_splice_alias().
+	 * Note: Only REG file is allowed under create/atomic create.
+	 */
+	/* There is special case when at very first call where we check if
+	 * atomic create is implemented by USER SPACE/libfuse or not, we
+	 * skipped lookup. Now, in case where atomic create is not implemented
+	 * underlying, we fall back to FUSE_CREATE. here we are required to handle
+	 * DCACHE_PAR_LOOKUP flag.
+	 */
+	if (!atomic_create && !d_in_lookup(entry) && fm->fc->no_atomic_create)
+		d_instantiate(entry, inode);
+	else {
+		res = d_splice_alias(inode, entry);
+		if (res) {
+			 /* Close the file in user space, but do not unlink it,
+			  * if it was created - with network file systems other
+			  * clients might have already accessed it.
+			  */
+			if (IS_ERR(res)) {
+				fi = get_fuse_inode(inode);
+				fuse_sync_release(fi, ff, flags);
+				fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
+				err = PTR_ERR(res);
+				goto out_err;
+			}
+			/* res is expected to be NULL since its REG file */
+			WARN_ON(res);
+		}
+	}
 	fuse_change_entry_timeout(entry, &outentry);
-	fuse_dir_changed(dir);
+	/*
+	 * In case of atomic create, we want to indicate directory change
+	 * only if USER SPACE actually created the file.
+	 */
+	if (!atomic_create || (outopen.open_flags & FOPEN_FILE_CREATED))
+		fuse_dir_changed(dir);
 	err = finish_open(file, entry, generic_file_open);
 	if (err) {
 		fi = get_fuse_inode(inode);
@@ -634,6 +671,29 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	return err;
 }
 
+static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
+			      struct file *file, unsigned int flags,
+			      umode_t mode)
+{
+	int err;
+	struct fuse_conn *fc = get_fuse_conn(dir);
+
+	if (fc->no_atomic_create)
+		return -ENOSYS;
+
+	err = fuse_create_open(dir, entry, file, flags, mode,
+			       FUSE_ATOMIC_CREATE);
+	/* If atomic create is not implemented then indicate in fc so that next
+	 * request falls back to normal create instead of going into libufse and
+	 * returning with -ENOSYS.
+	 */
+	if (err == -ENOSYS) {
+		if (!fc->no_atomic_create)
+			fc->no_atomic_create = 1;
+	}
+	return err;
+}
+
 static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
 		      umode_t, dev_t);
 static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
@@ -643,11 +703,12 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 	int err;
 	struct fuse_conn *fc = get_fuse_conn(dir);
 	struct dentry *res = NULL;
+	bool create = flags & O_CREAT ? true : false;
 
 	if (fuse_is_bad(dir))
 		return -EIO;
 
-	if (d_in_lookup(entry)) {
+	if ((!create || fc->no_atomic_create) && d_in_lookup(entry)) {
 		res = fuse_lookup(dir, entry, 0);
 		if (IS_ERR(res))
 			return PTR_ERR(res);
@@ -656,7 +717,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 			entry = res;
 	}
 
-	if (!(flags & O_CREAT) || d_really_is_positive(entry))
+	if (!create || d_really_is_positive(entry))
 		goto no_open;
 
 	/* Only creates */
@@ -665,7 +726,13 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 	if (fc->no_create)
 		goto mknod;
 
-	err = fuse_create_open(dir, entry, file, flags, mode);
+	err = fuse_atomic_create(dir, entry, file, flags, mode);
+	/* Libfuse/user space has not implemented atomic create, therefore
+	 * fall back to normal create.
+	 */
+	if (err == -ENOSYS)
+		err = fuse_create_open(dir, entry, file, flags, mode,
+				       FUSE_CREATE);
 	if (err == -ENOSYS) {
 		fc->no_create = 1;
 		goto mknod;
@@ -683,6 +750,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 }
 
 /*
+
  * Code shared between mknod, mkdir, symlink and link
  */
 static int create_new_entry(struct fuse_mount *fm, struct fuse_args *args,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e8e59fbdefeb..d577a591ab16 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -669,6 +669,9 @@ struct fuse_conn {
 	/** Is open/release not implemented by fs? */
 	unsigned no_open:1;
 
+	/** Is atomic create not implemented by fs? */
+	unsigned no_atomic_create:1;
+
 	/** Is opendir/releasedir not implemented by fs? */
 	unsigned no_opendir:1;
 
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index d6ccee961891..e4b56004b148 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -301,6 +301,7 @@ struct fuse_file_lock {
  * FOPEN_CACHE_DIR: allow caching this directory
  * FOPEN_STREAM: the file is stream-like (no file position at all)
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
+ * FOPEN_FILE_CREATED: the file was actually created
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -308,6 +309,7 @@ struct fuse_file_lock {
 #define FOPEN_CACHE_DIR		(1 << 3)
 #define FOPEN_STREAM		(1 << 4)
 #define FOPEN_NOFLUSH		(1 << 5)
+#define FOPEN_FILE_CREATED	(1 << 6)
 
 /**
  * INIT request/reply flags
@@ -537,6 +539,7 @@ enum fuse_opcode {
 	FUSE_SETUPMAPPING	= 48,
 	FUSE_REMOVEMAPPING	= 49,
 	FUSE_SYNCFS		= 50,
+	FUSE_ATOMIC_CREATE	= 51,
 
 	/* CUSE specific operations */
 	CUSE_INIT		= 4096,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 2/3] FUSE: Implement atomic lookup + open
  2022-05-02 10:25 [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Dharmendra Singh
  2022-05-02 10:25 ` [PATCH v4 1/3] FUSE: Implement atomic lookup + create Dharmendra Singh
@ 2022-05-02 10:25 ` Dharmendra Singh
  2022-05-04 18:20   ` Vivek Goyal
  2022-05-02 10:25 ` [PATCH v4 3/3] FUSE: Avoid lookup in d_revalidate() Dharmendra Singh
  2022-05-04 19:18 ` [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Vivek Goyal
  3 siblings, 1 reply; 37+ messages in thread
From: Dharmendra Singh @ 2022-05-02 10:25 UTC (permalink / raw)
  To: miklos
  Cc: Dharmendra Singh, linux-fsdevel, fuse-devel, linux-kernel,
	bschubert, Dharmendra Singh

From: Dharmendra Singh <dsingh@ddn.com>

We can optimize aggressive lookups which are triggered when
there is normal open for file/dir (dentry is new/negative).

Here in this case since we are anyway going to open the file/dir
with USER SPACE, avoid this separate lookup call into libfuse
and combine it with open call into libfuse.

This lookup + open in single call to libfuse has been named
as atomic open. It is expected that USER SPACE opens the file
and fills in the attributes, which are then used to make inode
stand/revalidate in the kernel cache.

Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
---
 fs/fuse/dir.c             | 78 ++++++++++++++++++++++++++++++---------
 fs/fuse/fuse_i.h          |  3 ++
 fs/fuse/inode.c           |  4 +-
 include/uapi/linux/fuse.h |  2 +
 4 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index cad3322a007f..6879d3a86796 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -516,18 +516,18 @@ static int get_security_context(struct dentry *entry, umode_t mode,
 }
 
 /*
- * Atomic create+open operation
- *
- * If the filesystem doesn't support this, then fall back to separate
- * 'mknod' + 'open' requests.
+ * This is common function for initiating atomic operations into libfuse.
+ * Currently being used by Atomic create(atomic lookup + create) and
+ * Atomic open(atomic lookup + open).
  */
-static int fuse_create_open(struct inode *dir, struct dentry *entry,
-			    struct file *file, unsigned int flags,
-			    umode_t mode, uint32_t opcode)
+static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
+				 struct dentry **alias, struct file *file,
+				 unsigned int flags, umode_t mode, uint32_t opcode)
 {
 	int err;
 	struct inode *inode;
 	struct fuse_mount *fm = get_fuse_mount(dir);
+	struct fuse_conn *fc = get_fuse_conn(dir);
 	FUSE_ARGS(args);
 	struct fuse_forget_link *forget;
 	struct fuse_create_in inarg;
@@ -539,9 +539,13 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	void *security_ctx = NULL;
 	u32 security_ctxlen;
 	bool atomic_create = (opcode == FUSE_ATOMIC_CREATE ? true : false);
+	bool create_op = (opcode == FUSE_CREATE ||
+			  opcode == FUSE_ATOMIC_CREATE) ? true : false;
+	if (alias)
+		*alias = NULL;
 
 	/* Userspace expects S_IFREG in create mode */
-	BUG_ON((mode & S_IFMT) != S_IFREG);
+	BUG_ON(create_op && (mode & S_IFMT) != S_IFREG);
 
 	forget = fuse_alloc_forget();
 	err = -ENOMEM;
@@ -553,7 +557,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	if (!ff)
 		goto out_put_forget_req;
 
-	if (!fm->fc->dont_mask)
+	if (!fc->dont_mask)
 		mode &= ~current_umask();
 
 	flags &= ~O_NOCTTY;
@@ -642,8 +646,9 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 				err = PTR_ERR(res);
 				goto out_err;
 			}
-			/* res is expected to be NULL since its REG file */
-			WARN_ON(res);
+			entry = res;
+			if (alias)
+				*alias = res;
 		}
 	}
 	fuse_change_entry_timeout(entry, &outentry);
@@ -681,8 +686,8 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
 	if (fc->no_atomic_create)
 		return -ENOSYS;
 
-	err = fuse_create_open(dir, entry, file, flags, mode,
-			       FUSE_ATOMIC_CREATE);
+	err = fuse_atomic_do_common(dir, entry, NULL, file, flags, mode,
+				    FUSE_ATOMIC_CREATE);
 	/* If atomic create is not implemented then indicate in fc so that next
 	 * request falls back to normal create instead of going into libufse and
 	 * returning with -ENOSYS.
@@ -694,6 +699,29 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
 	return err;
 }
 
+static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
+				struct dentry **alias, struct file *file,
+				unsigned int flags, umode_t mode)
+{
+	int err;
+	struct fuse_conn *fc = get_fuse_conn(dir);
+
+	if (!fc->do_atomic_open)
+		return -ENOSYS;
+
+	err = fuse_atomic_do_common(dir, entry, alias, file, flags, mode,
+				    FUSE_ATOMIC_OPEN);
+	/* Atomic open imply atomic trunc as well i.e truncate should be performed
+	 * as part of atomic open call itself.
+	 */
+	if (!fc->atomic_o_trunc) {
+		if (fc->do_atomic_open)
+			fc->atomic_o_trunc = 1;
+	}
+
+	return err;
+}
+
 static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
 		      umode_t, dev_t);
 static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
@@ -702,12 +730,23 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 {
 	int err;
 	struct fuse_conn *fc = get_fuse_conn(dir);
-	struct dentry *res = NULL;
+	struct dentry *res = NULL, *alias = NULL;
 	bool create = flags & O_CREAT ? true : false;
 
 	if (fuse_is_bad(dir))
 		return -EIO;
 
+	if (!create) {
+		err = fuse_do_atomic_open(dir, entry, &alias,
+					  file, flags, mode);
+		res = alias;
+		if (!err || err != -ENOSYS)
+			goto out_dput;
+	}
+	 /*
+	  * ENOSYS fall back for open- user space does not have full
+	  * atomic open.
+	  */
 	if ((!create || fc->no_atomic_create) && d_in_lookup(entry)) {
 		res = fuse_lookup(dir, entry, 0);
 		if (IS_ERR(res))
@@ -730,9 +769,14 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 	/* Libfuse/user space has not implemented atomic create, therefore
 	 * fall back to normal create.
 	 */
-	if (err == -ENOSYS)
-		err = fuse_create_open(dir, entry, file, flags, mode,
-				       FUSE_CREATE);
+	/* Atomic create+open operation
+	 * If the filesystem doesn't support this, then fall back to separate
+	 * 'mknod' + 'open' requests.
+	 */
+	if (err == -ENOSYS) {
+		err = fuse_atomic_do_common(dir, entry, NULL, file, flags,
+					    mode, FUSE_CREATE);
+	}
 	if (err == -ENOSYS) {
 		fc->no_create = 1;
 		goto mknod;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index d577a591ab16..24793b82303f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -669,6 +669,9 @@ struct fuse_conn {
 	/** Is open/release not implemented by fs? */
 	unsigned no_open:1;
 
+	/** Is atomic open implemented by fs ? */
+	unsigned do_atomic_open : 1;
+
 	/** Is atomic create not implemented by fs? */
 	unsigned no_atomic_create:1;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ee846ce371d8..5f667de69115 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1190,6 +1190,8 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 				fc->setxattr_ext = 1;
 			if (flags & FUSE_SECURITY_CTX)
 				fc->init_security = 1;
+			if (flags & FUSE_DO_ATOMIC_OPEN)
+				fc->do_atomic_open = 1;
 		} else {
 			ra_pages = fc->max_read / PAGE_SIZE;
 			fc->no_lock = 1;
@@ -1235,7 +1237,7 @@ void fuse_send_init(struct fuse_mount *fm)
 		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
 		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
-		FUSE_SECURITY_CTX;
+		FUSE_SECURITY_CTX | FUSE_DO_ATOMIC_OPEN;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index e4b56004b148..ab91e391241a 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -391,6 +391,7 @@ struct fuse_file_lock {
 /* bits 32..63 get shifted down 32 bits into the flags2 field */
 #define FUSE_SECURITY_CTX	(1ULL << 32)
 #define FUSE_HAS_INODE_DAX	(1ULL << 33)
+#define FUSE_DO_ATOMIC_OPEN	(1ULL << 34)
 
 /**
  * CUSE INIT request/reply flags
@@ -540,6 +541,7 @@ enum fuse_opcode {
 	FUSE_REMOVEMAPPING	= 49,
 	FUSE_SYNCFS		= 50,
 	FUSE_ATOMIC_CREATE	= 51,
+	FUSE_ATOMIC_OPEN	= 52,
 
 	/* CUSE specific operations */
 	CUSE_INIT		= 4096,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v4 3/3] FUSE: Avoid lookup in d_revalidate()
  2022-05-02 10:25 [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Dharmendra Singh
  2022-05-02 10:25 ` [PATCH v4 1/3] FUSE: Implement atomic lookup + create Dharmendra Singh
  2022-05-02 10:25 ` [PATCH v4 2/3] FUSE: Implement atomic lookup + open Dharmendra Singh
@ 2022-05-02 10:25 ` Dharmendra Singh
  2022-05-04 20:39   ` Vivek Goyal
  2022-05-04 19:18 ` [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Vivek Goyal
  3 siblings, 1 reply; 37+ messages in thread
From: Dharmendra Singh @ 2022-05-02 10:25 UTC (permalink / raw)
  To: miklos
  Cc: Dharmendra Singh, linux-fsdevel, fuse-devel, linux-kernel,
	bschubert, Dharmendra Singh

From: Dharmendra Singh <dsingh@ddn.com>

With atomic open + lookup implemented, it is possible
to avoid lookups in FUSE d_revalidate() for objects
other than directories.

If FUSE is mounted with default permissions then this
optimization is not possible as we need to fetch fresh
inode attributes for permission check. This lookup
skipped in d_revalidate() can be performed  as part of
open call into libfuse which is made from fuse_file_open().
And when we return from USER SPACE with file opened and
fresh attributes, we can revalidate the inode.

Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
Reported-by: kernel test robot <lkp@intel.com>

---
 fs/fuse/dir.c    | 89 ++++++++++++++++++++++++++++++++++++++++++------
 fs/fuse/file.c   | 30 ++++++++++++++--
 fs/fuse/fuse_i.h | 10 +++++-
 fs/fuse/ioctl.c  |  2 +-
 4 files changed, 115 insertions(+), 16 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 6879d3a86796..1594fecc920f 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -196,6 +196,7 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
  * the lookup once more.  If the lookup results in the same inode,
  * then refresh the attributes, timeouts and mark the dentry valid.
  */
+
 static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 {
 	struct inode *inode;
@@ -224,6 +225,17 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 
 		fm = get_fuse_mount(inode);
 
+		/* If atomic open is supported by FUSE then use this opportunity
+		 * (only for non-dir) to avoid this lookup and combine
+		 * lookup + open into single call.
+		 */
+		if (!fm->fc->default_permissions && fm->fc->do_atomic_open &&
+		    !(flags & (LOOKUP_EXCL | LOOKUP_REVAL)) &&
+		    (flags & LOOKUP_OPEN) && !S_ISDIR(inode->i_mode)) {
+			ret = 1;
+			goto out;
+		}
+
 		forget = fuse_alloc_forget();
 		ret = -ENOMEM;
 		if (!forget)
@@ -515,14 +527,52 @@ static int get_security_context(struct dentry *entry, umode_t mode,
 	return err;
 }
 
+/*
+ * Revalidate the inode after we got fresh attributes from user space.
+ */
+static int fuse_atomic_open_revalidate_inode(struct inode *reval_inode,
+					     struct dentry *entry,
+					     struct fuse_forget_link *forget,
+					     struct fuse_entry_out *outentry,
+					     u64 attr_version)
+{
+	struct fuse_inode *fi;
+	struct fuse_conn *fc = get_fuse_conn(reval_inode);
+
+	/* Mode should be other than directory */
+	BUG_ON(S_ISDIR(reval_inode->i_mode));
+
+	if (outentry->nodeid != get_node_id(reval_inode)) {
+		fuse_queue_forget(fc, forget, outentry->nodeid, 1);
+		return -ESTALE;
+	}
+	if (fuse_stale_inode(reval_inode, outentry->generation,
+			     &outentry->attr)) {
+		fuse_make_bad(reval_inode);
+		return -ESTALE;
+	}
+	fi = get_fuse_inode(reval_inode);
+	spin_lock(&fi->lock);
+	fi->nlookup++;
+	spin_unlock(&fi->lock);
+
+	forget_all_cached_acls(reval_inode);
+	fuse_change_attributes(reval_inode, &outentry->attr,
+			       entry_attr_timeout(outentry), attr_version);
+	fuse_change_entry_timeout(entry, outentry);
+	return 0;
+}
+
+
 /*
  * This is common function for initiating atomic operations into libfuse.
  * Currently being used by Atomic create(atomic lookup + create) and
  * Atomic open(atomic lookup + open).
  */
-static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
+static int fuse_do_atomic_common(struct inode *dir, struct dentry *entry,
 				 struct dentry **alias, struct file *file,
-				 unsigned int flags, umode_t mode, uint32_t opcode)
+				 struct inode *reval_inode, unsigned int flags,
+				 umode_t mode, uint32_t opcode)
 {
 	int err;
 	struct inode *inode;
@@ -541,6 +591,8 @@ static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
 	bool atomic_create = (opcode == FUSE_ATOMIC_CREATE ? true : false);
 	bool create_op = (opcode == FUSE_CREATE ||
 			  opcode == FUSE_ATOMIC_CREATE) ? true : false;
+	u64 attr_version = fuse_get_attr_version(fm->fc);
+
 	if (alias)
 		*alias = NULL;
 
@@ -609,6 +661,20 @@ static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
 	ff->fh = outopen.fh;
 	ff->nodeid = outentry.nodeid;
 	ff->open_flags = outopen.open_flags;
+
+	/* Inode revalidation was bypassed previously for type other than
+	 * directories, revalidate now as we got fresh attributes.
+	 */
+	if (reval_inode) {
+		err = fuse_atomic_open_revalidate_inode(reval_inode, entry,
+							forget, &outentry,
+							attr_version);
+		if (err)
+			goto out_free_ff;
+		inode = reval_inode;
+		kfree(forget);
+		goto out_finish_open;
+	}
 	inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
 			  &outentry.attr, entry_attr_timeout(&outentry), 0);
 	if (!inode) {
@@ -659,6 +725,7 @@ static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
 	if (!atomic_create || (outopen.open_flags & FOPEN_FILE_CREATED))
 		fuse_dir_changed(dir);
 	err = finish_open(file, entry, generic_file_open);
+out_finish_open:
 	if (err) {
 		fi = get_fuse_inode(inode);
 		fuse_sync_release(fi, ff, flags);
@@ -686,7 +753,7 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
 	if (fc->no_atomic_create)
 		return -ENOSYS;
 
-	err = fuse_atomic_do_common(dir, entry, NULL, file, flags, mode,
+	err = fuse_do_atomic_common(dir, entry, NULL, file, NULL, flags, mode,
 				    FUSE_ATOMIC_CREATE);
 	/* If atomic create is not implemented then indicate in fc so that next
 	 * request falls back to normal create instead of going into libufse and
@@ -699,9 +766,10 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
 	return err;
 }
 
-static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
-				struct dentry **alias, struct file *file,
-				unsigned int flags, umode_t mode)
+int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
+			struct dentry **alias, struct file *file,
+			struct inode *reval_inode, unsigned int flags,
+			umode_t mode)
 {
 	int err;
 	struct fuse_conn *fc = get_fuse_conn(dir);
@@ -709,8 +777,8 @@ static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
 	if (!fc->do_atomic_open)
 		return -ENOSYS;
 
-	err = fuse_atomic_do_common(dir, entry, alias, file, flags, mode,
-				    FUSE_ATOMIC_OPEN);
+	err = fuse_do_atomic_common(dir, entry, alias, file, reval_inode, flags,
+				    mode, FUSE_ATOMIC_OPEN);
 	/* Atomic open imply atomic trunc as well i.e truncate should be performed
 	 * as part of atomic open call itself.
 	 */
@@ -718,7 +786,6 @@ static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
 		if (fc->do_atomic_open)
 			fc->atomic_o_trunc = 1;
 	}
-
 	return err;
 }
 
@@ -738,7 +805,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 
 	if (!create) {
 		err = fuse_do_atomic_open(dir, entry, &alias,
-					  file, flags, mode);
+					  file, NULL, flags, mode);
 		res = alias;
 		if (!err || err != -ENOSYS)
 			goto out_dput;
@@ -774,7 +841,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 	 * 'mknod' + 'open' requests.
 	 */
 	if (err == -ENOSYS) {
-		err = fuse_atomic_do_common(dir, entry, NULL, file, flags,
+		err = fuse_do_atomic_common(dir, entry, NULL, file, NULL, flags,
 					    mode, FUSE_CREATE);
 	}
 	if (err == -ENOSYS) {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 829094451774..2b0548163249 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -125,11 +125,15 @@ static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
 }
 
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
-				 unsigned int open_flags, bool isdir)
+				 struct file *file, unsigned int open_flags,
+				 bool isdir)
 {
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_file *ff;
+	struct dentry *dentry = NULL;
+	struct dentry *parent = NULL;
 	int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
+	int ret;
 
 	ff = fuse_file_alloc(fm);
 	if (!ff)
@@ -138,6 +142,11 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 	ff->fh = 0;
 	/* Default for no-open */
 	ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
+
+	/* For directories we already had lookup */
+	if (!isdir && fc->do_atomic_open && file != NULL)
+		goto revalidate_atomic_open;
+
 	if (isdir ? !fc->no_opendir : !fc->no_open) {
 		struct fuse_open_out outarg;
 		int err;
@@ -164,12 +173,27 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 	ff->nodeid = nodeid;
 
 	return ff;
+
+revalidate_atomic_open:
+	dentry = file->f_path.dentry;
+	/* Get ref on parent */
+	parent = dget_parent(dentry);
+	open_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY);
+	ret = fuse_do_atomic_open(d_inode_rcu(parent), dentry, NULL, file,
+				  d_inode_rcu(dentry), open_flags, 0);
+	dput(parent);
+	if (ret)
+		goto err_out;
+	ff = file->private_data;
+	return ff;
+err_out:
+	return ERR_PTR(ret);
 }
 
 int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
 		 bool isdir)
 {
-	struct fuse_file *ff = fuse_file_open(fm, nodeid, file->f_flags, isdir);
+	struct fuse_file *ff = fuse_file_open(fm, nodeid, file, file->f_flags, isdir);
 
 	if (!IS_ERR(ff))
 		file->private_data = ff;
@@ -252,7 +276,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
 	}
 
 	err = fuse_do_open(fm, get_node_id(inode), file, isdir);
-	if (!err)
+	if (!err && (!fc->do_atomic_open || isdir))
 		fuse_finish_open(inode, file);
 
 out:
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 24793b82303f..5c83e4249b7e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1014,6 +1014,13 @@ void fuse_finish_open(struct inode *inode, struct file *file);
 void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
 		       unsigned int flags);
 
+/**
+ * Send atomic create + open or lookup + open
+ */
+int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
+			struct dentry **alias, struct file *file,
+			struct inode *reval_inode, unsigned int flags,
+			umode_t mode);
 /**
  * Send RELEASE or RELEASEDIR request
  */
@@ -1317,7 +1324,8 @@ int fuse_fileattr_set(struct user_namespace *mnt_userns,
 /* file.c */
 
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
-				 unsigned int open_flags, bool isdir);
+				 struct file *file, unsigned int open_flags,
+				 bool isdir);
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 		       unsigned int open_flags, fl_owner_t id, bool isdir);
 
diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
index fbc09dab1f85..63106a54ba1a 100644
--- a/fs/fuse/ioctl.c
+++ b/fs/fuse/ioctl.c
@@ -408,7 +408,7 @@ static struct fuse_file *fuse_priv_ioctl_prepare(struct inode *inode)
 	if (!S_ISREG(inode->i_mode) && !isdir)
 		return ERR_PTR(-ENOTTY);
 
-	return fuse_file_open(fm, get_node_id(inode), O_RDONLY, isdir);
+	return fuse_file_open(fm, get_node_id(inode), NULL, O_RDONLY, isdir);
 }
 
 static void fuse_priv_ioctl_cleanup(struct inode *inode, struct fuse_file *ff)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-02 10:25 ` [PATCH v4 1/3] FUSE: Implement atomic lookup + create Dharmendra Singh
@ 2022-05-03 12:43   ` Vivek Goyal
  2022-05-03 14:13   ` Vivek Goyal
  2022-05-03 19:53   ` Vivek Goyal
  2 siblings, 0 replies; 37+ messages in thread
From: Vivek Goyal @ 2022-05-03 12:43 UTC (permalink / raw)
  To: Dharmendra Singh
  Cc: miklos, linux-fsdevel, fuse-devel, linux-kernel, bschubert,
	Dharmendra Singh

On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> From: Dharmendra Singh <dsingh@ddn.com>
> 
> When we go for creating a file (O_CREAT), we trigger
> a lookup to FUSE USER SPACE. It is very  much likely
> that file does not exist yet as O_CREAT is passed to
> open(). This lookup can be avoided and can be performed
> as part of create call into libfuse.
> 
> This lookup + create in single call to libfuse and finally
> to USER SPACE has been named as atomic create.

Looking at it and trying to understand if lookup can be avoided. But
before that this naming of calling it atomic create seems confusing.

We already have fuse_create_open() which we are calling it as 
"Atomic create+open operation". Now looks like this is extension
to this operation where we are doing "Atomic lookup + create + open".
Do I understand it correctly?

Thanks
Vivek

> It is expected
> that USER SPACE create the file, open it and fills in the
> attributes which are then used to make inode stand/revalidate
> in the kernel cache. Also if file was newly created(does not
> exist yet by this time) in USER SPACE then it should be indicated
> in `struct fuse_file_info` by setting a bit which is again used by
> libfuse to send some flags back to fuse kernel to indicate that
> that file was newly created. These flags are used by kernel to
> indicate changes in parent directory.
> 
> Fuse kernel automatically detects if atomic create is implemented
> by libfuse/USER SPACE or not. And depending upon the outcome of
> this check all further creates are decided to be atomic or non-atomic
> creates.
> 
> If libfuse/USER SPACE has not implemented the atomic create operation
> then by default behaviour remains same i.e we do not optimize lookup
> calls which are triggered before create calls into libfuse.
> 
> Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> ---
>  fs/fuse/dir.c             | 82 +++++++++++++++++++++++++++++++++++----
>  fs/fuse/fuse_i.h          |  3 ++
>  include/uapi/linux/fuse.h |  3 ++
>  3 files changed, 81 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 656e921f3506..cad3322a007f 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -523,7 +523,7 @@ static int get_security_context(struct dentry *entry, umode_t mode,
>   */
>  static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  			    struct file *file, unsigned int flags,
> -			    umode_t mode)
> +			    umode_t mode, uint32_t opcode)
>  {
>  	int err;
>  	struct inode *inode;
> @@ -535,8 +535,10 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  	struct fuse_entry_out outentry;
>  	struct fuse_inode *fi;
>  	struct fuse_file *ff;
> +	struct dentry *res = NULL;
>  	void *security_ctx = NULL;
>  	u32 security_ctxlen;
> +	bool atomic_create = (opcode == FUSE_ATOMIC_CREATE ? true : false);
>  
>  	/* Userspace expects S_IFREG in create mode */
>  	BUG_ON((mode & S_IFMT) != S_IFREG);
> @@ -566,7 +568,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  		inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
>  	}
>  
> -	args.opcode = FUSE_CREATE;
> +	args.opcode = opcode;
>  	args.nodeid = get_node_id(dir);
>  	args.in_numargs = 2;
>  	args.in_args[0].size = sizeof(inarg);
> @@ -613,9 +615,44 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  		goto out_err;
>  	}
>  	kfree(forget);
> -	d_instantiate(entry, inode);
> +	/*
> +	 * In atomic create, we skipped lookup and it is very much likely that
> +	 * dentry has DCACHE_PAR_LOOKUP flag set on it so call d_splice_alias().
> +	 * Note: Only REG file is allowed under create/atomic create.
> +	 */
> +	/* There is special case when at very first call where we check if
> +	 * atomic create is implemented by USER SPACE/libfuse or not, we
> +	 * skipped lookup. Now, in case where atomic create is not implemented
> +	 * underlying, we fall back to FUSE_CREATE. here we are required to handle
> +	 * DCACHE_PAR_LOOKUP flag.
> +	 */
> +	if (!atomic_create && !d_in_lookup(entry) && fm->fc->no_atomic_create)
> +		d_instantiate(entry, inode);
> +	else {
> +		res = d_splice_alias(inode, entry);
> +		if (res) {
> +			 /* Close the file in user space, but do not unlink it,
> +			  * if it was created - with network file systems other
> +			  * clients might have already accessed it.
> +			  */
> +			if (IS_ERR(res)) {
> +				fi = get_fuse_inode(inode);
> +				fuse_sync_release(fi, ff, flags);
> +				fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
> +				err = PTR_ERR(res);
> +				goto out_err;
> +			}
> +			/* res is expected to be NULL since its REG file */
> +			WARN_ON(res);
> +		}
> +	}
>  	fuse_change_entry_timeout(entry, &outentry);
> -	fuse_dir_changed(dir);
> +	/*
> +	 * In case of atomic create, we want to indicate directory change
> +	 * only if USER SPACE actually created the file.
> +	 */
> +	if (!atomic_create || (outopen.open_flags & FOPEN_FILE_CREATED))
> +		fuse_dir_changed(dir);
>  	err = finish_open(file, entry, generic_file_open);
>  	if (err) {
>  		fi = get_fuse_inode(inode);
> @@ -634,6 +671,29 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  	return err;
>  }
>  
> +static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
> +			      struct file *file, unsigned int flags,
> +			      umode_t mode)
> +{
> +	int err;
> +	struct fuse_conn *fc = get_fuse_conn(dir);
> +
> +	if (fc->no_atomic_create)
> +		return -ENOSYS;
> +
> +	err = fuse_create_open(dir, entry, file, flags, mode,
> +			       FUSE_ATOMIC_CREATE);
> +	/* If atomic create is not implemented then indicate in fc so that next
> +	 * request falls back to normal create instead of going into libufse and
> +	 * returning with -ENOSYS.
> +	 */
> +	if (err == -ENOSYS) {
> +		if (!fc->no_atomic_create)
> +			fc->no_atomic_create = 1;
> +	}
> +	return err;
> +}
> +
>  static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
>  		      umode_t, dev_t);
>  static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> @@ -643,11 +703,12 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  	int err;
>  	struct fuse_conn *fc = get_fuse_conn(dir);
>  	struct dentry *res = NULL;
> +	bool create = flags & O_CREAT ? true : false;
>  
>  	if (fuse_is_bad(dir))
>  		return -EIO;
>  
> -	if (d_in_lookup(entry)) {
> +	if ((!create || fc->no_atomic_create) && d_in_lookup(entry)) {
>  		res = fuse_lookup(dir, entry, 0);
>  		if (IS_ERR(res))
>  			return PTR_ERR(res);
> @@ -656,7 +717,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  			entry = res;
>  	}
>  
> -	if (!(flags & O_CREAT) || d_really_is_positive(entry))
> +	if (!create || d_really_is_positive(entry))
>  		goto no_open;
>  
>  	/* Only creates */
> @@ -665,7 +726,13 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  	if (fc->no_create)
>  		goto mknod;
>  
> -	err = fuse_create_open(dir, entry, file, flags, mode);
> +	err = fuse_atomic_create(dir, entry, file, flags, mode);
> +	/* Libfuse/user space has not implemented atomic create, therefore
> +	 * fall back to normal create.
> +	 */
> +	if (err == -ENOSYS)
> +		err = fuse_create_open(dir, entry, file, flags, mode,
> +				       FUSE_CREATE);
>  	if (err == -ENOSYS) {
>  		fc->no_create = 1;
>  		goto mknod;
> @@ -683,6 +750,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  }
>  
>  /*
> +
>   * Code shared between mknod, mkdir, symlink and link
>   */
>  static int create_new_entry(struct fuse_mount *fm, struct fuse_args *args,
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e8e59fbdefeb..d577a591ab16 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -669,6 +669,9 @@ struct fuse_conn {
>  	/** Is open/release not implemented by fs? */
>  	unsigned no_open:1;
>  
> +	/** Is atomic create not implemented by fs? */
> +	unsigned no_atomic_create:1;
> +
>  	/** Is opendir/releasedir not implemented by fs? */
>  	unsigned no_opendir:1;
>  
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index d6ccee961891..e4b56004b148 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -301,6 +301,7 @@ struct fuse_file_lock {
>   * FOPEN_CACHE_DIR: allow caching this directory
>   * FOPEN_STREAM: the file is stream-like (no file position at all)
>   * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
> + * FOPEN_FILE_CREATED: the file was actually created
>   */
>  #define FOPEN_DIRECT_IO		(1 << 0)
>  #define FOPEN_KEEP_CACHE	(1 << 1)
> @@ -308,6 +309,7 @@ struct fuse_file_lock {
>  #define FOPEN_CACHE_DIR		(1 << 3)
>  #define FOPEN_STREAM		(1 << 4)
>  #define FOPEN_NOFLUSH		(1 << 5)
> +#define FOPEN_FILE_CREATED	(1 << 6)
>  
>  /**
>   * INIT request/reply flags
> @@ -537,6 +539,7 @@ enum fuse_opcode {
>  	FUSE_SETUPMAPPING	= 48,
>  	FUSE_REMOVEMAPPING	= 49,
>  	FUSE_SYNCFS		= 50,
> +	FUSE_ATOMIC_CREATE	= 51,
>  
>  	/* CUSE specific operations */
>  	CUSE_INIT		= 4096,
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-02 10:25 ` [PATCH v4 1/3] FUSE: Implement atomic lookup + create Dharmendra Singh
  2022-05-03 12:43   ` Vivek Goyal
@ 2022-05-03 14:13   ` Vivek Goyal
  2022-05-03 19:53   ` Vivek Goyal
  2 siblings, 0 replies; 37+ messages in thread
From: Vivek Goyal @ 2022-05-03 14:13 UTC (permalink / raw)
  To: Dharmendra Singh
  Cc: miklos, linux-fsdevel, fuse-devel, linux-kernel, bschubert,
	Dharmendra Singh

On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> From: Dharmendra Singh <dsingh@ddn.com>
> 
> When we go for creating a file (O_CREAT), we trigger
> a lookup to FUSE USER SPACE. It is very  much likely
> that file does not exist yet as O_CREAT is passed to
> open(). This lookup can be avoided and can be performed
> as part of create call into libfuse.
> 
> This lookup + create in single call to libfuse and finally
> to USER SPACE has been named as atomic create. It is expected
> that USER SPACE create the file, open it and fills in the
> attributes which are then used to make inode stand/revalidate
> in the kernel cache. Also if file was newly created(does not
> exist yet by this time) in USER SPACE then it should be indicated
> in `struct fuse_file_info` by setting a bit which is again used by
> libfuse to send some flags back to fuse kernel to indicate that
> that file was newly created. These flags are used by kernel to
> indicate changes in parent directory.
> 
> Fuse kernel automatically detects if atomic create is implemented
> by libfuse/USER SPACE or not. And depending upon the outcome of
> this check all further creates are decided to be atomic or non-atomic
> creates.
> 
> If libfuse/USER SPACE has not implemented the atomic create operation
> then by default behaviour remains same i.e we do not optimize lookup
> calls which are triggered before create calls into libfuse.
> 
> Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> ---
>  fs/fuse/dir.c             | 82 +++++++++++++++++++++++++++++++++++----
>  fs/fuse/fuse_i.h          |  3 ++
>  include/uapi/linux/fuse.h |  3 ++
>  3 files changed, 81 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 656e921f3506..cad3322a007f 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -523,7 +523,7 @@ static int get_security_context(struct dentry *entry, umode_t mode,
>   */
>  static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  			    struct file *file, unsigned int flags,
> -			    umode_t mode)
> +			    umode_t mode, uint32_t opcode)
>  {
>  	int err;
>  	struct inode *inode;
> @@ -535,8 +535,10 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  	struct fuse_entry_out outentry;
>  	struct fuse_inode *fi;
>  	struct fuse_file *ff;
> +	struct dentry *res = NULL;
>  	void *security_ctx = NULL;
>  	u32 security_ctxlen;
> +	bool atomic_create = (opcode == FUSE_ATOMIC_CREATE ? true : false);
>  
>  	/* Userspace expects S_IFREG in create mode */
>  	BUG_ON((mode & S_IFMT) != S_IFREG);
> @@ -566,7 +568,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  		inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
>  	}
>  
> -	args.opcode = FUSE_CREATE;
> +	args.opcode = opcode;
>  	args.nodeid = get_node_id(dir);
>  	args.in_numargs = 2;
>  	args.in_args[0].size = sizeof(inarg);
> @@ -613,9 +615,44 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  		goto out_err;
>  	}
>  	kfree(forget);
> -	d_instantiate(entry, inode);
> +	/*
> +	 * In atomic create, we skipped lookup and it is very much likely that
> +	 * dentry has DCACHE_PAR_LOOKUP flag set on it so call d_splice_alias().

Can you please help me understand what does DCACHE_PAR_LOOKUP flag mean.
Also how d_splice_alias() helps in that case.


> +	 * Note: Only REG file is allowed under create/atomic create.
> +	 */
> +	/* There is special case when at very first call where we check if
> +	 * atomic create is implemented by USER SPACE/libfuse or not, we
> +	 * skipped lookup. Now, in case where atomic create is not implemented
> +	 * underlying, we fall back to FUSE_CREATE. here we are required to handle
> +	 * DCACHE_PAR_LOOKUP flag.
> +	 */
> +	if (!atomic_create && !d_in_lookup(entry) && fm->fc->no_atomic_create)
> +		d_instantiate(entry, inode);

So if we are trying to do atomic_create first time, then we skipped lookup
in fuse_atomic_open() (assuming DCACHE_PAR_LOOKUP flag is set). If server
does not support, atomic_create, then we will set
fm->fc->no_atomic_create=1 and non-atomic create will be tried and we will
come here. And given we skipped lookup (even for non-atomic-create) case,
we are going to call d_splice_alias(). Right?

IOW, even without non-atomic-create, you are skipping lookup() and be
able to handle file creation and use d_splice_alias(). If this is
correct, I am wondering why do I need atomic create to begin with.

If this is not correct, then fuse_atomic_open() probably should handle
error from fuse_create_open(FUSE_ATOMIC_CREATE) first, retry lookup
and then retry fuse_create_open(FUSE_CREATE).

What am I missing?

Thanks
Vivek



> +	else {
> +		res = d_splice_alias(inode, entry);
> +		if (res) {
> +			 /* Close the file in user space, but do not unlink it,
> +			  * if it was created - with network file systems other
> +			  * clients might have already accessed it.
> +			  */
> +			if (IS_ERR(res)) {
> +				fi = get_fuse_inode(inode);
> +				fuse_sync_release(fi, ff, flags);
> +				fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
> +				err = PTR_ERR(res);
> +				goto out_err;
> +			}
> +			/* res is expected to be NULL since its REG file */
> +			WARN_ON(res);
> +		}
> +	}


>  	fuse_change_entry_timeout(entry, &outentry);
> -	fuse_dir_changed(dir);
> +	/*
> +	 * In case of atomic create, we want to indicate directory change
> +	 * only if USER SPACE actually created the file.
> +	 */
> +	if (!atomic_create || (outopen.open_flags & FOPEN_FILE_CREATED))
> +		fuse_dir_changed(dir);
>  	err = finish_open(file, entry, generic_file_open);
>  	if (err) {
>  		fi = get_fuse_inode(inode);
> @@ -634,6 +671,29 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  	return err;
>  }
>  
> +static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
> +			      struct file *file, unsigned int flags,
> +			      umode_t mode)
> +{
> +	int err;
> +	struct fuse_conn *fc = get_fuse_conn(dir);
> +
> +	if (fc->no_atomic_create)
> +		return -ENOSYS;
> +
> +	err = fuse_create_open(dir, entry, file, flags, mode,
> +			       FUSE_ATOMIC_CREATE);
> +	/* If atomic create is not implemented then indicate in fc so that next
> +	 * request falls back to normal create instead of going into libufse and
> +	 * returning with -ENOSYS.
> +	 */
> +	if (err == -ENOSYS) {
> +		if (!fc->no_atomic_create)
> +			fc->no_atomic_create = 1;
> +	}
> +	return err;
> +}
> +
>  static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
>  		      umode_t, dev_t);
>  static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> @@ -643,11 +703,12 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  	int err;
>  	struct fuse_conn *fc = get_fuse_conn(dir);
>  	struct dentry *res = NULL;
> +	bool create = flags & O_CREAT ? true : false;
>  
>  	if (fuse_is_bad(dir))
>  		return -EIO;
>  
> -	if (d_in_lookup(entry)) {
> +	if ((!create || fc->no_atomic_create) && d_in_lookup(entry)) {
>  		res = fuse_lookup(dir, entry, 0);
>  		if (IS_ERR(res))
>  			return PTR_ERR(res);
> @@ -656,7 +717,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  			entry = res;
>  	}
>  
> -	if (!(flags & O_CREAT) || d_really_is_positive(entry))
> +	if (!create || d_really_is_positive(entry))
>  		goto no_open;
>  
>  	/* Only creates */
> @@ -665,7 +726,13 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  	if (fc->no_create)
>  		goto mknod;
>  
> -	err = fuse_create_open(dir, entry, file, flags, mode);
> +	err = fuse_atomic_create(dir, entry, file, flags, mode);
> +	/* Libfuse/user space has not implemented atomic create, therefore
> +	 * fall back to normal create.
> +	 */
> +	if (err == -ENOSYS)
> +		err = fuse_create_open(dir, entry, file, flags, mode,
> +				       FUSE_CREATE);
>  	if (err == -ENOSYS) {
>  		fc->no_create = 1;
>  		goto mknod;
> @@ -683,6 +750,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  }
>  
>  /*
> +
>   * Code shared between mknod, mkdir, symlink and link
>   */
>  static int create_new_entry(struct fuse_mount *fm, struct fuse_args *args,
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e8e59fbdefeb..d577a591ab16 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -669,6 +669,9 @@ struct fuse_conn {
>  	/** Is open/release not implemented by fs? */
>  	unsigned no_open:1;
>  
> +	/** Is atomic create not implemented by fs? */
> +	unsigned no_atomic_create:1;
> +
>  	/** Is opendir/releasedir not implemented by fs? */
>  	unsigned no_opendir:1;
>  
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index d6ccee961891..e4b56004b148 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -301,6 +301,7 @@ struct fuse_file_lock {
>   * FOPEN_CACHE_DIR: allow caching this directory
>   * FOPEN_STREAM: the file is stream-like (no file position at all)
>   * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
> + * FOPEN_FILE_CREATED: the file was actually created
>   */
>  #define FOPEN_DIRECT_IO		(1 << 0)
>  #define FOPEN_KEEP_CACHE	(1 << 1)
> @@ -308,6 +309,7 @@ struct fuse_file_lock {
>  #define FOPEN_CACHE_DIR		(1 << 3)
>  #define FOPEN_STREAM		(1 << 4)
>  #define FOPEN_NOFLUSH		(1 << 5)
> +#define FOPEN_FILE_CREATED	(1 << 6)
>  
>  /**
>   * INIT request/reply flags
> @@ -537,6 +539,7 @@ enum fuse_opcode {
>  	FUSE_SETUPMAPPING	= 48,
>  	FUSE_REMOVEMAPPING	= 49,
>  	FUSE_SYNCFS		= 50,
> +	FUSE_ATOMIC_CREATE	= 51,
>  
>  	/* CUSE specific operations */
>  	CUSE_INIT		= 4096,
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-02 10:25 ` [PATCH v4 1/3] FUSE: Implement atomic lookup + create Dharmendra Singh
  2022-05-03 12:43   ` Vivek Goyal
  2022-05-03 14:13   ` Vivek Goyal
@ 2022-05-03 19:53   ` Vivek Goyal
  2022-05-03 20:48     ` Bernd Schubert
  2022-05-04  4:26     ` Dharmendra Hans
  2 siblings, 2 replies; 37+ messages in thread
From: Vivek Goyal @ 2022-05-03 19:53 UTC (permalink / raw)
  To: Dharmendra Singh
  Cc: miklos, linux-fsdevel, fuse-devel, linux-kernel, bschubert,
	Dharmendra Singh

On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> From: Dharmendra Singh <dsingh@ddn.com>
> 
> When we go for creating a file (O_CREAT), we trigger
> a lookup to FUSE USER SPACE. It is very  much likely
> that file does not exist yet as O_CREAT is passed to
> open(). This lookup can be avoided and can be performed
> as part of create call into libfuse.
> 
> This lookup + create in single call to libfuse and finally
> to USER SPACE has been named as atomic create. It is expected
> that USER SPACE create the file, open it and fills in the
> attributes which are then used to make inode stand/revalidate
> in the kernel cache. Also if file was newly created(does not
> exist yet by this time) in USER SPACE then it should be indicated
> in `struct fuse_file_info` by setting a bit which is again used by
> libfuse to send some flags back to fuse kernel to indicate that
> that file was newly created. These flags are used by kernel to
> indicate changes in parent directory.

Reading the existing code a little bit more and trying to understand
existing semantics. And that will help me unerstand what new is being
done.

So current fuse_atomic_open() does following.

A. Looks up dentry (if d_in_lookup() is set).
B. If dentry is positive or O_CREAT is not set, return. 
C. If server supports atomic create + open, use that to create file and
   open it as well.
D. If server does not support atomic create + open, just create file
   using "mknod" and return. VFS will take care of opening the file.

Now with this patch, new flow is.

A. Look up dentry if d_in_lookup() is set as well as either file is not
   being created or fc->no_atomic_create is set. This basiclally means
   skip lookup if atomic_create is supported and file is being created.

B. Remains same. if dentry is positive or O_CREATE is not set, return.

C. If server supports new atomic_create(), use that.

D. If not, if server supports atomic create + open, use that

E. If not, fall back to mknod and do not open file.

So to me this new functionality is basically atomic "lookup + create +
open"?

Or may be not. I see we check "fc->no_create" and fallback to mknod.

        if (fc->no_create)
                goto mknod;

So fc->no_create is representing both old atomic "create + open" as well
as new "lookup + create + open" ?

It might be obvious to you, but it is not to me. So will be great if
you shed some light on this. 

Thanks
Vivek


> 
> Fuse kernel automatically detects if atomic create is implemented
> by libfuse/USER SPACE or not. And depending upon the outcome of
> this check all further creates are decided to be atomic or non-atomic
> creates.
> 
> If libfuse/USER SPACE has not implemented the atomic create operation
> then by default behaviour remains same i.e we do not optimize lookup
> calls which are triggered before create calls into libfuse.
> 
> Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> ---
>  fs/fuse/dir.c             | 82 +++++++++++++++++++++++++++++++++++----
>  fs/fuse/fuse_i.h          |  3 ++
>  include/uapi/linux/fuse.h |  3 ++
>  3 files changed, 81 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 656e921f3506..cad3322a007f 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -523,7 +523,7 @@ static int get_security_context(struct dentry *entry, umode_t mode,
>   */
>  static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  			    struct file *file, unsigned int flags,
> -			    umode_t mode)
> +			    umode_t mode, uint32_t opcode)
>  {
>  	int err;
>  	struct inode *inode;
> @@ -535,8 +535,10 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  	struct fuse_entry_out outentry;
>  	struct fuse_inode *fi;
>  	struct fuse_file *ff;
> +	struct dentry *res = NULL;
>  	void *security_ctx = NULL;
>  	u32 security_ctxlen;
> +	bool atomic_create = (opcode == FUSE_ATOMIC_CREATE ? true : false);
>  
>  	/* Userspace expects S_IFREG in create mode */
>  	BUG_ON((mode & S_IFMT) != S_IFREG);
> @@ -566,7 +568,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  		inarg.open_flags |= FUSE_OPEN_KILL_SUIDGID;
>  	}
>  
> -	args.opcode = FUSE_CREATE;
> +	args.opcode = opcode;
>  	args.nodeid = get_node_id(dir);
>  	args.in_numargs = 2;
>  	args.in_args[0].size = sizeof(inarg);
> @@ -613,9 +615,44 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  		goto out_err;
>  	}
>  	kfree(forget);
> -	d_instantiate(entry, inode);
> +	/*
> +	 * In atomic create, we skipped lookup and it is very much likely that
> +	 * dentry has DCACHE_PAR_LOOKUP flag set on it so call d_splice_alias().
> +	 * Note: Only REG file is allowed under create/atomic create.
> +	 */
> +	/* There is special case when at very first call where we check if
> +	 * atomic create is implemented by USER SPACE/libfuse or not, we
> +	 * skipped lookup. Now, in case where atomic create is not implemented
> +	 * underlying, we fall back to FUSE_CREATE. here we are required to handle
> +	 * DCACHE_PAR_LOOKUP flag.
> +	 */
> +	if (!atomic_create && !d_in_lookup(entry) && fm->fc->no_atomic_create)
> +		d_instantiate(entry, inode);
> +	else {
> +		res = d_splice_alias(inode, entry);
> +		if (res) {
> +			 /* Close the file in user space, but do not unlink it,
> +			  * if it was created - with network file systems other
> +			  * clients might have already accessed it.
> +			  */
> +			if (IS_ERR(res)) {
> +				fi = get_fuse_inode(inode);
> +				fuse_sync_release(fi, ff, flags);
> +				fuse_queue_forget(fm->fc, forget, outentry.nodeid, 1);
> +				err = PTR_ERR(res);
> +				goto out_err;
> +			}
> +			/* res is expected to be NULL since its REG file */
> +			WARN_ON(res);
> +		}
> +	}
>  	fuse_change_entry_timeout(entry, &outentry);
> -	fuse_dir_changed(dir);
> +	/*
> +	 * In case of atomic create, we want to indicate directory change
> +	 * only if USER SPACE actually created the file.
> +	 */
> +	if (!atomic_create || (outopen.open_flags & FOPEN_FILE_CREATED))
> +		fuse_dir_changed(dir);
>  	err = finish_open(file, entry, generic_file_open);
>  	if (err) {
>  		fi = get_fuse_inode(inode);
> @@ -634,6 +671,29 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  	return err;
>  }
>  
> +static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
> +			      struct file *file, unsigned int flags,
> +			      umode_t mode)
> +{
> +	int err;
> +	struct fuse_conn *fc = get_fuse_conn(dir);
> +
> +	if (fc->no_atomic_create)
> +		return -ENOSYS;
> +
> +	err = fuse_create_open(dir, entry, file, flags, mode,
> +			       FUSE_ATOMIC_CREATE);
> +	/* If atomic create is not implemented then indicate in fc so that next
> +	 * request falls back to normal create instead of going into libufse and
> +	 * returning with -ENOSYS.
> +	 */
> +	if (err == -ENOSYS) {
> +		if (!fc->no_atomic_create)
> +			fc->no_atomic_create = 1;
> +	}
> +	return err;
> +}
> +
>  static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
>  		      umode_t, dev_t);
>  static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> @@ -643,11 +703,12 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  	int err;
>  	struct fuse_conn *fc = get_fuse_conn(dir);
>  	struct dentry *res = NULL;
> +	bool create = flags & O_CREAT ? true : false;
>  
>  	if (fuse_is_bad(dir))
>  		return -EIO;
>  
> -	if (d_in_lookup(entry)) {
> +	if ((!create || fc->no_atomic_create) && d_in_lookup(entry)) {
>  		res = fuse_lookup(dir, entry, 0);
>  		if (IS_ERR(res))
>  			return PTR_ERR(res);
> @@ -656,7 +717,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  			entry = res;
>  	}
>  
> -	if (!(flags & O_CREAT) || d_really_is_positive(entry))
> +	if (!create || d_really_is_positive(entry))
>  		goto no_open;
>  
>  	/* Only creates */
> @@ -665,7 +726,13 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  	if (fc->no_create)
>  		goto mknod;
>  
> -	err = fuse_create_open(dir, entry, file, flags, mode);
> +	err = fuse_atomic_create(dir, entry, file, flags, mode);
> +	/* Libfuse/user space has not implemented atomic create, therefore
> +	 * fall back to normal create.
> +	 */
> +	if (err == -ENOSYS)
> +		err = fuse_create_open(dir, entry, file, flags, mode,
> +				       FUSE_CREATE);
>  	if (err == -ENOSYS) {
>  		fc->no_create = 1;
>  		goto mknod;
> @@ -683,6 +750,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  }
>  
>  /*
> +
>   * Code shared between mknod, mkdir, symlink and link
>   */
>  static int create_new_entry(struct fuse_mount *fm, struct fuse_args *args,
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e8e59fbdefeb..d577a591ab16 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -669,6 +669,9 @@ struct fuse_conn {
>  	/** Is open/release not implemented by fs? */
>  	unsigned no_open:1;
>  
> +	/** Is atomic create not implemented by fs? */
> +	unsigned no_atomic_create:1;
> +
>  	/** Is opendir/releasedir not implemented by fs? */
>  	unsigned no_opendir:1;
>  
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index d6ccee961891..e4b56004b148 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -301,6 +301,7 @@ struct fuse_file_lock {
>   * FOPEN_CACHE_DIR: allow caching this directory
>   * FOPEN_STREAM: the file is stream-like (no file position at all)
>   * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
> + * FOPEN_FILE_CREATED: the file was actually created
>   */
>  #define FOPEN_DIRECT_IO		(1 << 0)
>  #define FOPEN_KEEP_CACHE	(1 << 1)
> @@ -308,6 +309,7 @@ struct fuse_file_lock {
>  #define FOPEN_CACHE_DIR		(1 << 3)
>  #define FOPEN_STREAM		(1 << 4)
>  #define FOPEN_NOFLUSH		(1 << 5)
> +#define FOPEN_FILE_CREATED	(1 << 6)
>  
>  /**
>   * INIT request/reply flags
> @@ -537,6 +539,7 @@ enum fuse_opcode {
>  	FUSE_SETUPMAPPING	= 48,
>  	FUSE_REMOVEMAPPING	= 49,
>  	FUSE_SYNCFS		= 50,
> +	FUSE_ATOMIC_CREATE	= 51,
>  
>  	/* CUSE specific operations */
>  	CUSE_INIT		= 4096,
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-03 19:53   ` Vivek Goyal
@ 2022-05-03 20:48     ` Bernd Schubert
  2022-05-04  4:26     ` Dharmendra Hans
  1 sibling, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2022-05-03 20:48 UTC (permalink / raw)
  To: Vivek Goyal, Dharmendra Singh
  Cc: miklos, linux-fsdevel, fuse-devel, linux-kernel, Dharmendra Singh

Hi Vivek,

On 5/3/22 21:53, Vivek Goyal wrote:
> Reading the existing code a little bit more and trying to understand
> existing semantics. And that will help me unerstand what new is being
> done.
> 
> So current fuse_atomic_open() does following.
> 
> A. Looks up dentry (if d_in_lookup() is set).
> B. If dentry is positive or O_CREAT is not set, return.
> C. If server supports atomic create + open, use that to create file and
>     open it as well.
> D. If server does not support atomic create + open, just create file
>     using "mknod" and return. VFS will take care of opening the file.
> 
> Now with this patch, new flow is.
> 
> A. Look up dentry if d_in_lookup() is set as well as either file is not
>     being created or fc->no_atomic_create is set. This basiclally means
>     skip lookup if atomic_create is supported and file is being created.
> 
> B. Remains same. if dentry is positive or O_CREATE is not set, return.
> 
> C. If server supports new atomic_create(), use that.
> 
> D. If not, if server supports atomic create + open, use that
> 
> E. If not, fall back to mknod and do not open file.
> 
> So to me this new functionality is basically atomic "lookup + create +
> open"?
> 
> Or may be not. I see we check "fc->no_create" and fallback to mknod.
> 
>          if (fc->no_create)
>                  goto mknod;
> 
> So fc->no_create is representing both old atomic "create + open" as well
> as new "lookup + create + open" ?
> 
> It might be obvious to you, but it is not to me. So will be great if
> you shed some light on this.

we are going to reply more in detail tomorrow, it gets rather late for 
me as well. Anyway yes, goal is basically to avoid lookups as much 
possible.
AFAIK, lookup-intents had been first introduced years ago by Lustre 
developers - I guess to reduce network and server load - same reason for 
us. Later Miklos had introduced atomic_open, which makes code 
using/avoiding lookup much easier to read.
I guess unoticed that time, fuse was not fully using all possibilities 
of atomic-open - we can see quite some lookup/revalidate traffic for our 
file system.


I guess the commit message and introduction letter should be updated 
with your A/B/C/D/E scheme. A) changes a bit in patch 2/3, which extents 
it to normal file open, and patch 3/3 adds in revalidate.


Hope it helps,
Bernd

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-03 19:53   ` Vivek Goyal
  2022-05-03 20:48     ` Bernd Schubert
@ 2022-05-04  4:26     ` Dharmendra Hans
  2022-05-04 14:47       ` Vivek Goyal
  1 sibling, 1 reply; 37+ messages in thread
From: Dharmendra Hans @ 2022-05-04  4:26 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel,
	Bernd Schubert, Dharmendra Singh

On Wed, May 4, 2022 at 1:24 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> > From: Dharmendra Singh <dsingh@ddn.com>
> >
> > When we go for creating a file (O_CREAT), we trigger
> > a lookup to FUSE USER SPACE. It is very  much likely
> > that file does not exist yet as O_CREAT is passed to
> > open(). This lookup can be avoided and can be performed
> > as part of create call into libfuse.
> >
> > This lookup + create in single call to libfuse and finally
> > to USER SPACE has been named as atomic create. It is expected
> > that USER SPACE create the file, open it and fills in the
> > attributes which are then used to make inode stand/revalidate
> > in the kernel cache. Also if file was newly created(does not
> > exist yet by this time) in USER SPACE then it should be indicated
> > in `struct fuse_file_info` by setting a bit which is again used by
> > libfuse to send some flags back to fuse kernel to indicate that
> > that file was newly created. These flags are used by kernel to
> > indicate changes in parent directory.
>
> Reading the existing code a little bit more and trying to understand
> existing semantics. And that will help me unerstand what new is being
> done.
>
> So current fuse_atomic_open() does following.
>
> A. Looks up dentry (if d_in_lookup() is set).
> B. If dentry is positive or O_CREAT is not set, return.
> C. If server supports atomic create + open, use that to create file and
>    open it as well.
> D. If server does not support atomic create + open, just create file
>    using "mknod" and return. VFS will take care of opening the file.
>
> Now with this patch, new flow is.
>
> A. Look up dentry if d_in_lookup() is set as well as either file is not
>    being created or fc->no_atomic_create is set. This basiclally means
>    skip lookup if atomic_create is supported and file is being created.
>
> B. Remains same. if dentry is positive or O_CREATE is not set, return.
>
> C. If server supports new atomic_create(), use that.
>
> D. If not, if server supports atomic create + open, use that
>
> E. If not, fall back to mknod and do not open file.
>
> So to me this new functionality is basically atomic "lookup + create +
> open"?
>
> Or may be not. I see we check "fc->no_create" and fallback to mknod.
>
>         if (fc->no_create)
>                 goto mknod;
>
> So fc->no_create is representing both old atomic "create + open" as well
> as new "lookup + create + open" ?
>
> It might be obvious to you, but it is not to me. So will be great if
> you shed some light on this.
>

I think you got it right now. New atomic create does what you
mentioned as new flow.  It does  lookup + create + open in single call
(being called as atomic create) to USER SPACE. mknod is a special case
where the file system does not have a create call implemented. I think
its legacy probably goes back to Linux 2.4 if I am not wrong. We did
not make any changes into that.

Second patch avoids lookup for open calls. 3rd patch avoids lookup in
de_revalidate() but for non-dir. And only in case when default
permissions are not enabled.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-04  4:26     ` Dharmendra Hans
@ 2022-05-04 14:47       ` Vivek Goyal
  2022-05-04 15:46         ` Bernd Schubert
  2022-05-05  4:51         ` Dharmendra Hans
  0 siblings, 2 replies; 37+ messages in thread
From: Vivek Goyal @ 2022-05-04 14:47 UTC (permalink / raw)
  To: Dharmendra Hans
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel,
	Bernd Schubert, Dharmendra Singh

On Wed, May 04, 2022 at 09:56:49AM +0530, Dharmendra Hans wrote:
> On Wed, May 4, 2022 at 1:24 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> > > From: Dharmendra Singh <dsingh@ddn.com>
> > >
> > > When we go for creating a file (O_CREAT), we trigger
> > > a lookup to FUSE USER SPACE. It is very  much likely
> > > that file does not exist yet as O_CREAT is passed to
> > > open(). This lookup can be avoided and can be performed
> > > as part of create call into libfuse.
> > >
> > > This lookup + create in single call to libfuse and finally
> > > to USER SPACE has been named as atomic create. It is expected
> > > that USER SPACE create the file, open it and fills in the
> > > attributes which are then used to make inode stand/revalidate
> > > in the kernel cache. Also if file was newly created(does not
> > > exist yet by this time) in USER SPACE then it should be indicated
> > > in `struct fuse_file_info` by setting a bit which is again used by
> > > libfuse to send some flags back to fuse kernel to indicate that
> > > that file was newly created. These flags are used by kernel to
> > > indicate changes in parent directory.
> >
> > Reading the existing code a little bit more and trying to understand
> > existing semantics. And that will help me unerstand what new is being
> > done.
> >
> > So current fuse_atomic_open() does following.
> >
> > A. Looks up dentry (if d_in_lookup() is set).
> > B. If dentry is positive or O_CREAT is not set, return.
> > C. If server supports atomic create + open, use that to create file and
> >    open it as well.
> > D. If server does not support atomic create + open, just create file
> >    using "mknod" and return. VFS will take care of opening the file.
> >
> > Now with this patch, new flow is.
> >
> > A. Look up dentry if d_in_lookup() is set as well as either file is not
> >    being created or fc->no_atomic_create is set. This basiclally means
> >    skip lookup if atomic_create is supported and file is being created.
> >
> > B. Remains same. if dentry is positive or O_CREATE is not set, return.
> >
> > C. If server supports new atomic_create(), use that.
> >
> > D. If not, if server supports atomic create + open, use that
> >
> > E. If not, fall back to mknod and do not open file.
> >
> > So to me this new functionality is basically atomic "lookup + create +
> > open"?
> >
> > Or may be not. I see we check "fc->no_create" and fallback to mknod.
> >
> >         if (fc->no_create)
> >                 goto mknod;
> >
> > So fc->no_create is representing both old atomic "create + open" as well
> > as new "lookup + create + open" ?
> >
> > It might be obvious to you, but it is not to me. So will be great if
> > you shed some light on this.
> >
> 
> I think you got it right now. New atomic create does what you
> mentioned as new flow.  It does  lookup + create + open in single call
> (being called as atomic create) to USER SPACE.mknod is a special case

Ok, naming is little confusing. I think we will have to put it in
commit message and where you define FUSE_ATOMIC_CREATE that what's
the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE. This is
ATOMIC w.r.t what?

May be atomic here means that "lookup + create + open" is a single operation.
But then even FUSE_CREATE is atomic because "creat + open" is a single
operation.

In fact FUSE_CREATE does lookup anyway and returns all the information
in fuse_entry_out. 

IIUC, only difference between FUSE_CREATE and FUSE_ATOMIC_CREATE is that
later also carries information in reply whether file was actually created
or not (FOPEN_FILE_CREATED). This will be set if file did not exist
already and it was created indeed. Is that right?

I see FOPEN_FILE_CREATED is being used to avoid calling
fuse_dir_changed(). That sounds like a separate optimization and probably
should be in a separate patch.

IOW, I think this patch should be broken in to multiple pieces. First
piece seems to be avoiding lookup() and given the way it is implemented,
looks like we can avoid lookup() even by using existing FUSE_CREATE
command. We don't necessarily need FUSE_ATOMIC_CREATE. Is that right?

And once that is done, a separate patch should probably should explain
the problem and say fuse_dir_changed() call can be avoided if we knew
if file was actually created or it was already existing there. And that's
when one need to introduce a new command. Given this is just an extension
of existing FUSE_CREATE command and returns additiona info about
FOPEN_FILE_CREATED, we probably should simply call it FUSE_CREATE_EXT
and explain how this operation is different from FUSE_CREATE.

Thanks
Vivek

> where the file system does not have a create call implemented. I think
> its legacy probably goes back to Linux 2.4 if I am not wrong. We did
> not make any changes into that.

> 
> Second patch avoids lookup for open calls. 3rd patch avoids lookup in
> de_revalidate() but for non-dir. And only in case when default
> permissions are not enabled.
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-04 14:47       ` Vivek Goyal
@ 2022-05-04 15:46         ` Bernd Schubert
  2022-05-04 17:31           ` Vivek Goyal
  2022-05-05  4:51         ` Dharmendra Hans
  1 sibling, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2022-05-04 15:46 UTC (permalink / raw)
  To: Vivek Goyal, Dharmendra Hans
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel,
	Dharmendra Singh



On 5/4/22 16:47, Vivek Goyal wrote:

> Ok, naming is little confusing. I think we will have to put it in
> commit message and where you define FUSE_ATOMIC_CREATE that what's
> the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE. This is
> ATOMIC w.r.t what?
> 
> May be atomic here means that "lookup + create + open" is a single operation.
> But then even FUSE_CREATE is atomic because "creat + open" is a single
> operation.
> 
> In fact FUSE_CREATE does lookup anyway and returns all the information
> in fuse_entry_out.
> 
> IIUC, only difference between FUSE_CREATE and FUSE_ATOMIC_CREATE is that
> later also carries information in reply whether file was actually created
> or not (FOPEN_FILE_CREATED). This will be set if file did not exist
> already and it was created indeed. Is that right?
> 
> I see FOPEN_FILE_CREATED is being used to avoid calling
> fuse_dir_changed(). That sounds like a separate optimization and probably
> should be in a separate patch.
> 
> IOW, I think this patch should be broken in to multiple pieces. First
> piece seems to be avoiding lookup() and given the way it is implemented,
> looks like we can avoid lookup() even by using existing FUSE_CREATE
> command. We don't necessarily need FUSE_ATOMIC_CREATE. Is that right?

The initial non-published patches had that, but I had actually asked not 
to go that route, because I'm scared that some user space file system 
implementations might get broken. Right now there is always a lookup 
before fuse_create_open() and when the resulting dentry is positive 
fuse_create_open/FUSE_CREATE is bypassed. I.e. user space 
implementations didn't need to handle existing files. Out of the sudden 
user space implementations might need to handle it and some of them 
might get broken with that kernel update. I guess even a single broken 
user space implementation would count as regression.
So I had asked to change the patch to require a user space flag.

-- Bernd

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-04 15:46         ` Bernd Schubert
@ 2022-05-04 17:31           ` Vivek Goyal
  0 siblings, 0 replies; 37+ messages in thread
From: Vivek Goyal @ 2022-05-04 17:31 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Dharmendra Hans, Miklos Szeredi, linux-fsdevel, fuse-devel,
	linux-kernel, Dharmendra Singh

On Wed, May 04, 2022 at 05:46:27PM +0200, Bernd Schubert wrote:
> 
> 
> On 5/4/22 16:47, Vivek Goyal wrote:
> 
> > Ok, naming is little confusing. I think we will have to put it in
> > commit message and where you define FUSE_ATOMIC_CREATE that what's
> > the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE. This is
> > ATOMIC w.r.t what?
> > 
> > May be atomic here means that "lookup + create + open" is a single operation.
> > But then even FUSE_CREATE is atomic because "creat + open" is a single
> > operation.
> > 
> > In fact FUSE_CREATE does lookup anyway and returns all the information
> > in fuse_entry_out.
> > 
> > IIUC, only difference between FUSE_CREATE and FUSE_ATOMIC_CREATE is that
> > later also carries information in reply whether file was actually created
> > or not (FOPEN_FILE_CREATED). This will be set if file did not exist
> > already and it was created indeed. Is that right?
> > 
> > I see FOPEN_FILE_CREATED is being used to avoid calling
> > fuse_dir_changed(). That sounds like a separate optimization and probably
> > should be in a separate patch.
> > 
> > IOW, I think this patch should be broken in to multiple pieces. First
> > piece seems to be avoiding lookup() and given the way it is implemented,
> > looks like we can avoid lookup() even by using existing FUSE_CREATE
> > command. We don't necessarily need FUSE_ATOMIC_CREATE. Is that right?
> 
> The initial non-published patches had that, but I had actually asked not to
> go that route, because I'm scared that some user space file system
> implementations might get broken.

> Right now there is always a lookup before
> fuse_create_open() and when the resulting dentry is positive
> fuse_create_open/FUSE_CREATE is bypassed. I.e. user space implementations
> didn't need to handle existing files.

Hmm..., So if dentry is positive, we will call FUSE_OPEN instead in 
current code.

Now with this change, we will call FUSE_CREATE and file could still
be present. If it is a shared filesystem, file could be created by
another client anyway after lookup() completed and returned a non-existent
file. So server can still get FUSE_CREATE and file could be there.

But I understand that risk of regression is not zero. 

Given we are going to implement FUSE_CREATE_EXT in the same patch
series, I guess we could fix it easily by switching to FUSE_CREATE_EXT.

So that's my take. I will be willing to take this chance. Until and
unless ofcourse Miklos disagrees. :-)

Thanks
Vivek

> Out of the sudden user space
> implementations might need to handle it and some of them might get broken
> with that kernel update. I guess even a single broken user space
> implementation would count as regression.
> So I had asked to change the patch to require a user space flag.
> 
> -- Bernd
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 2/3] FUSE: Implement atomic lookup + open
  2022-05-02 10:25 ` [PATCH v4 2/3] FUSE: Implement atomic lookup + open Dharmendra Singh
@ 2022-05-04 18:20   ` Vivek Goyal
  2022-05-05  6:39     ` Dharmendra Hans
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2022-05-04 18:20 UTC (permalink / raw)
  To: Dharmendra Singh
  Cc: miklos, linux-fsdevel, fuse-devel, linux-kernel, bschubert,
	Dharmendra Singh

On Mon, May 02, 2022 at 03:55:20PM +0530, Dharmendra Singh wrote:
> From: Dharmendra Singh <dsingh@ddn.com>
> 
> We can optimize aggressive lookups which are triggered when
> there is normal open for file/dir (dentry is new/negative).
> 
> Here in this case since we are anyway going to open the file/dir
> with USER SPACE, avoid this separate lookup call into libfuse
> and combine it with open call into libfuse.
> 
> This lookup + open in single call to libfuse has been named
> as atomic open. It is expected that USER SPACE opens the file
> and fills in the attributes, which are then used to make inode
> stand/revalidate in the kernel cache.
> 
> Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> ---
>  fs/fuse/dir.c             | 78 ++++++++++++++++++++++++++++++---------
>  fs/fuse/fuse_i.h          |  3 ++
>  fs/fuse/inode.c           |  4 +-
>  include/uapi/linux/fuse.h |  2 +
>  4 files changed, 69 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index cad3322a007f..6879d3a86796 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -516,18 +516,18 @@ static int get_security_context(struct dentry *entry, umode_t mode,
>  }
>  
>  /*
> - * Atomic create+open operation
> - *
> - * If the filesystem doesn't support this, then fall back to separate
> - * 'mknod' + 'open' requests.
> + * This is common function for initiating atomic operations into libfuse.
> + * Currently being used by Atomic create(atomic lookup + create) and
> + * Atomic open(atomic lookup + open).
>   */
> -static int fuse_create_open(struct inode *dir, struct dentry *entry,
> -			    struct file *file, unsigned int flags,
> -			    umode_t mode, uint32_t opcode)
> +static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
> +				 struct dentry **alias, struct file *file,
> +				 unsigned int flags, umode_t mode, uint32_t opcode)

If this common function is really useful for atomic create + atomic open,
I will first add a patch which adds a common function and makes FUSE_CREATE_EXT use that function. And in later patch add functionality to do atomic open.
Just makes code more readable.

>  {
>  	int err;
>  	struct inode *inode;
>  	struct fuse_mount *fm = get_fuse_mount(dir);
> +	struct fuse_conn *fc = get_fuse_conn(dir);
>  	FUSE_ARGS(args);
>  	struct fuse_forget_link *forget;
>  	struct fuse_create_in inarg;
> @@ -539,9 +539,13 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  	void *security_ctx = NULL;
>  	u32 security_ctxlen;
>  	bool atomic_create = (opcode == FUSE_ATOMIC_CREATE ? true : false);
> +	bool create_op = (opcode == FUSE_CREATE ||
> +			  opcode == FUSE_ATOMIC_CREATE) ? true : false;
> +	if (alias)
> +		*alias = NULL;
>  
>  	/* Userspace expects S_IFREG in create mode */
> -	BUG_ON((mode & S_IFMT) != S_IFREG);
> +	BUG_ON(create_op && (mode & S_IFMT) != S_IFREG);
>  
>  	forget = fuse_alloc_forget();
>  	err = -ENOMEM;
> @@ -553,7 +557,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  	if (!ff)
>  		goto out_put_forget_req;
>  
> -	if (!fm->fc->dont_mask)
> +	if (!fc->dont_mask)
>  		mode &= ~current_umask();
>  
>  	flags &= ~O_NOCTTY;
> @@ -642,8 +646,9 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
>  				err = PTR_ERR(res);
>  				goto out_err;
>  			}
> -			/* res is expected to be NULL since its REG file */
> -			WARN_ON(res);
> +			entry = res;
> +			if (alias)
> +				*alias = res;
>  		}
>  	}
>  	fuse_change_entry_timeout(entry, &outentry);
> @@ -681,8 +686,8 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
>  	if (fc->no_atomic_create)
>  		return -ENOSYS;
>  
> -	err = fuse_create_open(dir, entry, file, flags, mode,
> -			       FUSE_ATOMIC_CREATE);
> +	err = fuse_atomic_do_common(dir, entry, NULL, file, flags, mode,
> +				    FUSE_ATOMIC_CREATE);
>  	/* If atomic create is not implemented then indicate in fc so that next
>  	 * request falls back to normal create instead of going into libufse and
>  	 * returning with -ENOSYS.
> @@ -694,6 +699,29 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
>  	return err;
>  }
>  
> +static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> +				struct dentry **alias, struct file *file,
> +				unsigned int flags, umode_t mode)
> +{
> +	int err;
> +	struct fuse_conn *fc = get_fuse_conn(dir);
> +
> +	if (!fc->do_atomic_open)
> +		return -ENOSYS;
> +
> +	err = fuse_atomic_do_common(dir, entry, alias, file, flags, mode,
> +				    FUSE_ATOMIC_OPEN);
> +	/* Atomic open imply atomic trunc as well i.e truncate should be performed
> +	 * as part of atomic open call itself.
> +	 */
> +	if (!fc->atomic_o_trunc) {
> +		if (fc->do_atomic_open)
> +			fc->atomic_o_trunc = 1;
> +	}
> +
> +	return err;
> +}
> +
>  static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
>  		      umode_t, dev_t);
>  static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> @@ -702,12 +730,23 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  {
>  	int err;
>  	struct fuse_conn *fc = get_fuse_conn(dir);
> -	struct dentry *res = NULL;
> +	struct dentry *res = NULL, *alias = NULL;
>  	bool create = flags & O_CREAT ? true : false;
>  
>  	if (fuse_is_bad(dir))
>  		return -EIO;
>  
> +	if (!create) {
> +		err = fuse_do_atomic_open(dir, entry, &alias,
> +					  file, flags, mode);

So this is the core change of behavior. As of now, if we are not doing
create operation, we just lookup and return. But now you are doing
open + lookup in a single operation. Interesting. I am not sure if
it breaks anything in VFS.

> +		res = alias;
> +		if (!err || err != -ENOSYS)
> +			goto out_dput;
> +	}
> +	 /*
> +	  * ENOSYS fall back for open- user space does not have full
> +	  * atomic open.
> +	  */

This ENOSYS stuff all the place is making these patches look very unclean
to me. A part of me says that should we negotiate these as feature bits.
Having said that feature bits are precious and should not be used for
trivial purposes. Hmm..., may be we can make handle ENOSYS little better
in general.

>  	if ((!create || fc->no_atomic_create) && d_in_lookup(entry)) {
>  		res = fuse_lookup(dir, entry, 0);
>  		if (IS_ERR(res))
> @@ -730,9 +769,14 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  	/* Libfuse/user space has not implemented atomic create, therefore
>  	 * fall back to normal create.
>  	 */
> -	if (err == -ENOSYS)
> -		err = fuse_create_open(dir, entry, file, flags, mode,
> -				       FUSE_CREATE);
> +	/* Atomic create+open operation
> +	 * If the filesystem doesn't support this, then fall back to separate
> +	 * 'mknod' + 'open' requests.
> +	 */
> +	if (err == -ENOSYS) {
> +		err = fuse_atomic_do_common(dir, entry, NULL, file, flags,
> +					    mode, FUSE_CREATE);
> +	}
>  	if (err == -ENOSYS) {
>  		fc->no_create = 1;
>  		goto mknod;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index d577a591ab16..24793b82303f 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -669,6 +669,9 @@ struct fuse_conn {
>  	/** Is open/release not implemented by fs? */
>  	unsigned no_open:1;
>  
> +	/** Is atomic open implemented by fs ? */
> +	unsigned do_atomic_open : 1;
> +
>  	/** Is atomic create not implemented by fs? */
>  	unsigned no_atomic_create:1;
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index ee846ce371d8..5f667de69115 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1190,6 +1190,8 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
>  				fc->setxattr_ext = 1;
>  			if (flags & FUSE_SECURITY_CTX)
>  				fc->init_security = 1;
> +			if (flags & FUSE_DO_ATOMIC_OPEN)
> +				fc->do_atomic_open = 1;

Hey you are negotiating FUSE_DO_ATOMIC_OPEN flag. If that's the case why
do you have to deal with -ENOSYS in fuse_do_atomic_open(). You should
be able to just check.

if (!create && fc->do_atomic_open) {
        fuse_do_atomic_open().
}

I have yet to check what happens if file does not exist.

>  		} else {
>  			ra_pages = fc->max_read / PAGE_SIZE;
>  			fc->no_lock = 1;
> @@ -1235,7 +1237,7 @@ void fuse_send_init(struct fuse_mount *fm)
>  		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
>  		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
>  		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
> -		FUSE_SECURITY_CTX;
> +		FUSE_SECURITY_CTX | FUSE_DO_ATOMIC_OPEN;
>  #ifdef CONFIG_FUSE_DAX
>  	if (fm->fc->dax)
>  		flags |= FUSE_MAP_ALIGNMENT;
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index e4b56004b148..ab91e391241a 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -391,6 +391,7 @@ struct fuse_file_lock {
>  /* bits 32..63 get shifted down 32 bits into the flags2 field */
>  #define FUSE_SECURITY_CTX	(1ULL << 32)
>  #define FUSE_HAS_INODE_DAX	(1ULL << 33)
> +#define FUSE_DO_ATOMIC_OPEN	(1ULL << 34)
>  
>  /**
>   * CUSE INIT request/reply flags
> @@ -540,6 +541,7 @@ enum fuse_opcode {
>  	FUSE_REMOVEMAPPING	= 49,
>  	FUSE_SYNCFS		= 50,
>  	FUSE_ATOMIC_CREATE	= 51,
> +	FUSE_ATOMIC_OPEN	= 52,
>  
>  	/* CUSE specific operations */
>  	CUSE_INIT		= 4096,
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-02 10:25 [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Dharmendra Singh
                   ` (2 preceding siblings ...)
  2022-05-02 10:25 ` [PATCH v4 3/3] FUSE: Avoid lookup in d_revalidate() Dharmendra Singh
@ 2022-05-04 19:18 ` Vivek Goyal
  2022-05-05  6:12   ` Dharmendra Hans
  3 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2022-05-04 19:18 UTC (permalink / raw)
  To: Dharmendra Singh
  Cc: miklos, linux-fsdevel, fuse-devel, linux-kernel, bschubert

On Mon, May 02, 2022 at 03:55:18PM +0530, Dharmendra Singh wrote:
> In FUSE, as of now, uncached lookups are expensive over the wire.
> E.g additional latencies and stressing (meta data) servers from
> thousands of clients. These lookup calls possibly can be avoided
> in some cases. Incoming three patches address this issue.

BTW, these patches are designed to improve performance by cutting down
on number of fuse commands sent. Are there any performance numbers
which demonstrate what kind of improvement you are seeing.

Say, If I do kernel build, is the performance improvement observable?

Thanks
Vivek

> 
> 
> Fist patch handles the case where we are creating a file with O_CREAT.
> Before we go for file creation, we do a lookup on the file which is most
> likely non-existent. After this lookup is done, we again go into libfuse
> to create file. Such lookups where file is most likely non-existent, can
> be avoided.
> 
> Second patch handles the case where we open first time a file/dir
> but do a lookup first on it. After lookup is performed we make another
> call into libfuse to open the file. Now these two separate calls into
> libfuse can be combined and performed as a single call into libfuse.
> 
> Third patch handles the case when we are opening an already existing file
> (positive dentry). Before this open call, we re-validate the inode and
> this re-validation does a lookup on the file and verify the inode.
> This separate lookup also can be avoided (for non-dir) and combined
> with open call into libfuse. After open returns we can revalidate the inode.
> This optimisation is performed only when we do not have default permissions
> enabled.
> 
> Here is the link to performance numbers
> https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/
> 
> 
> Dharmendra Singh (3):
>   FUSE: Implement atomic lookup + create
>   Implement atomic lookup + open
>   Avoid lookup in d_revalidate()
> 
>  fs/fuse/dir.c             | 211 +++++++++++++++++++++++++++++++++++---
>  fs/fuse/file.c            |  30 +++++-
>  fs/fuse/fuse_i.h          |  16 ++-
>  fs/fuse/inode.c           |   4 +-
>  fs/fuse/ioctl.c           |   2 +-
>  include/uapi/linux/fuse.h |   5 +
>  6 files changed, 246 insertions(+), 22 deletions(-)
> 
> ---
> v4: Addressed all comments and refactored the code into 3 separate patches
>     respectively for Atomic create, Atomic open, optimizing lookup in
>     d_revalidate().
> ---
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 3/3] FUSE: Avoid lookup in d_revalidate()
  2022-05-02 10:25 ` [PATCH v4 3/3] FUSE: Avoid lookup in d_revalidate() Dharmendra Singh
@ 2022-05-04 20:39   ` Vivek Goyal
  2022-05-04 21:05     ` Bernd Schubert
  2022-05-05  5:49     ` Dharmendra Hans
  0 siblings, 2 replies; 37+ messages in thread
From: Vivek Goyal @ 2022-05-04 20:39 UTC (permalink / raw)
  To: Dharmendra Singh
  Cc: miklos, linux-fsdevel, fuse-devel, linux-kernel, bschubert,
	Dharmendra Singh

On Mon, May 02, 2022 at 03:55:21PM +0530, Dharmendra Singh wrote:
> From: Dharmendra Singh <dsingh@ddn.com>
> 
> With atomic open + lookup implemented, it is possible
> to avoid lookups in FUSE d_revalidate() for objects
> other than directories.
> 
> If FUSE is mounted with default permissions then this
> optimization is not possible as we need to fetch fresh
> inode attributes for permission check. This lookup
> skipped in d_revalidate() can be performed  as part of
> open call into libfuse which is made from fuse_file_open().
> And when we return from USER SPACE with file opened and
> fresh attributes, we can revalidate the inode.
> 
> Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> Reported-by: kernel test robot <lkp@intel.com>
> 
> ---
>  fs/fuse/dir.c    | 89 ++++++++++++++++++++++++++++++++++++++++++------
>  fs/fuse/file.c   | 30 ++++++++++++++--
>  fs/fuse/fuse_i.h | 10 +++++-
>  fs/fuse/ioctl.c  |  2 +-
>  4 files changed, 115 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 6879d3a86796..1594fecc920f 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -196,6 +196,7 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
>   * the lookup once more.  If the lookup results in the same inode,
>   * then refresh the attributes, timeouts and mark the dentry valid.
>   */
> +
>  static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
>  {
>  	struct inode *inode;
> @@ -224,6 +225,17 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
>  
>  		fm = get_fuse_mount(inode);
>  
> +		/* If atomic open is supported by FUSE then use this opportunity
> +		 * (only for non-dir) to avoid this lookup and combine
> +		 * lookup + open into single call.
> +		 */
> +		if (!fm->fc->default_permissions && fm->fc->do_atomic_open &&
> +		    !(flags & (LOOKUP_EXCL | LOOKUP_REVAL)) &&
> +		    (flags & LOOKUP_OPEN) && !S_ISDIR(inode->i_mode)) {
> +			ret = 1;

So basically we think that VFS is going to do OPEN and calling
->revalidate() before that. So we are returning "1" to vfs saying
dentry is valid (despite the fact that we have no idea at this
point of time).

And later when open comes we are opening and revalidating inode etc.

Seriously, IMHO, all this seems very fragile and hard to understand
and maintain code. Things can go easily wrong if even little bit
of assumptions change in VFS.

This sounds more like VFS should know about it and if VFS knows
that filesystem supports facility where it can open + revalidate
at the same time, it should probably call that. Something like
->open_revalidate() etc. That would be much more maintainable code but
this feels like very fragile to me, IMHO.

Thanks
Vivek




> +			goto out;
> +		}
> +
>  		forget = fuse_alloc_forget();
>  		ret = -ENOMEM;
>  		if (!forget)
> @@ -515,14 +527,52 @@ static int get_security_context(struct dentry *entry, umode_t mode,
>  	return err;
>  }
>  
> +/*
> + * Revalidate the inode after we got fresh attributes from user space.
> + */
> +static int fuse_atomic_open_revalidate_inode(struct inode *reval_inode,
> +					     struct dentry *entry,
> +					     struct fuse_forget_link *forget,
> +					     struct fuse_entry_out *outentry,
> +					     u64 attr_version)
> +{
> +	struct fuse_inode *fi;
> +	struct fuse_conn *fc = get_fuse_conn(reval_inode);
> +
> +	/* Mode should be other than directory */
> +	BUG_ON(S_ISDIR(reval_inode->i_mode));
> +
> +	if (outentry->nodeid != get_node_id(reval_inode)) {
> +		fuse_queue_forget(fc, forget, outentry->nodeid, 1);
> +		return -ESTALE;
> +	}
> +	if (fuse_stale_inode(reval_inode, outentry->generation,
> +			     &outentry->attr)) {
> +		fuse_make_bad(reval_inode);
> +		return -ESTALE;
> +	}
> +	fi = get_fuse_inode(reval_inode);
> +	spin_lock(&fi->lock);
> +	fi->nlookup++;
> +	spin_unlock(&fi->lock);
> +
> +	forget_all_cached_acls(reval_inode);
> +	fuse_change_attributes(reval_inode, &outentry->attr,
> +			       entry_attr_timeout(outentry), attr_version);
> +	fuse_change_entry_timeout(entry, outentry);
> +	return 0;
> +}
> +
> +
>  /*
>   * This is common function for initiating atomic operations into libfuse.
>   * Currently being used by Atomic create(atomic lookup + create) and
>   * Atomic open(atomic lookup + open).
>   */
> -static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
> +static int fuse_do_atomic_common(struct inode *dir, struct dentry *entry,
>  				 struct dentry **alias, struct file *file,
> -				 unsigned int flags, umode_t mode, uint32_t opcode)
> +				 struct inode *reval_inode, unsigned int flags,
> +				 umode_t mode, uint32_t opcode)
>  {
>  	int err;
>  	struct inode *inode;
> @@ -541,6 +591,8 @@ static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
>  	bool atomic_create = (opcode == FUSE_ATOMIC_CREATE ? true : false);
>  	bool create_op = (opcode == FUSE_CREATE ||
>  			  opcode == FUSE_ATOMIC_CREATE) ? true : false;
> +	u64 attr_version = fuse_get_attr_version(fm->fc);
> +
>  	if (alias)
>  		*alias = NULL;
>  
> @@ -609,6 +661,20 @@ static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
>  	ff->fh = outopen.fh;
>  	ff->nodeid = outentry.nodeid;
>  	ff->open_flags = outopen.open_flags;
> +
> +	/* Inode revalidation was bypassed previously for type other than
> +	 * directories, revalidate now as we got fresh attributes.
> +	 */
> +	if (reval_inode) {
> +		err = fuse_atomic_open_revalidate_inode(reval_inode, entry,
> +							forget, &outentry,
> +							attr_version);
> +		if (err)
> +			goto out_free_ff;
> +		inode = reval_inode;
> +		kfree(forget);
> +		goto out_finish_open;
> +	}
>  	inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
>  			  &outentry.attr, entry_attr_timeout(&outentry), 0);
>  	if (!inode) {
> @@ -659,6 +725,7 @@ static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
>  	if (!atomic_create || (outopen.open_flags & FOPEN_FILE_CREATED))
>  		fuse_dir_changed(dir);
>  	err = finish_open(file, entry, generic_file_open);
> +out_finish_open:
>  	if (err) {
>  		fi = get_fuse_inode(inode);
>  		fuse_sync_release(fi, ff, flags);
> @@ -686,7 +753,7 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
>  	if (fc->no_atomic_create)
>  		return -ENOSYS;
>  
> -	err = fuse_atomic_do_common(dir, entry, NULL, file, flags, mode,
> +	err = fuse_do_atomic_common(dir, entry, NULL, file, NULL, flags, mode,
>  				    FUSE_ATOMIC_CREATE);
>  	/* If atomic create is not implemented then indicate in fc so that next
>  	 * request falls back to normal create instead of going into libufse and
> @@ -699,9 +766,10 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
>  	return err;
>  }
>  
> -static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> -				struct dentry **alias, struct file *file,
> -				unsigned int flags, umode_t mode)
> +int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> +			struct dentry **alias, struct file *file,
> +			struct inode *reval_inode, unsigned int flags,
> +			umode_t mode)
>  {
>  	int err;
>  	struct fuse_conn *fc = get_fuse_conn(dir);
> @@ -709,8 +777,8 @@ static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
>  	if (!fc->do_atomic_open)
>  		return -ENOSYS;
>  
> -	err = fuse_atomic_do_common(dir, entry, alias, file, flags, mode,
> -				    FUSE_ATOMIC_OPEN);
> +	err = fuse_do_atomic_common(dir, entry, alias, file, reval_inode, flags,
> +				    mode, FUSE_ATOMIC_OPEN);
>  	/* Atomic open imply atomic trunc as well i.e truncate should be performed
>  	 * as part of atomic open call itself.
>  	 */
> @@ -718,7 +786,6 @@ static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
>  		if (fc->do_atomic_open)
>  			fc->atomic_o_trunc = 1;
>  	}
> -
>  	return err;
>  }
>  
> @@ -738,7 +805,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  
>  	if (!create) {
>  		err = fuse_do_atomic_open(dir, entry, &alias,
> -					  file, flags, mode);
> +					  file, NULL, flags, mode);
>  		res = alias;
>  		if (!err || err != -ENOSYS)
>  			goto out_dput;
> @@ -774,7 +841,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
>  	 * 'mknod' + 'open' requests.
>  	 */
>  	if (err == -ENOSYS) {
> -		err = fuse_atomic_do_common(dir, entry, NULL, file, flags,
> +		err = fuse_do_atomic_common(dir, entry, NULL, file, NULL, flags,
>  					    mode, FUSE_CREATE);
>  	}
>  	if (err == -ENOSYS) {
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 829094451774..2b0548163249 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -125,11 +125,15 @@ static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
>  }
>  
>  struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> -				 unsigned int open_flags, bool isdir)
> +				 struct file *file, unsigned int open_flags,
> +				 bool isdir)
>  {
>  	struct fuse_conn *fc = fm->fc;
>  	struct fuse_file *ff;
> +	struct dentry *dentry = NULL;
> +	struct dentry *parent = NULL;
>  	int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
> +	int ret;
>  
>  	ff = fuse_file_alloc(fm);
>  	if (!ff)
> @@ -138,6 +142,11 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
>  	ff->fh = 0;
>  	/* Default for no-open */
>  	ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
> +
> +	/* For directories we already had lookup */
> +	if (!isdir && fc->do_atomic_open && file != NULL)
> +		goto revalidate_atomic_open;
> +
>  	if (isdir ? !fc->no_opendir : !fc->no_open) {
>  		struct fuse_open_out outarg;
>  		int err;
> @@ -164,12 +173,27 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
>  	ff->nodeid = nodeid;
>  
>  	return ff;
> +
> +revalidate_atomic_open:
> +	dentry = file->f_path.dentry;
> +	/* Get ref on parent */
> +	parent = dget_parent(dentry);
> +	open_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY);
> +	ret = fuse_do_atomic_open(d_inode_rcu(parent), dentry, NULL, file,
> +				  d_inode_rcu(dentry), open_flags, 0);
> +	dput(parent);
> +	if (ret)
> +		goto err_out;
> +	ff = file->private_data;
> +	return ff;
> +err_out:
> +	return ERR_PTR(ret);
>  }
>  
>  int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
>  		 bool isdir)
>  {
> -	struct fuse_file *ff = fuse_file_open(fm, nodeid, file->f_flags, isdir);
> +	struct fuse_file *ff = fuse_file_open(fm, nodeid, file, file->f_flags, isdir);
>  
>  	if (!IS_ERR(ff))
>  		file->private_data = ff;
> @@ -252,7 +276,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
>  	}
>  
>  	err = fuse_do_open(fm, get_node_id(inode), file, isdir);
> -	if (!err)
> +	if (!err && (!fc->do_atomic_open || isdir))
>  		fuse_finish_open(inode, file);
>  
>  out:
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 24793b82303f..5c83e4249b7e 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1014,6 +1014,13 @@ void fuse_finish_open(struct inode *inode, struct file *file);
>  void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
>  		       unsigned int flags);
>  
> +/**
> + * Send atomic create + open or lookup + open
> + */
> +int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> +			struct dentry **alias, struct file *file,
> +			struct inode *reval_inode, unsigned int flags,
> +			umode_t mode);
>  /**
>   * Send RELEASE or RELEASEDIR request
>   */
> @@ -1317,7 +1324,8 @@ int fuse_fileattr_set(struct user_namespace *mnt_userns,
>  /* file.c */
>  
>  struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> -				 unsigned int open_flags, bool isdir);
> +				 struct file *file, unsigned int open_flags,
> +				 bool isdir);
>  void fuse_file_release(struct inode *inode, struct fuse_file *ff,
>  		       unsigned int open_flags, fl_owner_t id, bool isdir);
>  
> diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
> index fbc09dab1f85..63106a54ba1a 100644
> --- a/fs/fuse/ioctl.c
> +++ b/fs/fuse/ioctl.c
> @@ -408,7 +408,7 @@ static struct fuse_file *fuse_priv_ioctl_prepare(struct inode *inode)
>  	if (!S_ISREG(inode->i_mode) && !isdir)
>  		return ERR_PTR(-ENOTTY);
>  
> -	return fuse_file_open(fm, get_node_id(inode), O_RDONLY, isdir);
> +	return fuse_file_open(fm, get_node_id(inode), NULL, O_RDONLY, isdir);
>  }
>  
>  static void fuse_priv_ioctl_cleanup(struct inode *inode, struct fuse_file *ff)
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 3/3] FUSE: Avoid lookup in d_revalidate()
  2022-05-04 20:39   ` Vivek Goyal
@ 2022-05-04 21:05     ` Bernd Schubert
  2022-05-05  5:49     ` Dharmendra Hans
  1 sibling, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2022-05-04 21:05 UTC (permalink / raw)
  To: Vivek Goyal, Dharmendra Singh
  Cc: miklos, linux-fsdevel, fuse-devel, linux-kernel, Dharmendra Singh



On 5/4/22 22:39, Vivek Goyal wrote:
> On Mon, May 02, 2022 at 03:55:21PM +0530, Dharmendra Singh wrote:
>> From: Dharmendra Singh <dsingh@ddn.com>
>>
>> With atomic open + lookup implemented, it is possible
>> to avoid lookups in FUSE d_revalidate() for objects
>> other than directories.
>>
>> If FUSE is mounted with default permissions then this
>> optimization is not possible as we need to fetch fresh
>> inode attributes for permission check. This lookup
>> skipped in d_revalidate() can be performed  as part of
>> open call into libfuse which is made from fuse_file_open().
>> And when we return from USER SPACE with file opened and
>> fresh attributes, we can revalidate the inode.
>>
>> Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
>> Reported-by: kernel test robot <lkp@intel.com>
>>
>> ---
>>   fs/fuse/dir.c    | 89 ++++++++++++++++++++++++++++++++++++++++++------
>>   fs/fuse/file.c   | 30 ++++++++++++++--
>>   fs/fuse/fuse_i.h | 10 +++++-
>>   fs/fuse/ioctl.c  |  2 +-
>>   4 files changed, 115 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>> index 6879d3a86796..1594fecc920f 100644
>> --- a/fs/fuse/dir.c
>> +++ b/fs/fuse/dir.c
>> @@ -196,6 +196,7 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
>>    * the lookup once more.  If the lookup results in the same inode,
>>    * then refresh the attributes, timeouts and mark the dentry valid.
>>    */
>> +
>>   static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
>>   {
>>   	struct inode *inode;
>> @@ -224,6 +225,17 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
>>   
>>   		fm = get_fuse_mount(inode);
>>   
>> +		/* If atomic open is supported by FUSE then use this opportunity
>> +		 * (only for non-dir) to avoid this lookup and combine
>> +		 * lookup + open into single call.
>> +		 */
>> +		if (!fm->fc->default_permissions && fm->fc->do_atomic_open &&
>> +		    !(flags & (LOOKUP_EXCL | LOOKUP_REVAL)) &&
>> +		    (flags & LOOKUP_OPEN) && !S_ISDIR(inode->i_mode)) {
>> +			ret = 1;
> 
> So basically we think that VFS is going to do OPEN and calling
> ->revalidate() before that. So we are returning "1" to vfs saying
> dentry is valid (despite the fact that we have no idea at this
> point of time).
> 
> And later when open comes we are opening and revalidating inode etc.
> 
> Seriously, IMHO, all this seems very fragile and hard to understand
> and maintain code. Things can go easily wrong if even little bit
> of assumptions change in VFS.
> 
> This sounds more like VFS should know about it and if VFS knows
> that filesystem supports facility where it can open + revalidate
> at the same time, it should probably call that. Something like
> ->open_revalidate() etc. That would be much more maintainable code but
> this feels like very fragile to me, IMHO.
> 

I'm not opposed to make things more clear, but AFAIK these lookup-intent 
flags are the way how it works for quite some time. Also see 
nfs_lookup_verify_inode(), which makes use of that the same way. I 
entirely agree, though, that using a dedicated method would make things 
much easier to understand. It is just a bit more complicated to get in 
patches that change the vfs...

Adding in a vfs ->open_revalidate might have the advantage that we could 
also support 'default_permissions' - ->open_revalidate  needs to 
additionally check the retrieved file permissions and and needs to call 
into generic_permissions for that. Right now that is not easily 
feasible, without adding some code dup to convert flags in MAY_* flags - 
a vfs change would be needed here to pass the right flags.

The other part that is missing in the current patches is something like 
->revalidate_getattr -  stat() of positive dentry first sends a 
revalidate and then another getattr and right now there is no good way 
to combine these.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-04 14:47       ` Vivek Goyal
  2022-05-04 15:46         ` Bernd Schubert
@ 2022-05-05  4:51         ` Dharmendra Hans
  2022-05-05 14:26           ` Vivek Goyal
  1 sibling, 1 reply; 37+ messages in thread
From: Dharmendra Hans @ 2022-05-05  4:51 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel,
	Bernd Schubert, Dharmendra Singh

On Wed, May 4, 2022 at 8:17 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, May 04, 2022 at 09:56:49AM +0530, Dharmendra Hans wrote:
> > On Wed, May 4, 2022 at 1:24 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> > > > From: Dharmendra Singh <dsingh@ddn.com>
> > > >
> > > > When we go for creating a file (O_CREAT), we trigger
> > > > a lookup to FUSE USER SPACE. It is very  much likely
> > > > that file does not exist yet as O_CREAT is passed to
> > > > open(). This lookup can be avoided and can be performed
> > > > as part of create call into libfuse.
> > > >
> > > > This lookup + create in single call to libfuse and finally
> > > > to USER SPACE has been named as atomic create. It is expected
> > > > that USER SPACE create the file, open it and fills in the
> > > > attributes which are then used to make inode stand/revalidate
> > > > in the kernel cache. Also if file was newly created(does not
> > > > exist yet by this time) in USER SPACE then it should be indicated
> > > > in `struct fuse_file_info` by setting a bit which is again used by
> > > > libfuse to send some flags back to fuse kernel to indicate that
> > > > that file was newly created. These flags are used by kernel to
> > > > indicate changes in parent directory.
> > >
> > > Reading the existing code a little bit more and trying to understand
> > > existing semantics. And that will help me unerstand what new is being
> > > done.
> > >
> > > So current fuse_atomic_open() does following.
> > >
> > > A. Looks up dentry (if d_in_lookup() is set).
> > > B. If dentry is positive or O_CREAT is not set, return.
> > > C. If server supports atomic create + open, use that to create file and
> > >    open it as well.
> > > D. If server does not support atomic create + open, just create file
> > >    using "mknod" and return. VFS will take care of opening the file.
> > >
> > > Now with this patch, new flow is.
> > >
> > > A. Look up dentry if d_in_lookup() is set as well as either file is not
> > >    being created or fc->no_atomic_create is set. This basiclally means
> > >    skip lookup if atomic_create is supported and file is being created.
> > >
> > > B. Remains same. if dentry is positive or O_CREATE is not set, return.
> > >
> > > C. If server supports new atomic_create(), use that.
> > >
> > > D. If not, if server supports atomic create + open, use that
> > >
> > > E. If not, fall back to mknod and do not open file.
> > >
> > > So to me this new functionality is basically atomic "lookup + create +
> > > open"?
> > >
> > > Or may be not. I see we check "fc->no_create" and fallback to mknod.
> > >
> > >         if (fc->no_create)
> > >                 goto mknod;
> > >
> > > So fc->no_create is representing both old atomic "create + open" as well
> > > as new "lookup + create + open" ?
> > >
> > > It might be obvious to you, but it is not to me. So will be great if
> > > you shed some light on this.
> > >
> >
> > I think you got it right now. New atomic create does what you
> > mentioned as new flow.  It does  lookup + create + open in single call
> > (being called as atomic create) to USER SPACE.mknod is a special case
>
> Ok, naming is little confusing. I think we will have to put it in
> commit message and where you define FUSE_ATOMIC_CREATE that what's
> the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE. This is
> ATOMIC w.r.t what?

Sure, I would update the commit message to make the distinction clear
between the two. This operation is atomic w.r.t to USER SPACE FUSE
implementations. i.e USER SPACE would be performing all these
operations in a single call to it.


> May be atomic here means that "lookup + create + open" is a single operation.
> But then even FUSE_CREATE is atomic because "creat + open" is a single
> operation.
>
> In fact FUSE_CREATE does lookup anyway and returns all the information
> in fuse_entry_out.
>
> IIUC, only difference between FUSE_CREATE and FUSE_ATOMIC_CREATE is that
> later also carries information in reply whether file was actually created
> or not (FOPEN_FILE_CREATED). This will be set if file did not exist
> already and it was created indeed. Is that right?

FUSE_CREATE is atomic but upto level of libfuse. Libfuse separates it
into two calls, create and lookup separately into USER SPACE FUSE
implementation.
This FUSE_ATOMIC_CREATE does all these ops in a single call to FUSE
implementations.  We do not want to break any existing FUSE
implementations, therefore it is being introduced as a new feature. I
forgot to include links to libfuse patches as well. That would have
made it much clearer.  Here is the link to libfuse patch for this call
https://github.com/libfuse/libfuse/pull/673.

>
> I see FOPEN_FILE_CREATED is being used to avoid calling
> fuse_dir_changed(). That sounds like a separate optimization and probably
> should be in a separate patch.

FUSE_ATOMIC_CREATE needs to send back info about if  file was actually
created or not (This is suggestion from Miklos) to correctly convey if
the parent dir is really changing or not. I included this as part of
this patch itself instead of having it as a separate patch.

> IOW, I think this patch should be broken in to multiple pieces. First
> piece seems to be avoiding lookup() and given the way it is implemented,
> looks like we can avoid lookup() even by using existing FUSE_CREATE
> command. We don't necessarily need FUSE_ATOMIC_CREATE. Is that right?

Its not only about changing fuse kernel code but USER SPACE
implementations also. If we change the you are suggesting we would be
required to twist many things at libfuse and FUSE low level API. So to
keep things simple and not to break any existing implementations we
have kept it as new call (we now pass `struct stat` to USER SPACE FUSE
to filled in).

> And once that is done, a separate patch should probably should explain
> the problem and say fuse_dir_changed() call can be avoided if we knew
> if file was actually created or it was already existing there. And that's
> when one need to introduce a new command. Given this is just an extension
> of existing FUSE_CREATE command and returns additiona info about
> FOPEN_FILE_CREATED, we probably should simply call it FUSE_CREATE_EXT
> and explain how this operation is different from FUSE_CREATE.

As explained above, we are not doing this way as we have kept in mind
all existing libfuse APIs as well.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 3/3] FUSE: Avoid lookup in d_revalidate()
  2022-05-04 20:39   ` Vivek Goyal
  2022-05-04 21:05     ` Bernd Schubert
@ 2022-05-05  5:49     ` Dharmendra Hans
  1 sibling, 0 replies; 37+ messages in thread
From: Dharmendra Hans @ 2022-05-05  5:49 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel,
	Bernd Schubert, Dharmendra Singh

On Thu, May 5, 2022 at 2:09 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, May 02, 2022 at 03:55:21PM +0530, Dharmendra Singh wrote:
> > From: Dharmendra Singh <dsingh@ddn.com>
> >
> > With atomic open + lookup implemented, it is possible
> > to avoid lookups in FUSE d_revalidate() for objects
> > other than directories.
> >
> > If FUSE is mounted with default permissions then this
> > optimization is not possible as we need to fetch fresh
> > inode attributes for permission check. This lookup
> > skipped in d_revalidate() can be performed  as part of
> > open call into libfuse which is made from fuse_file_open().
> > And when we return from USER SPACE with file opened and
> > fresh attributes, we can revalidate the inode.
> >
> > Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> >
> > ---
> >  fs/fuse/dir.c    | 89 ++++++++++++++++++++++++++++++++++++++++++------
> >  fs/fuse/file.c   | 30 ++++++++++++++--
> >  fs/fuse/fuse_i.h | 10 +++++-
> >  fs/fuse/ioctl.c  |  2 +-
> >  4 files changed, 115 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index 6879d3a86796..1594fecc920f 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -196,6 +196,7 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
> >   * the lookup once more.  If the lookup results in the same inode,
> >   * then refresh the attributes, timeouts and mark the dentry valid.
> >   */
> > +
> >  static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
> >  {
> >       struct inode *inode;
> > @@ -224,6 +225,17 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
> >
> >               fm = get_fuse_mount(inode);
> >
> > +             /* If atomic open is supported by FUSE then use this opportunity
> > +              * (only for non-dir) to avoid this lookup and combine
> > +              * lookup + open into single call.
> > +              */
> > +             if (!fm->fc->default_permissions && fm->fc->do_atomic_open &&
> > +                 !(flags & (LOOKUP_EXCL | LOOKUP_REVAL)) &&
> > +                 (flags & LOOKUP_OPEN) && !S_ISDIR(inode->i_mode)) {
> > +                     ret = 1;
>
> So basically we think that VFS is going to do OPEN and calling
> ->revalidate() before that. So we are returning "1" to vfs saying
> dentry is valid (despite the fact that we have no idea at this
> point of time).
>
> And later when open comes we are opening and revalidating inode etc.
>
> Seriously, IMHO, all this seems very fragile and hard to understand
> and maintain code. Things can go easily wrong if even little bit
> of assumptions change in VFS.
>
> This sounds more like VFS should know about it and if VFS knows
> that filesystem supports facility where it can open + revalidate
> at the same time, it should probably call that. Something like
> ->open_revalidate() etc. That would be much more maintainable code but
> this feels like very fragile to me, IMHO.

Actually we did not want to do it this way. We understand that code is
hard to understand and  maintain.  But we found that VFS does not have
support for this kind of functionalities i.e to combine a couple of
ops into one except 'atomic open' which would have made it very easier
for File systems.
As of now I think VFS does not do much (it calls inode permissions
checks though but we are optimising only if default permissions are
not enabled) with inode till the time it opens it. Since we are
revalidating inode after opening, it detects issues if any with the
file.

>
>
>
> > +                     goto out;
> > +             }
> > +
> >               forget = fuse_alloc_forget();
> >               ret = -ENOMEM;
> >               if (!forget)
> > @@ -515,14 +527,52 @@ static int get_security_context(struct dentry *entry, umode_t mode,
> >       return err;
> >  }
> >
> > +/*
> > + * Revalidate the inode after we got fresh attributes from user space.
> > + */
> > +static int fuse_atomic_open_revalidate_inode(struct inode *reval_inode,
> > +                                          struct dentry *entry,
> > +                                          struct fuse_forget_link *forget,
> > +                                          struct fuse_entry_out *outentry,
> > +                                          u64 attr_version)
> > +{
> > +     struct fuse_inode *fi;
> > +     struct fuse_conn *fc = get_fuse_conn(reval_inode);
> > +
> > +     /* Mode should be other than directory */
> > +     BUG_ON(S_ISDIR(reval_inode->i_mode));
> > +
> > +     if (outentry->nodeid != get_node_id(reval_inode)) {
> > +             fuse_queue_forget(fc, forget, outentry->nodeid, 1);
> > +             return -ESTALE;
> > +     }
> > +     if (fuse_stale_inode(reval_inode, outentry->generation,
> > +                          &outentry->attr)) {
> > +             fuse_make_bad(reval_inode);
> > +             return -ESTALE;
> > +     }
> > +     fi = get_fuse_inode(reval_inode);
> > +     spin_lock(&fi->lock);
> > +     fi->nlookup++;
> > +     spin_unlock(&fi->lock);
> > +
> > +     forget_all_cached_acls(reval_inode);
> > +     fuse_change_attributes(reval_inode, &outentry->attr,
> > +                            entry_attr_timeout(outentry), attr_version);
> > +     fuse_change_entry_timeout(entry, outentry);
> > +     return 0;
> > +}
> > +
> > +
> >  /*
> >   * This is common function for initiating atomic operations into libfuse.
> >   * Currently being used by Atomic create(atomic lookup + create) and
> >   * Atomic open(atomic lookup + open).
> >   */
> > -static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
> > +static int fuse_do_atomic_common(struct inode *dir, struct dentry *entry,
> >                                struct dentry **alias, struct file *file,
> > -                              unsigned int flags, umode_t mode, uint32_t opcode)
> > +                              struct inode *reval_inode, unsigned int flags,
> > +                              umode_t mode, uint32_t opcode)
> >  {
> >       int err;
> >       struct inode *inode;
> > @@ -541,6 +591,8 @@ static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
> >       bool atomic_create = (opcode == FUSE_ATOMIC_CREATE ? true : false);
> >       bool create_op = (opcode == FUSE_CREATE ||
> >                         opcode == FUSE_ATOMIC_CREATE) ? true : false;
> > +     u64 attr_version = fuse_get_attr_version(fm->fc);
> > +
> >       if (alias)
> >               *alias = NULL;
> >
> > @@ -609,6 +661,20 @@ static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
> >       ff->fh = outopen.fh;
> >       ff->nodeid = outentry.nodeid;
> >       ff->open_flags = outopen.open_flags;
> > +
> > +     /* Inode revalidation was bypassed previously for type other than
> > +      * directories, revalidate now as we got fresh attributes.
> > +      */
> > +     if (reval_inode) {
> > +             err = fuse_atomic_open_revalidate_inode(reval_inode, entry,
> > +                                                     forget, &outentry,
> > +                                                     attr_version);
> > +             if (err)
> > +                     goto out_free_ff;
> > +             inode = reval_inode;
> > +             kfree(forget);
> > +             goto out_finish_open;
> > +     }
> >       inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
> >                         &outentry.attr, entry_attr_timeout(&outentry), 0);
> >       if (!inode) {
> > @@ -659,6 +725,7 @@ static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
> >       if (!atomic_create || (outopen.open_flags & FOPEN_FILE_CREATED))
> >               fuse_dir_changed(dir);
> >       err = finish_open(file, entry, generic_file_open);
> > +out_finish_open:
> >       if (err) {
> >               fi = get_fuse_inode(inode);
> >               fuse_sync_release(fi, ff, flags);
> > @@ -686,7 +753,7 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
> >       if (fc->no_atomic_create)
> >               return -ENOSYS;
> >
> > -     err = fuse_atomic_do_common(dir, entry, NULL, file, flags, mode,
> > +     err = fuse_do_atomic_common(dir, entry, NULL, file, NULL, flags, mode,
> >                                   FUSE_ATOMIC_CREATE);
> >       /* If atomic create is not implemented then indicate in fc so that next
> >        * request falls back to normal create instead of going into libufse and
> > @@ -699,9 +766,10 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
> >       return err;
> >  }
> >
> > -static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> > -                             struct dentry **alias, struct file *file,
> > -                             unsigned int flags, umode_t mode)
> > +int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> > +                     struct dentry **alias, struct file *file,
> > +                     struct inode *reval_inode, unsigned int flags,
> > +                     umode_t mode)
> >  {
> >       int err;
> >       struct fuse_conn *fc = get_fuse_conn(dir);
> > @@ -709,8 +777,8 @@ static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> >       if (!fc->do_atomic_open)
> >               return -ENOSYS;
> >
> > -     err = fuse_atomic_do_common(dir, entry, alias, file, flags, mode,
> > -                                 FUSE_ATOMIC_OPEN);
> > +     err = fuse_do_atomic_common(dir, entry, alias, file, reval_inode, flags,
> > +                                 mode, FUSE_ATOMIC_OPEN);
> >       /* Atomic open imply atomic trunc as well i.e truncate should be performed
> >        * as part of atomic open call itself.
> >        */
> > @@ -718,7 +786,6 @@ static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> >               if (fc->do_atomic_open)
> >                       fc->atomic_o_trunc = 1;
> >       }
> > -
> >       return err;
> >  }
> >
> > @@ -738,7 +805,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> >
> >       if (!create) {
> >               err = fuse_do_atomic_open(dir, entry, &alias,
> > -                                       file, flags, mode);
> > +                                       file, NULL, flags, mode);
> >               res = alias;
> >               if (!err || err != -ENOSYS)
> >                       goto out_dput;
> > @@ -774,7 +841,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> >        * 'mknod' + 'open' requests.
> >        */
> >       if (err == -ENOSYS) {
> > -             err = fuse_atomic_do_common(dir, entry, NULL, file, flags,
> > +             err = fuse_do_atomic_common(dir, entry, NULL, file, NULL, flags,
> >                                           mode, FUSE_CREATE);
> >       }
> >       if (err == -ENOSYS) {
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 829094451774..2b0548163249 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -125,11 +125,15 @@ static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
> >  }
> >
> >  struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> > -                              unsigned int open_flags, bool isdir)
> > +                              struct file *file, unsigned int open_flags,
> > +                              bool isdir)
> >  {
> >       struct fuse_conn *fc = fm->fc;
> >       struct fuse_file *ff;
> > +     struct dentry *dentry = NULL;
> > +     struct dentry *parent = NULL;
> >       int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
> > +     int ret;
> >
> >       ff = fuse_file_alloc(fm);
> >       if (!ff)
> > @@ -138,6 +142,11 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> >       ff->fh = 0;
> >       /* Default for no-open */
> >       ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
> > +
> > +     /* For directories we already had lookup */
> > +     if (!isdir && fc->do_atomic_open && file != NULL)
> > +             goto revalidate_atomic_open;
> > +
> >       if (isdir ? !fc->no_opendir : !fc->no_open) {
> >               struct fuse_open_out outarg;
> >               int err;
> > @@ -164,12 +173,27 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> >       ff->nodeid = nodeid;
> >
> >       return ff;
> > +
> > +revalidate_atomic_open:
> > +     dentry = file->f_path.dentry;
> > +     /* Get ref on parent */
> > +     parent = dget_parent(dentry);
> > +     open_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY);
> > +     ret = fuse_do_atomic_open(d_inode_rcu(parent), dentry, NULL, file,
> > +                               d_inode_rcu(dentry), open_flags, 0);
> > +     dput(parent);
> > +     if (ret)
> > +             goto err_out;
> > +     ff = file->private_data;
> > +     return ff;
> > +err_out:
> > +     return ERR_PTR(ret);
> >  }
> >
> >  int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
> >                bool isdir)
> >  {
> > -     struct fuse_file *ff = fuse_file_open(fm, nodeid, file->f_flags, isdir);
> > +     struct fuse_file *ff = fuse_file_open(fm, nodeid, file, file->f_flags, isdir);
> >
> >       if (!IS_ERR(ff))
> >               file->private_data = ff;
> > @@ -252,7 +276,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
> >       }
> >
> >       err = fuse_do_open(fm, get_node_id(inode), file, isdir);
> > -     if (!err)
> > +     if (!err && (!fc->do_atomic_open || isdir))
> >               fuse_finish_open(inode, file);
> >
> >  out:
> > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > index 24793b82303f..5c83e4249b7e 100644
> > --- a/fs/fuse/fuse_i.h
> > +++ b/fs/fuse/fuse_i.h
> > @@ -1014,6 +1014,13 @@ void fuse_finish_open(struct inode *inode, struct file *file);
> >  void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
> >                      unsigned int flags);
> >
> > +/**
> > + * Send atomic create + open or lookup + open
> > + */
> > +int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> > +                     struct dentry **alias, struct file *file,
> > +                     struct inode *reval_inode, unsigned int flags,
> > +                     umode_t mode);
> >  /**
> >   * Send RELEASE or RELEASEDIR request
> >   */
> > @@ -1317,7 +1324,8 @@ int fuse_fileattr_set(struct user_namespace *mnt_userns,
> >  /* file.c */
> >
> >  struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> > -                              unsigned int open_flags, bool isdir);
> > +                              struct file *file, unsigned int open_flags,
> > +                              bool isdir);
> >  void fuse_file_release(struct inode *inode, struct fuse_file *ff,
> >                      unsigned int open_flags, fl_owner_t id, bool isdir);
> >
> > diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
> > index fbc09dab1f85..63106a54ba1a 100644
> > --- a/fs/fuse/ioctl.c
> > +++ b/fs/fuse/ioctl.c
> > @@ -408,7 +408,7 @@ static struct fuse_file *fuse_priv_ioctl_prepare(struct inode *inode)
> >       if (!S_ISREG(inode->i_mode) && !isdir)
> >               return ERR_PTR(-ENOTTY);
> >
> > -     return fuse_file_open(fm, get_node_id(inode), O_RDONLY, isdir);
> > +     return fuse_file_open(fm, get_node_id(inode), NULL, O_RDONLY, isdir);
> >  }
> >
> >  static void fuse_priv_ioctl_cleanup(struct inode *inode, struct fuse_file *ff)
> > --
> > 2.17.1
> >
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-04 19:18 ` [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Vivek Goyal
@ 2022-05-05  6:12   ` Dharmendra Hans
  2022-05-05 12:54     ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: Dharmendra Hans @ 2022-05-05  6:12 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel, Bernd Schubert

On Thu, May 5, 2022 at 12:48 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, May 02, 2022 at 03:55:18PM +0530, Dharmendra Singh wrote:
> > In FUSE, as of now, uncached lookups are expensive over the wire.
> > E.g additional latencies and stressing (meta data) servers from
> > thousands of clients. These lookup calls possibly can be avoided
> > in some cases. Incoming three patches address this issue.
>
> BTW, these patches are designed to improve performance by cutting down
> on number of fuse commands sent. Are there any performance numbers
> which demonstrate what kind of improvement you are seeing.
>
> Say, If I do kernel build, is the performance improvement observable?

Here are the numbers I took last time. These were taken on tmpfs to
actually see the effect of reduced calls. On local file systems it
might not be that much visible. But we have observed that on systems
where we have thousands of clients hammering the metadata servers, it
helps a lot (We did not take numbers yet as  we are required to change
a lot of our client code but would be doing it later on).

Note that for a change in performance number due to the new version of
these patches, we have just refactored the code and functionality has
remained the same since then.

here is the link to the performance numbers
https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 2/3] FUSE: Implement atomic lookup + open
  2022-05-04 18:20   ` Vivek Goyal
@ 2022-05-05  6:39     ` Dharmendra Hans
  0 siblings, 0 replies; 37+ messages in thread
From: Dharmendra Hans @ 2022-05-05  6:39 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel,
	Bernd Schubert, Dharmendra Singh

On Wed, May 4, 2022 at 11:50 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, May 02, 2022 at 03:55:20PM +0530, Dharmendra Singh wrote:
> > From: Dharmendra Singh <dsingh@ddn.com>
> >
> > We can optimize aggressive lookups which are triggered when
> > there is normal open for file/dir (dentry is new/negative).
> >
> > Here in this case since we are anyway going to open the file/dir
> > with USER SPACE, avoid this separate lookup call into libfuse
> > and combine it with open call into libfuse.
> >
> > This lookup + open in single call to libfuse has been named
> > as atomic open. It is expected that USER SPACE opens the file
> > and fills in the attributes, which are then used to make inode
> > stand/revalidate in the kernel cache.
> >
> > Signed-off-by: Dharmendra Singh <dsingh@ddn.com>
> > ---
> >  fs/fuse/dir.c             | 78 ++++++++++++++++++++++++++++++---------
> >  fs/fuse/fuse_i.h          |  3 ++
> >  fs/fuse/inode.c           |  4 +-
> >  include/uapi/linux/fuse.h |  2 +
> >  4 files changed, 69 insertions(+), 18 deletions(-)
> >
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index cad3322a007f..6879d3a86796 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -516,18 +516,18 @@ static int get_security_context(struct dentry *entry, umode_t mode,
> >  }
> >
> >  /*
> > - * Atomic create+open operation
> > - *
> > - * If the filesystem doesn't support this, then fall back to separate
> > - * 'mknod' + 'open' requests.
> > + * This is common function for initiating atomic operations into libfuse.
> > + * Currently being used by Atomic create(atomic lookup + create) and
> > + * Atomic open(atomic lookup + open).
> >   */
> > -static int fuse_create_open(struct inode *dir, struct dentry *entry,
> > -                         struct file *file, unsigned int flags,
> > -                         umode_t mode, uint32_t opcode)
> > +static int fuse_atomic_do_common(struct inode *dir, struct dentry *entry,
> > +                              struct dentry **alias, struct file *file,
> > +                              unsigned int flags, umode_t mode, uint32_t opcode)
>
> If this common function is really useful for atomic create + atomic open,
> I will first add a patch which adds a common function and makes FUSE_CREATE_EXT use that function. And in later patch add functionality to do atomic open.
> Just makes code more readable.

This func is commonly used by atomic create and atomic open, both.
atomic open has fuse_do_atomic_open wrapper over it. I just renamed
this func except doing it in a separate patch (just to avoid another
patch though I understand that it might be a little hard to read the
code).


>
> >  {
> >       int err;
> >       struct inode *inode;
> >       struct fuse_mount *fm = get_fuse_mount(dir);
> > +     struct fuse_conn *fc = get_fuse_conn(dir);
> >       FUSE_ARGS(args);
> >       struct fuse_forget_link *forget;
> >       struct fuse_create_in inarg;
> > @@ -539,9 +539,13 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> >       void *security_ctx = NULL;
> >       u32 security_ctxlen;
> >       bool atomic_create = (opcode == FUSE_ATOMIC_CREATE ? true : false);
> > +     bool create_op = (opcode == FUSE_CREATE ||
> > +                       opcode == FUSE_ATOMIC_CREATE) ? true : false;
> > +     if (alias)
> > +             *alias = NULL;
> >
> >       /* Userspace expects S_IFREG in create mode */
> > -     BUG_ON((mode & S_IFMT) != S_IFREG);
> > +     BUG_ON(create_op && (mode & S_IFMT) != S_IFREG);
> >
> >       forget = fuse_alloc_forget();
> >       err = -ENOMEM;
> > @@ -553,7 +557,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> >       if (!ff)
> >               goto out_put_forget_req;
> >
> > -     if (!fm->fc->dont_mask)
> > +     if (!fc->dont_mask)
> >               mode &= ~current_umask();
> >
> >       flags &= ~O_NOCTTY;
> > @@ -642,8 +646,9 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> >                               err = PTR_ERR(res);
> >                               goto out_err;
> >                       }
> > -                     /* res is expected to be NULL since its REG file */
> > -                     WARN_ON(res);
> > +                     entry = res;
> > +                     if (alias)
> > +                             *alias = res;
> >               }
> >       }
> >       fuse_change_entry_timeout(entry, &outentry);
> > @@ -681,8 +686,8 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
> >       if (fc->no_atomic_create)
> >               return -ENOSYS;
> >
> > -     err = fuse_create_open(dir, entry, file, flags, mode,
> > -                            FUSE_ATOMIC_CREATE);
> > +     err = fuse_atomic_do_common(dir, entry, NULL, file, flags, mode,
> > +                                 FUSE_ATOMIC_CREATE);
> >       /* If atomic create is not implemented then indicate in fc so that next
> >        * request falls back to normal create instead of going into libufse and
> >        * returning with -ENOSYS.
> > @@ -694,6 +699,29 @@ static int fuse_atomic_create(struct inode *dir, struct dentry *entry,
> >       return err;
> >  }
> >
> > +static int fuse_do_atomic_open(struct inode *dir, struct dentry *entry,
> > +                             struct dentry **alias, struct file *file,
> > +                             unsigned int flags, umode_t mode)
> > +{
> > +     int err;
> > +     struct fuse_conn *fc = get_fuse_conn(dir);
> > +
> > +     if (!fc->do_atomic_open)
> > +             return -ENOSYS;
> > +
> > +     err = fuse_atomic_do_common(dir, entry, alias, file, flags, mode,
> > +                                 FUSE_ATOMIC_OPEN);
> > +     /* Atomic open imply atomic trunc as well i.e truncate should be performed
> > +      * as part of atomic open call itself.
> > +      */
> > +     if (!fc->atomic_o_trunc) {
> > +             if (fc->do_atomic_open)
> > +                     fc->atomic_o_trunc = 1;
> > +     }
> > +
> > +     return err;
> > +}
> > +
> >  static int fuse_mknod(struct user_namespace *, struct inode *, struct dentry *,
> >                     umode_t, dev_t);
> >  static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> > @@ -702,12 +730,23 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> >  {
> >       int err;
> >       struct fuse_conn *fc = get_fuse_conn(dir);
> > -     struct dentry *res = NULL;
> > +     struct dentry *res = NULL, *alias = NULL;
> >       bool create = flags & O_CREAT ? true : false;
> >
> >       if (fuse_is_bad(dir))
> >               return -EIO;
> >
> > +     if (!create) {
> > +             err = fuse_do_atomic_open(dir, entry, &alias,
> > +                                       file, flags, mode);
>
> So this is the core change of behavior. As of now, if we are not doing
> create operation, we just lookup and return. But now you are doing
> open + lookup in a single operation. Interesting. I am not sure if
> it breaks anything in VFS.

Yes, here we are handling optimizing lookups for open calls. If atomic
open is implemented underlying we use it otherwise falls back to
normal open.

>
> > +             res = alias;
> > +             if (!err || err != -ENOSYS)
> > +                     goto out_dput;
> > +     }
> > +      /*
> > +       * ENOSYS fall back for open- user space does not have full
> > +       * atomic open.
> > +       */
>
> This ENOSYS stuff all the place is making these patches look very unclean
> to me. A part of me says that should we negotiate these as feature bits.
> Having said that feature bits are precious and should not be used for
> trivial purposes. Hmm..., may be we can make handle ENOSYS little better
> in general.

Actually atomic open is based upon init bits considering the 3rd patch
which optimizes lookups in d_revalidate(). I would see if I can clean
ENOSYS a bit.
>
> >       if ((!create || fc->no_atomic_create) && d_in_lookup(entry)) {
> >               res = fuse_lookup(dir, entry, 0);
> >               if (IS_ERR(res))
> > @@ -730,9 +769,14 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
> >       /* Libfuse/user space has not implemented atomic create, therefore
> >        * fall back to normal create.
> >        */
> > -     if (err == -ENOSYS)
> > -             err = fuse_create_open(dir, entry, file, flags, mode,
> > -                                    FUSE_CREATE);
> > +     /* Atomic create+open operation
> > +      * If the filesystem doesn't support this, then fall back to separate
> > +      * 'mknod' + 'open' requests.
> > +      */
> > +     if (err == -ENOSYS) {
> > +             err = fuse_atomic_do_common(dir, entry, NULL, file, flags,
> > +                                         mode, FUSE_CREATE);
> > +     }
> >       if (err == -ENOSYS) {
> >               fc->no_create = 1;
> >               goto mknod;
> > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > index d577a591ab16..24793b82303f 100644
> > --- a/fs/fuse/fuse_i.h
> > +++ b/fs/fuse/fuse_i.h
> > @@ -669,6 +669,9 @@ struct fuse_conn {
> >       /** Is open/release not implemented by fs? */
> >       unsigned no_open:1;
> >
> > +     /** Is atomic open implemented by fs ? */
> > +     unsigned do_atomic_open : 1;
> > +
> >       /** Is atomic create not implemented by fs? */
> >       unsigned no_atomic_create:1;
> >
> > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > index ee846ce371d8..5f667de69115 100644
> > --- a/fs/fuse/inode.c
> > +++ b/fs/fuse/inode.c
> > @@ -1190,6 +1190,8 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
> >                               fc->setxattr_ext = 1;
> >                       if (flags & FUSE_SECURITY_CTX)
> >                               fc->init_security = 1;
> > +                     if (flags & FUSE_DO_ATOMIC_OPEN)
> > +                             fc->do_atomic_open = 1;
>
> Hey you are negotiating FUSE_DO_ATOMIC_OPEN flag. If that's the case why
> do you have to deal with -ENOSYS in fuse_do_atomic_open(). You should
> be able to just check.
>
> if (!create && fc->do_atomic_open) {
>         fuse_do_atomic_open().
> }

You are right. This happens due to the reason that we have auto atomic
create check in atomic create but yes, we do not need ENOSYS here. I
will clean it.


> I have yet to check what happens if file does not exist.
>
> >               } else {
> >                       ra_pages = fc->max_read / PAGE_SIZE;
> >                       fc->no_lock = 1;
> > @@ -1235,7 +1237,7 @@ void fuse_send_init(struct fuse_mount *fm)
> >               FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
> >               FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
> >               FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
> > -             FUSE_SECURITY_CTX;
> > +             FUSE_SECURITY_CTX | FUSE_DO_ATOMIC_OPEN;
> >  #ifdef CONFIG_FUSE_DAX
> >       if (fm->fc->dax)
> >               flags |= FUSE_MAP_ALIGNMENT;
> > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> > index e4b56004b148..ab91e391241a 100644
> > --- a/include/uapi/linux/fuse.h
> > +++ b/include/uapi/linux/fuse.h
> > @@ -391,6 +391,7 @@ struct fuse_file_lock {
> >  /* bits 32..63 get shifted down 32 bits into the flags2 field */
> >  #define FUSE_SECURITY_CTX    (1ULL << 32)
> >  #define FUSE_HAS_INODE_DAX   (1ULL << 33)
> > +#define FUSE_DO_ATOMIC_OPEN  (1ULL << 34)
> >
> >  /**
> >   * CUSE INIT request/reply flags
> > @@ -540,6 +541,7 @@ enum fuse_opcode {
> >       FUSE_REMOVEMAPPING      = 49,
> >       FUSE_SYNCFS             = 50,
> >       FUSE_ATOMIC_CREATE      = 51,
> > +     FUSE_ATOMIC_OPEN        = 52,
> >
> >       /* CUSE specific operations */
> >       CUSE_INIT               = 4096,
> > --
> > 2.17.1
> >
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-05  6:12   ` Dharmendra Hans
@ 2022-05-05 12:54     ` Vivek Goyal
  2022-05-05 15:13       ` Bernd Schubert
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2022-05-05 12:54 UTC (permalink / raw)
  To: Dharmendra Hans
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel, Bernd Schubert

On Thu, May 05, 2022 at 11:42:51AM +0530, Dharmendra Hans wrote:
> On Thu, May 5, 2022 at 12:48 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Mon, May 02, 2022 at 03:55:18PM +0530, Dharmendra Singh wrote:
> > > In FUSE, as of now, uncached lookups are expensive over the wire.
> > > E.g additional latencies and stressing (meta data) servers from
> > > thousands of clients. These lookup calls possibly can be avoided
> > > in some cases. Incoming three patches address this issue.
> >
> > BTW, these patches are designed to improve performance by cutting down
> > on number of fuse commands sent. Are there any performance numbers
> > which demonstrate what kind of improvement you are seeing.
> >
> > Say, If I do kernel build, is the performance improvement observable?
> 
> Here are the numbers I took last time. These were taken on tmpfs to
> actually see the effect of reduced calls. On local file systems it
> might not be that much visible. But we have observed that on systems
> where we have thousands of clients hammering the metadata servers, it
> helps a lot (We did not take numbers yet as  we are required to change
> a lot of our client code but would be doing it later on).
> 
> Note that for a change in performance number due to the new version of
> these patches, we have just refactored the code and functionality has
> remained the same since then.
> 
> here is the link to the performance numbers
> https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/

There is a lot going in that table. Trying to understand it. 

- Why care about No-Flush. I mean that's independent of these changes,
  right?  I am assuming this means that upon file close do not send
  a flush to fuse server. Not sure how bringing No-Flush into the
  mix is helpful here.

- What is "Patched Libfuse"? I am assuming that these are changes
  needed in libfuse to support atomic create + atomic open. Similarly
  assuming "Patched FuseK" means patched kernel with your changes.

  If this is correct, I would probably only be interested in 
  looking at "Patched Libfuse + Patched FuseK" numbers to figure out
  what's the effect of your changes w.r.t vanilla kernel + libfuse.
  Am I understanding it right?

- I am wondering why do we measure "Sequential" and "Random" patterns. 
  These optimizations are primarily for file creation + file opening
  and I/O pattern should not matter. 

- Also wondering why performance of Read/s improves. Assuming once
  file has been opened, I think your optimizations get out of the
  way (no create, no open) and we are just going through data path of
  reading file data and no lookups happening. If that's the case, why
  do Read/s numbers show an improvement.

- Why do we measure "Patched Libfuse". It shows performance regression
  of 4-5% in table 0B, Sequential workoad. That sounds bad. So without
  any optimization kicking in, it has a performance cost.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-05  4:51         ` Dharmendra Hans
@ 2022-05-05 14:26           ` Vivek Goyal
  2022-05-06  5:34             ` Dharmendra Hans
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2022-05-05 14:26 UTC (permalink / raw)
  To: Dharmendra Hans
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel,
	Bernd Schubert, Dharmendra Singh

On Thu, May 05, 2022 at 10:21:21AM +0530, Dharmendra Hans wrote:
> On Wed, May 4, 2022 at 8:17 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, May 04, 2022 at 09:56:49AM +0530, Dharmendra Hans wrote:
> > > On Wed, May 4, 2022 at 1:24 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> > > > > From: Dharmendra Singh <dsingh@ddn.com>
> > > > >
> > > > > When we go for creating a file (O_CREAT), we trigger
> > > > > a lookup to FUSE USER SPACE. It is very  much likely
> > > > > that file does not exist yet as O_CREAT is passed to
> > > > > open(). This lookup can be avoided and can be performed
> > > > > as part of create call into libfuse.
> > > > >
> > > > > This lookup + create in single call to libfuse and finally
> > > > > to USER SPACE has been named as atomic create. It is expected
> > > > > that USER SPACE create the file, open it and fills in the
> > > > > attributes which are then used to make inode stand/revalidate
> > > > > in the kernel cache. Also if file was newly created(does not
> > > > > exist yet by this time) in USER SPACE then it should be indicated
> > > > > in `struct fuse_file_info` by setting a bit which is again used by
> > > > > libfuse to send some flags back to fuse kernel to indicate that
> > > > > that file was newly created. These flags are used by kernel to
> > > > > indicate changes in parent directory.
> > > >
> > > > Reading the existing code a little bit more and trying to understand
> > > > existing semantics. And that will help me unerstand what new is being
> > > > done.
> > > >
> > > > So current fuse_atomic_open() does following.
> > > >
> > > > A. Looks up dentry (if d_in_lookup() is set).
> > > > B. If dentry is positive or O_CREAT is not set, return.
> > > > C. If server supports atomic create + open, use that to create file and
> > > >    open it as well.
> > > > D. If server does not support atomic create + open, just create file
> > > >    using "mknod" and return. VFS will take care of opening the file.
> > > >
> > > > Now with this patch, new flow is.
> > > >
> > > > A. Look up dentry if d_in_lookup() is set as well as either file is not
> > > >    being created or fc->no_atomic_create is set. This basiclally means
> > > >    skip lookup if atomic_create is supported and file is being created.
> > > >
> > > > B. Remains same. if dentry is positive or O_CREATE is not set, return.
> > > >
> > > > C. If server supports new atomic_create(), use that.
> > > >
> > > > D. If not, if server supports atomic create + open, use that
> > > >
> > > > E. If not, fall back to mknod and do not open file.
> > > >
> > > > So to me this new functionality is basically atomic "lookup + create +
> > > > open"?
> > > >
> > > > Or may be not. I see we check "fc->no_create" and fallback to mknod.
> > > >
> > > >         if (fc->no_create)
> > > >                 goto mknod;
> > > >
> > > > So fc->no_create is representing both old atomic "create + open" as well
> > > > as new "lookup + create + open" ?
> > > >
> > > > It might be obvious to you, but it is not to me. So will be great if
> > > > you shed some light on this.
> > > >
> > >
> > > I think you got it right now. New atomic create does what you
> > > mentioned as new flow.  It does  lookup + create + open in single call
> > > (being called as atomic create) to USER SPACE.mknod is a special case
> >
> > Ok, naming is little confusing. I think we will have to put it in
> > commit message and where you define FUSE_ATOMIC_CREATE that what's
> > the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE. This is
> > ATOMIC w.r.t what?
> 
> Sure, I would update the commit message to make the distinction clear
> between the two. This operation is atomic w.r.t to USER SPACE FUSE
> implementations. i.e USER SPACE would be performing all these
> operations in a single call to it.

I think even FUSE_CREAT is doing same thing. Creating file, opening and
doing lookup and sending all the data. So that's not the difference
between the two, IMHO. And that's why I am getting confused with the
naming.

From user space file server perspective, only extra operation seems
to be that it sends a flag in response telling the client whether
file was actually created or it already existed. So to me it just
sounds little extension of existing FUSE_CREATE command and that's
why I thought calling it FUSE_CREATE_EXT is probably better naming.


> 
> 
> > May be atomic here means that "lookup + create + open" is a single operation.
> > But then even FUSE_CREATE is atomic because "creat + open" is a single
> > operation.
> >
> > In fact FUSE_CREATE does lookup anyway and returns all the information
> > in fuse_entry_out.
> >
> > IIUC, only difference between FUSE_CREATE and FUSE_ATOMIC_CREATE is that
> > later also carries information in reply whether file was actually created
> > or not (FOPEN_FILE_CREATED). This will be set if file did not exist
> > already and it was created indeed. Is that right?
> 
> FUSE_CREATE is atomic but upto level of libfuse. Libfuse separates it
> into two calls, create and lookup separately into USER SPACE FUSE
> implementation.

I am not sure what do you mean by "libfuse separates it into two calls,
create and lookup separately". I guess you are referring to lo_create()
in example/passthrough_ll.c which first creates and opens file and
then looks it up and replies.

        fd = openat(lo_fd(req, parent), name,
                    (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
	err = lo_do_lookup(req, parent, name, &e);
	fuse_reply_create(req, &e, fi);

I am looking at your proposal for atomic_create implementation here.

https://github.com/libfuse/libfuse/pull/673/commits/88cd25b2857f2bb213d01afbcfd666787d1e6893#diff-a36385ec8fb753d6f4492a5f0d3c6a5750bd370b50df6ef0610efdcd3f8880ffR787

It is doing exactly same thing as lo_create(), except one difference that
it is checking first if file exists. It essentially is doing this.

A. newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
B. fd = openat(lo_fd(req, parent), name,
		    (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
C. err = lo_do_lookup(req, parent, name, &e);
D. fuse_reply_create(req, &e, fi);

So what do you mean by libfuse makes it two calls. 

And I think above implementation is racy. What if filesystem is
shared and another client creates the file between calls to
A and B. You will think you created the file but some other
client created it.

So if intent is to know whether we created the file or not, then
you should probably do openat() with O_EXCL flag. If that succeeds
you know you created the file. If it fails with -EEXIST, then you
know file is already there. That's what virtiofs does.

Anyway, coming back to the point. IMHO, from server perspective,
there is no atomicity difference between FUSE_CREATE and
FUSE_ATOMIC_CREATE. Only difference seems to be to send addditional
information back to the client to tell it whether file was created
or not.

In fact for shared filesystem this is probably a problem. What if
guest's cache is stale and it does not know about the file. A client
B creates the file and we think we did not create the file. And we
will return with FOPEN_FILE_CREATED = 0. And in that case client
will not call fuse_dir_changed(). But that seems wrong in case
of shared filesystems. I am concerned about virtiofs which can
be shared between different guests.

Miklos, WDYT?

May be it is not a huge concern. If one guest drops a file, other guest
will not invalidate its dir attrs till timeout happens. Case of shared
filesystem is very tricky with fuse. And sometimes it is not clear
to me what kind of coherency matters.

So in this case say I am booted with cache=auto, if guest B drops a file
bar/foo.txt and guest A does open(bar/foo.txt, O_CREAT), then should
guest A invalidate the attrs of bar/ right away or it will be invalidated
anyway after a second. 

Anyway.., my core point is that difference between FUSE_CREATE and
FUSE_ATOMIC_CREATE is just one flag FOPEN_FILE_CREATED which tells
client whether file was actually created or not. And that is used
to determine whether to invalidate parent dir attributes or not. It
does not have anything extra in terms of ATOMICITY as far as I can
see and that's what confuses me.



> This FUSE_ATOMIC_CREATE does all these ops in a single call to FUSE
> implementations.  We do not want to break any existing FUSE
> implementations, therefore it is being introduced as a new feature. I
> forgot to include links to libfuse patches as well. That would have
> made it much clearer.  Here is the link to libfuse patch for this call
> https://github.com/libfuse/libfuse/pull/673.
> 
> >
> > I see FOPEN_FILE_CREATED is being used to avoid calling
> > fuse_dir_changed(). That sounds like a separate optimization and probably
> > should be in a separate patch.
> 
> FUSE_ATOMIC_CREATE needs to send back info about if  file was actually
> created or not (This is suggestion from Miklos) to correctly convey if
> the parent dir is really changing or not. I included this as part of
> this patch itself instead of having it as a separate patch.

This needs little more thought w.r.t shared filesystems.

> 
> > IOW, I think this patch should be broken in to multiple pieces. First
> > piece seems to be avoiding lookup() and given the way it is implemented,
> > looks like we can avoid lookup() even by using existing FUSE_CREATE
> > command. We don't necessarily need FUSE_ATOMIC_CREATE. Is that right?
> 
> Its not only about changing fuse kernel code but USER SPACE
> implementations also. If we change the you are suggesting we would be
> required to twist many things at libfuse and FUSE low level API. So to
> keep things simple and not to break any existing implementations we
> have kept it as new call (we now pass `struct stat` to USER SPACE FUSE
> to filled in).
> 
> > And once that is done, a separate patch should probably should explain
> > the problem and say fuse_dir_changed() call can be avoided if we knew
> > if file was actually created or it was already existing there. And that's
> > when one need to introduce a new command. Given this is just an extension
> > of existing FUSE_CREATE command and returns additiona info about
> > FOPEN_FILE_CREATED, we probably should simply call it FUSE_CREATE_EXT
> > and explain how this operation is different from FUSE_CREATE.
> 
> As explained above, we are not doing this way as we have kept in mind
> all existing libfuse APIs as well.
>

I am not seeing what will break in existing passthrough_ll.c if I
simply used FUSE_CREATE for a file which already exists.

        fd = openat(lo_fd(req, parent), name,
                    (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);

This call should still succeed even if file already exists. (until and
unless called it with O_EXCL and in that case failing is the correct
behavior). 

Thanks
Vivek


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-05 12:54     ` Vivek Goyal
@ 2022-05-05 15:13       ` Bernd Schubert
  2022-05-05 19:59         ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2022-05-05 15:13 UTC (permalink / raw)
  To: Vivek Goyal, Dharmendra Hans
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel



On 5/5/22 14:54, Vivek Goyal wrote:
> On Thu, May 05, 2022 at 11:42:51AM +0530, Dharmendra Hans wrote:
>> Here are the numbers I took last time. These were taken on tmpfs to
>> actually see the effect of reduced calls. On local file systems it
>> might not be that much visible. But we have observed that on systems
>> where we have thousands of clients hammering the metadata servers, it
>> helps a lot (We did not take numbers yet as  we are required to change
>> a lot of our client code but would be doing it later on).
>>
>> Note that for a change in performance number due to the new version of
>> these patches, we have just refactored the code and functionality has
>> remained the same since then.
>>
>> here is the link to the performance numbers
>> https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/
> 
> There is a lot going in that table. Trying to understand it.
> 
> - Why care about No-Flush. I mean that's independent of these changes,
>    right?  I am assuming this means that upon file close do not send
>    a flush to fuse server. Not sure how bringing No-Flush into the
>    mix is helpful here.


It is a basically removing another call from kernel to user space. The 
calls there are, the lower is the resulting percentage for atomic-open.


> 
> - What is "Patched Libfuse"? I am assuming that these are changes
>    needed in libfuse to support atomic create + atomic open. Similarly
>    assuming "Patched FuseK" means patched kernel with your changes.

Yes, I did that to ensure there is no regression with the patches, when 
the other side is not patched.

> 
>    If this is correct, I would probably only be interested in
>    looking at "Patched Libfuse + Patched FuseK" numbers to figure out
>    what's the effect of your changes w.r.t vanilla kernel + libfuse.
>    Am I understanding it right?

Yes.

> 
> - I am wondering why do we measure "Sequential" and "Random" patterns.
>    These optimizations are primarily for file creation + file opening
>    and I/O pattern should not matter.

bonnie++ does this automatically and it just convenient to take the 
bonnie++ csv value and to paste them into a table.

In our HPC world mdtest is more common, but it has MPI as requirement - 
make it harder to run. Reproducing the values with bonnie++ should be 
rather easy for you.

Only issue with bonnie++ is that bonnie++ by default does not run 
multi-threaded and the old 3rd party perl scripts I had to let it run 
with multiple processes and to sum up the values don't work anymore with 
recent perl versions. I need to find some time to fix that.


> 
> - Also wondering why performance of Read/s improves. Assuming once
>    file has been opened, I think your optimizations get out of the
>    way (no create, no open) and we are just going through data path of
>    reading file data and no lookups happening. If that's the case, why
>    do Read/s numbers show an improvement.

That is now bonnie++ works. It creates the files, closes them (which 
causes the flush) and then re-opens for stat and read - atomic open 
comes into the picture here. Also read() is totally skipped when the 
files are empty - which is why one should use something like 1B files.

If you have another metadata benchmark - please let us know.

> 
> - Why do we measure "Patched Libfuse". It shows performance regression
>    of 4-5% in table 0B, Sequential workoad. That sounds bad. So without
>    any optimization kicking in, it has a performance cost.

Yes, I'm not sure yet. There is not so much code that has changed on the 
libfuse side.
However the table needs to be redone with fixed libfuse - limiting the 
number of threads caused a permanent libfuse thread creation and destruction

https://github.com/libfuse/libfuse/pull/652

The numbers in table are also with paththrough_ll, which has its own 
issue due to linear inode search. paththrough_hp uses a C++ map and 
avoids that. I noticed too late when I started to investigate why there 
are regressions....

Also the table made me to investigate/profile all the fuse operations, 
which resulted in my waitq question. Please see that thread for more 
details 
https://lore.kernel.org/lkml/9326bb76-680f-05f6-6f78-df6170afaa2c@fastmail.fm/T/

Regarding atomic-open/create with avoiding lookup/revalidate, our 
primary goal is to reduce network calls. A file system that handles it 
locally only reduces the number of fuse kernel/user space crossing. A 
network file system that fully supports it needs to do the atomic open 
(or in old terms lookup-intent-open) on the server side of the network 
and needs to transfer attributes together with the open result.

Lustre does this, although I cannot easily point you to the right code. 
It all started almost two decades ago:
https://groups.google.com/g/lucky.linux.fsdevel/c/iYNFIIrkJ1s


BeeGFS does this as well
https://git.beegfs.io/pub/v7/-/blob/master/client_module/source/filesystem/FhgfsOpsInode.c
See for examaple FhgfsOps_atomicOpen() and FhgfsOps_createIntent()

(FhGFS is the old name when I was still involved in the project.)

 From my head I'm not sure if NFS does it over the wire, maybe v4.


Thanks,
Bernd







^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-05 15:13       ` Bernd Schubert
@ 2022-05-05 19:59         ` Vivek Goyal
  2022-05-11  9:40           ` Miklos Szeredi
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2022-05-05 19:59 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Dharmendra Hans, Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel

On Thu, May 05, 2022 at 05:13:00PM +0200, Bernd Schubert wrote:
> 
> 
> On 5/5/22 14:54, Vivek Goyal wrote:
> > On Thu, May 05, 2022 at 11:42:51AM +0530, Dharmendra Hans wrote:
> > > Here are the numbers I took last time. These were taken on tmpfs to
> > > actually see the effect of reduced calls. On local file systems it
> > > might not be that much visible. But we have observed that on systems
> > > where we have thousands of clients hammering the metadata servers, it
> > > helps a lot (We did not take numbers yet as  we are required to change
> > > a lot of our client code but would be doing it later on).
> > > 
> > > Note that for a change in performance number due to the new version of
> > > these patches, we have just refactored the code and functionality has
> > > remained the same since then.
> > > 
> > > here is the link to the performance numbers
> > > https://lore.kernel.org/linux-fsdevel/20220322121212.5087-1-dharamhans87@gmail.com/
> > 
> > There is a lot going in that table. Trying to understand it.
> > 
> > - Why care about No-Flush. I mean that's independent of these changes,
> >    right?  I am assuming this means that upon file close do not send
> >    a flush to fuse server. Not sure how bringing No-Flush into the
> >    mix is helpful here.
> 
> 
> It is a basically removing another call from kernel to user space. The calls
> there are, the lower is the resulting percentage for atomic-open.

Ok. You want to get rid of FUSE_FLUSH call so that % of FUSE_LOOKUP calls
go up and that will effectively show higher % of improvement due to
this. 

> 
> 
> > 
> > - What is "Patched Libfuse"? I am assuming that these are changes
> >    needed in libfuse to support atomic create + atomic open. Similarly
> >    assuming "Patched FuseK" means patched kernel with your changes.
> 
> Yes, I did that to ensure there is no regression with the patches, when the
> other side is not patched.
> 
> > 
> >    If this is correct, I would probably only be interested in
> >    looking at "Patched Libfuse + Patched FuseK" numbers to figure out
> >    what's the effect of your changes w.r.t vanilla kernel + libfuse.
> >    Am I understanding it right?
> 
> Yes.
> 
> > 
> > - I am wondering why do we measure "Sequential" and "Random" patterns.
> >    These optimizations are primarily for file creation + file opening
> >    and I/O pattern should not matter.
> 
> bonnie++ does this automatically and it just convenient to take the bonnie++
> csv value and to paste them into a table.
> 
> In our HPC world mdtest is more common, but it has MPI as requirement - make
> it harder to run. Reproducing the values with bonnie++ should be rather easy
> for you.
> 
> Only issue with bonnie++ is that bonnie++ by default does not run
> multi-threaded and the old 3rd party perl scripts I had to let it run with
> multiple processes and to sum up the values don't work anymore with recent
> perl versions. I need to find some time to fix that.
> 
> 
> > 
> > - Also wondering why performance of Read/s improves. Assuming once
> >    file has been opened, I think your optimizations get out of the
> >    way (no create, no open) and we are just going through data path of
> >    reading file data and no lookups happening. If that's the case, why
> >    do Read/s numbers show an improvement.
> 
> That is now bonnie++ works. It creates the files, closes them (which causes
> the flush) and then re-opens for stat and read - atomic open comes into the
> picture here. Also read() is totally skipped when the files are empty -
> which is why one should use something like 1B files.
> 
> If you have another metadata benchmark - please let us know.

Once I was pointed at smallfile.

https://github.com/distributed-system-analysis/smallfile

I ran this when I was posting the patches for virtiofs.

https://patchwork.kernel.org/project/linux-fsdevel/cover/20181210171318.16998-1-vgoyal@redhat.com/

See if this is something interesting to you.


> 
> > 
> > - Why do we measure "Patched Libfuse". It shows performance regression
> >    of 4-5% in table 0B, Sequential workoad. That sounds bad. So without
> >    any optimization kicking in, it has a performance cost.
> 
> Yes, I'm not sure yet. There is not so much code that has changed on the
> libfuse side.
> However the table needs to be redone with fixed libfuse - limiting the
> number of threads caused a permanent libfuse thread creation and destruction
> 
> https://github.com/libfuse/libfuse/pull/652
> 
> The numbers in table are also with paththrough_ll, which has its own issue
> due to linear inode search. paththrough_hp uses a C++ map and avoids that. I
> noticed too late when I started to investigate why there are regressions....
> 
> Also the table made me to investigate/profile all the fuse operations, which
> resulted in my waitq question. Please see that thread for more details https://lore.kernel.org/lkml/9326bb76-680f-05f6-6f78-df6170afaa2c@fastmail.fm/T/
> 
> Regarding atomic-open/create with avoiding lookup/revalidate, our primary
> goal is to reduce network calls. A file system that handles it locally only
> reduces the number of fuse kernel/user space crossing. A network file system
> that fully supports it needs to do the atomic open (or in old terms
> lookup-intent-open) on the server side of the network and needs to transfer
> attributes together with the open result.

Oh, I have no issues with the intent. I will like to see cut in network
traffic too (if we can do this without introducing problems). My primary
interest is that this kind of change should benefit virtiofs as well.

I am just trying to understand how much performance improvement is
actually there. And also trying to improve the quality of implementation.
Frankly speaking, it all seems very twisted and hard to read (and
hence maintain) code at this point fo time.

That's why I am going into the details to understand and suggest some
improvements.

Thanks
Vivek

> 
> Lustre does this, although I cannot easily point you to the right code. It
> all started almost two decades ago:
> https://groups.google.com/g/lucky.linux.fsdevel/c/iYNFIIrkJ1s
> 
> 
> BeeGFS does this as well
> https://git.beegfs.io/pub/v7/-/blob/master/client_module/source/filesystem/FhgfsOpsInode.c
> See for examaple FhgfsOps_atomicOpen() and FhgfsOps_createIntent()
> 
> (FhGFS is the old name when I was still involved in the project.)
> 
> From my head I'm not sure if NFS does it over the wire, maybe v4.
> 
> 
> Thanks,
> Bernd
> 
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-05 14:26           ` Vivek Goyal
@ 2022-05-06  5:34             ` Dharmendra Hans
  2022-05-06 14:12               ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: Dharmendra Hans @ 2022-05-06  5:34 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel,
	Bernd Schubert, Dharmendra Singh

On Thu, May 5, 2022 at 7:56 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Thu, May 05, 2022 at 10:21:21AM +0530, Dharmendra Hans wrote:
> > On Wed, May 4, 2022 at 8:17 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Wed, May 04, 2022 at 09:56:49AM +0530, Dharmendra Hans wrote:
> > > > On Wed, May 4, 2022 at 1:24 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > >
> > > > > On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> > > > > > From: Dharmendra Singh <dsingh@ddn.com>
> > > > > >
> > > > > > When we go for creating a file (O_CREAT), we trigger
> > > > > > a lookup to FUSE USER SPACE. It is very  much likely
> > > > > > that file does not exist yet as O_CREAT is passed to
> > > > > > open(). This lookup can be avoided and can be performed
> > > > > > as part of create call into libfuse.
> > > > > >
> > > > > > This lookup + create in single call to libfuse and finally
> > > > > > to USER SPACE has been named as atomic create. It is expected
> > > > > > that USER SPACE create the file, open it and fills in the
> > > > > > attributes which are then used to make inode stand/revalidate
> > > > > > in the kernel cache. Also if file was newly created(does not
> > > > > > exist yet by this time) in USER SPACE then it should be indicated
> > > > > > in `struct fuse_file_info` by setting a bit which is again used by
> > > > > > libfuse to send some flags back to fuse kernel to indicate that
> > > > > > that file was newly created. These flags are used by kernel to
> > > > > > indicate changes in parent directory.
> > > > >
> > > > > Reading the existing code a little bit more and trying to understand
> > > > > existing semantics. And that will help me unerstand what new is being
> > > > > done.
> > > > >
> > > > > So current fuse_atomic_open() does following.
> > > > >
> > > > > A. Looks up dentry (if d_in_lookup() is set).
> > > > > B. If dentry is positive or O_CREAT is not set, return.
> > > > > C. If server supports atomic create + open, use that to create file and
> > > > >    open it as well.
> > > > > D. If server does not support atomic create + open, just create file
> > > > >    using "mknod" and return. VFS will take care of opening the file.
> > > > >
> > > > > Now with this patch, new flow is.
> > > > >
> > > > > A. Look up dentry if d_in_lookup() is set as well as either file is not
> > > > >    being created or fc->no_atomic_create is set. This basiclally means
> > > > >    skip lookup if atomic_create is supported and file is being created.
> > > > >
> > > > > B. Remains same. if dentry is positive or O_CREATE is not set, return.
> > > > >
> > > > > C. If server supports new atomic_create(), use that.
> > > > >
> > > > > D. If not, if server supports atomic create + open, use that
> > > > >
> > > > > E. If not, fall back to mknod and do not open file.
> > > > >
> > > > > So to me this new functionality is basically atomic "lookup + create +
> > > > > open"?
> > > > >
> > > > > Or may be not. I see we check "fc->no_create" and fallback to mknod.
> > > > >
> > > > >         if (fc->no_create)
> > > > >                 goto mknod;
> > > > >
> > > > > So fc->no_create is representing both old atomic "create + open" as well
> > > > > as new "lookup + create + open" ?
> > > > >
> > > > > It might be obvious to you, but it is not to me. So will be great if
> > > > > you shed some light on this.
> > > > >
> > > >
> > > > I think you got it right now. New atomic create does what you
> > > > mentioned as new flow.  It does  lookup + create + open in single call
> > > > (being called as atomic create) to USER SPACE.mknod is a special case
> > >
> > > Ok, naming is little confusing. I think we will have to put it in
> > > commit message and where you define FUSE_ATOMIC_CREATE that what's
> > > the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE. This is
> > > ATOMIC w.r.t what?
> >
> > Sure, I would update the commit message to make the distinction clear
> > between the two. This operation is atomic w.r.t to USER SPACE FUSE
> > implementations. i.e USER SPACE would be performing all these
> > operations in a single call to it.
>
> I think even FUSE_CREAT is doing same thing. Creating file, opening and
> doing lookup and sending all the data. So that's not the difference
> between the two, IMHO. And that's why I am getting confused with the
> naming.
>
> From user space file server perspective, only extra operation seems
> to be that it sends a flag in response telling the client whether
> file was actually created or it already existed. So to me it just
> sounds little extension of existing FUSE_CREATE command and that's
> why I thought calling it FUSE_CREATE_EXT is probably better naming.

The difference between the two is of atomicity from top to bottom i.e
single call from fuse kernel to user space file server. Generally
FUSE_CREATE goes upto libfuse low level API as atomic (check
fuse_lib_create()) and it gets separated there into two calls (create
+ getattr). This is not what we want as it results in two network
trips each for create and lookup to the file server.

>
> >
> >
> > > May be atomic here means that "lookup + create + open" is a single operation.
> > > But then even FUSE_CREATE is atomic because "creat + open" is a single
> > > operation.
> > >
> > > In fact FUSE_CREATE does lookup anyway and returns all the information
> > > in fuse_entry_out.
> > >
> > > IIUC, only difference between FUSE_CREATE and FUSE_ATOMIC_CREATE is that
> > > later also carries information in reply whether file was actually created
> > > or not (FOPEN_FILE_CREATED). This will be set if file did not exist
> > > already and it was created indeed. Is that right?
> >
> > FUSE_CREATE is atomic but upto level of libfuse. Libfuse separates it
> > into two calls, create and lookup separately into USER SPACE FUSE
> > implementation.
>
> I am not sure what do you mean by "libfuse separates it into two calls,
> create and lookup separately". I guess you are referring to lo_create()
> in example/passthrough_ll.c which first creates and opens file and
> then looks it up and replies.
>
>         fd = openat(lo_fd(req, parent), name,
>                     (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
>         err = lo_do_lookup(req, parent, name, &e);
>         fuse_reply_create(req, &e, fi);
>
> I am looking at your proposal for atomic_create implementation here.
>
> https://github.com/libfuse/libfuse/pull/673/commits/88cd25b2857f2bb213d01afbcfd666787d1e6893#diff-a36385ec8fb753d6f4492a5f0d3c6a5750bd370b50df6ef0610efdcd3f8880ffR787
>
> It is doing exactly same thing as lo_create(), except one difference that
> it is checking first if file exists. It essentially is doing this.
>
> A. newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
> B. fd = openat(lo_fd(req, parent), name,
>                     (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
> C. err = lo_do_lookup(req, parent, name, &e);
> D. fuse_reply_create(req, &e, fi);
>
> So what do you mean by libfuse makes it two calls.
>
> And I think above implementation is racy. What if filesystem is
> shared and another client creates the file between calls to
> A and B. You will think you created the file but some other
> client created it.
>
> So if intent is to know whether we created the file or not, then
> you should probably do openat() with O_EXCL flag. If that succeeds
> you know you created the file. If it fails with -EEXIST, then you
> know file is already there. That's what virtiofs does.
>
> Anyway, coming back to the point. IMHO, from server perspective,
> there is no atomicity difference between FUSE_CREATE and
> FUSE_ATOMIC_CREATE. Only difference seems to be to send addditional
> information back to the client to tell it whether file was created
> or not.
>
> In fact for shared filesystem this is probably a problem. What if
> guest's cache is stale and it does not know about the file. A client
> B creates the file and we think we did not create the file. And we
> will return with FOPEN_FILE_CREATED = 0. And in that case client
> will not call fuse_dir_changed(). But that seems wrong in case
> of shared filesystems. I am concerned about virtiofs which can
> be shared between different guests.
>
> Miklos, WDYT?
>
> May be it is not a huge concern. If one guest drops a file, other guest
> will not invalidate its dir attrs till timeout happens. Case of shared
> filesystem is very tricky with fuse. And sometimes it is not clear
> to me what kind of coherency matters.
>
> So in this case say I am booted with cache=auto, if guest B drops a file
> bar/foo.txt and guest A does open(bar/foo.txt, O_CREAT), then should
> guest A invalidate the attrs of bar/ right away or it will be invalidated
> anyway after a second.
>
> Anyway.., my core point is that difference between FUSE_CREATE and
> FUSE_ATOMIC_CREATE is just one flag FOPEN_FILE_CREATED which tells
> client whether file was actually created or not. And that is used
> to determine whether to invalidate parent dir attributes or not. It
> does not have anything extra in terms of ATOMICITY as far as I can
> see and that's what confuses me.

You checked the wrong code. pasthrough_ll is not production code, we
do not use it at all, just using it to test these patches here. There
is a main patch which implements atomic create in libfuse for low
level api(). It is in same pull request, check it here
https://github.com/libfuse/libfuse/pull/673/commits/f86fe92bef7bb529ef1617e077d69a39eb26bc9f

What this FUSE_ATOMIC_CREATE(fuse_lib_atomic_create()) does in libfuse
is it make a single call into fuse_operations where file server would
be doing all 'lookup + create + open' in one call itself  whereas
FUSE_CREATE gets separated into two calls 'create + lookup' which
eventually are two rpcs to the user space file server for a single
operation.  Now you can see FUSE_CREATE  is not atomic from file
systems point of view(i.e  from user space file server perspective)
but from fuse kernel point of view only. We can make existing libfuse
code i.e fuse_lib_create() to call new atomic create but then the
interface becomes messy and it might start a trend to twist libfuse.
We want to keep libfuse APIs neat and clean. And do not break existing
systems. No flag from fuse kernel to libfuse to decide which code to
traverse.  Therefore a new call for all this work.

>
>
>
> > This FUSE_ATOMIC_CREATE does all these ops in a single call to FUSE
> > implementations.  We do not want to break any existing FUSE
> > implementations, therefore it is being introduced as a new feature. I
> > forgot to include links to libfuse patches as well. That would have
> > made it much clearer.  Here is the link to libfuse patch for this call
> > https://github.com/libfuse/libfuse/pull/673.
> >
> > >
> > > I see FOPEN_FILE_CREATED is being used to avoid calling
> > > fuse_dir_changed(). That sounds like a separate optimization and probably
> > > should be in a separate patch.
> >
> > FUSE_ATOMIC_CREATE needs to send back info about if  file was actually
> > created or not (This is suggestion from Miklos) to correctly convey if
> > the parent dir is really changing or not. I included this as part of
> > this patch itself instead of having it as a separate patch.
>
> This needs little more thought w.r.t shared filesystems.
>
> >
> > > IOW, I think this patch should be broken in to multiple pieces. First
> > > piece seems to be avoiding lookup() and given the way it is implemented,
> > > looks like we can avoid lookup() even by using existing FUSE_CREATE
> > > command. We don't necessarily need FUSE_ATOMIC_CREATE. Is that right?
> >
> > Its not only about changing fuse kernel code but USER SPACE
> > implementations also. If we change the you are suggesting we would be
> > required to twist many things at libfuse and FUSE low level API. So to
> > keep things simple and not to break any existing implementations we
> > have kept it as new call (we now pass `struct stat` to USER SPACE FUSE
> > to filled in).
> >
> > > And once that is done, a separate patch should probably should explain
> > > the problem and say fuse_dir_changed() call can be avoided if we knew
> > > if file was actually created or it was already existing there. And that's
> > > when one need to introduce a new command. Given this is just an extension
> > > of existing FUSE_CREATE command and returns additiona info about
> > > FOPEN_FILE_CREATED, we probably should simply call it FUSE_CREATE_EXT
> > > and explain how this operation is different from FUSE_CREATE.
> >
> > As explained above, we are not doing this way as we have kept in mind
> > all existing libfuse APIs as well.
> >
>
> I am not seeing what will break in existing passthrough_ll.c if I
> simply used FUSE_CREATE for a file which already exists.
>
>         fd = openat(lo_fd(req, parent), name,
>                     (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
>
> This call should still succeed even if file already exists. (until and
> unless called it with O_EXCL and in that case failing is the correct
> behavior).

It's not passthrogh_ll but low level libfuse API. Passthrough_ll is
just another user of low level API. It has been used just to test
these patches. We do not use passthrough_ll at all for any other
purpose.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-06  5:34             ` Dharmendra Hans
@ 2022-05-06 14:12               ` Vivek Goyal
  2022-05-06 16:41                 ` Bernd Schubert
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2022-05-06 14:12 UTC (permalink / raw)
  To: Dharmendra Hans
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel,
	Bernd Schubert, Dharmendra Singh

On Fri, May 06, 2022 at 11:04:05AM +0530, Dharmendra Hans wrote:
> On Thu, May 5, 2022 at 7:56 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Thu, May 05, 2022 at 10:21:21AM +0530, Dharmendra Hans wrote:
> > > On Wed, May 4, 2022 at 8:17 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > On Wed, May 04, 2022 at 09:56:49AM +0530, Dharmendra Hans wrote:
> > > > > On Wed, May 4, 2022 at 1:24 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> > > > > > > From: Dharmendra Singh <dsingh@ddn.com>
> > > > > > >
> > > > > > > When we go for creating a file (O_CREAT), we trigger
> > > > > > > a lookup to FUSE USER SPACE. It is very  much likely
> > > > > > > that file does not exist yet as O_CREAT is passed to
> > > > > > > open(). This lookup can be avoided and can be performed
> > > > > > > as part of create call into libfuse.
> > > > > > >
> > > > > > > This lookup + create in single call to libfuse and finally
> > > > > > > to USER SPACE has been named as atomic create. It is expected
> > > > > > > that USER SPACE create the file, open it and fills in the
> > > > > > > attributes which are then used to make inode stand/revalidate
> > > > > > > in the kernel cache. Also if file was newly created(does not
> > > > > > > exist yet by this time) in USER SPACE then it should be indicated
> > > > > > > in `struct fuse_file_info` by setting a bit which is again used by
> > > > > > > libfuse to send some flags back to fuse kernel to indicate that
> > > > > > > that file was newly created. These flags are used by kernel to
> > > > > > > indicate changes in parent directory.
> > > > > >
> > > > > > Reading the existing code a little bit more and trying to understand
> > > > > > existing semantics. And that will help me unerstand what new is being
> > > > > > done.
> > > > > >
> > > > > > So current fuse_atomic_open() does following.
> > > > > >
> > > > > > A. Looks up dentry (if d_in_lookup() is set).
> > > > > > B. If dentry is positive or O_CREAT is not set, return.
> > > > > > C. If server supports atomic create + open, use that to create file and
> > > > > >    open it as well.
> > > > > > D. If server does not support atomic create + open, just create file
> > > > > >    using "mknod" and return. VFS will take care of opening the file.
> > > > > >
> > > > > > Now with this patch, new flow is.
> > > > > >
> > > > > > A. Look up dentry if d_in_lookup() is set as well as either file is not
> > > > > >    being created or fc->no_atomic_create is set. This basiclally means
> > > > > >    skip lookup if atomic_create is supported and file is being created.
> > > > > >
> > > > > > B. Remains same. if dentry is positive or O_CREATE is not set, return.
> > > > > >
> > > > > > C. If server supports new atomic_create(), use that.
> > > > > >
> > > > > > D. If not, if server supports atomic create + open, use that
> > > > > >
> > > > > > E. If not, fall back to mknod and do not open file.
> > > > > >
> > > > > > So to me this new functionality is basically atomic "lookup + create +
> > > > > > open"?
> > > > > >
> > > > > > Or may be not. I see we check "fc->no_create" and fallback to mknod.
> > > > > >
> > > > > >         if (fc->no_create)
> > > > > >                 goto mknod;
> > > > > >
> > > > > > So fc->no_create is representing both old atomic "create + open" as well
> > > > > > as new "lookup + create + open" ?
> > > > > >
> > > > > > It might be obvious to you, but it is not to me. So will be great if
> > > > > > you shed some light on this.
> > > > > >
> > > > >
> > > > > I think you got it right now. New atomic create does what you
> > > > > mentioned as new flow.  It does  lookup + create + open in single call
> > > > > (being called as atomic create) to USER SPACE.mknod is a special case
> > > >
> > > > Ok, naming is little confusing. I think we will have to put it in
> > > > commit message and where you define FUSE_ATOMIC_CREATE that what's
> > > > the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE. This is
> > > > ATOMIC w.r.t what?
> > >
> > > Sure, I would update the commit message to make the distinction clear
> > > between the two. This operation is atomic w.r.t to USER SPACE FUSE
> > > implementations. i.e USER SPACE would be performing all these
> > > operations in a single call to it.
> >
> > I think even FUSE_CREAT is doing same thing. Creating file, opening and
> > doing lookup and sending all the data. So that's not the difference
> > between the two, IMHO. And that's why I am getting confused with the
> > naming.
> >
> > From user space file server perspective, only extra operation seems
> > to be that it sends a flag in response telling the client whether
> > file was actually created or it already existed. So to me it just
> > sounds little extension of existing FUSE_CREATE command and that's
> > why I thought calling it FUSE_CREATE_EXT is probably better naming.
> 
> The difference between the two is of atomicity from top to bottom i.e
> single call from fuse kernel to user space file server. Generally
> FUSE_CREATE goes upto libfuse low level API as atomic (check
> fuse_lib_create()) and it gets separated there into two calls (create
> + getattr). This is not what we want as it results in two network
> trips each for create and lookup to the file server.

Ok, looks like your fuse file server is talking to a another file
server on network and that's why you are mentioning two network trips.

Let us differentiate between two things first. 

A. FUSE protocol semantics
B. Implementation of FUSE protocl by libfuse.

I think I am stressing on A and you are stressing on B. I just want
to see what's the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE
from fuse protocol point of view. Again look at from kernel's point of
view and don't worry about libfuse is going to implement it.
Implementations can vary.

From kernel's perspective FUSE_CREATE is supposed to create + open a 
file. It is possible file already exists. Look at include/fuse_lowlevel.h
description for create().

        /**
         * Create and open a file
         *
         * If the file does not exist, first create it with the specified
         * mode, and then open it.
         */

I notice that fuse is offering a high level API as well as low level
API. I primarily know about low level API. To me these are just two
different implementation but things don't change how kernel sends
fuse messages and what it expects from server in return.

Now with FUSE_ATOMIC_CREATE, from kernel's perspective, only difference
is that in reply message file server will also indicate if file was
actually created or not. Is that right?

And I am focussing on this FUSE API apsect. I am least concerned at
this point of time who libfuse decides to actually implement FUSE_CREATE
or FUSE_ATOMIC_CREATE etc. You might make a single call in libfuse 
server (instead of two) and that's performance optimization in libfuse.
Kernel does not care how many calls did you make in file server to
implement FUSE_CREATE or FUSE_ATOMIC_CREATE. All it cares is that
create and open the file.

So while you might do things in more atomic manner in file server and
cut down on network traffic, kernel fuse API does not care. All it cares
about is create + open a file.

Anyway, from kernel's perspective, I think you should be able to
just use FUSE_CREATE and still be do "lookup + create + open".
FUSE_ATOMIC_CREATE is just allows one additional optimization so
that you know whether to invalidate parent dir's attrs or not.

In fact kernel is not putting any atomicity requirements as well on
file server. And that's why I think this new command should probably
be called FUSE_CREATE_EXT because it just sends back additional
info.

All the atomicity stuff you have been describing is that you are
trying to do some optimizations in libfuse implementation to implement
FUSE_ATOMIC_CREATE so that you send less number of commands over
network. That's a good idea but fuse kernel API does not require you
do these atomically, AFAICS. 

Given I know little bit of fuse low level API, If I were to implement
this in virtiofs/passthrough_ll.c, I probably will do following.

A. Check if caller provided O_EXCL flag.
B. openat(O_CREAT | O_EXCL)
C. If success, we created the file. Set file_created = 1.

D. If error and error != -EEXIST, send error back to client.
E. If error and error == -EEXIST, if caller did provide O_EXCL flag,
   return error.
F. openat() returned -EEXIST and caller did not provide O_EXCL flag,
   that means file already exists.  Set file_created = 0.
G. Do lookup() etc to create internal lo_inode and stat() of file.
H. Send response back to client using fuse_reply_create(). 
   
This is one sample implementation for fuse lowlevel API. There could
be other ways to implement. But all that is libfuse + filesystem
specific and kernel does not care how many operations you use to
complete and what's the atomicity etc. Of course less number of
operations you do better it is.

Anyway, I think I have said enough on this topic. IMHO, FUSE_CREATE
descritpion (fuse_lowlevel.h) already mentions that "If the file does not
exist, first create it with the specified mode and then open it". That
means intent of protocol is that file could already be there as well.
So I think we probably should implement this optimization (in kernel)
using FUSE_CREATE command and then add FUSE_CREATE_EXT to add optimization
about knowing whether file was actually created or not.

W.r.t libfuse optimizations, I am not sure why can't you do optimizations
with FUSE_CREATE and why do you need FUSE_CREATE_EXT necessarily. If
are you worried that some existing filesystems will break, I think
you can create an internal helper say fuse_create_atomic() and then
use that if filesystem offers it. IOW, libfuse will have two
ways to implement FUSE_CREATE. And if filesystem offers a new way which
cuts down on network traffic, libfuse uses more efficient method. We
should not have to change kernel FUSE API just because libfuse can
do create + open operation more efficiently.

By introducing FUSE_ATOMIC_CREATE and hiding new implementation behind
it, I think we are mixing things and that's why it is getting confusing.

Thanks
Vivek


> 
> >
> > >
> > >
> > > > May be atomic here means that "lookup + create + open" is a single operation.
> > > > But then even FUSE_CREATE is atomic because "creat + open" is a single
> > > > operation.
> > > >
> > > > In fact FUSE_CREATE does lookup anyway and returns all the information
> > > > in fuse_entry_out.
> > > >
> > > > IIUC, only difference between FUSE_CREATE and FUSE_ATOMIC_CREATE is that
> > > > later also carries information in reply whether file was actually created
> > > > or not (FOPEN_FILE_CREATED). This will be set if file did not exist
> > > > already and it was created indeed. Is that right?
> > >
> > > FUSE_CREATE is atomic but upto level of libfuse. Libfuse separates it
> > > into two calls, create and lookup separately into USER SPACE FUSE
> > > implementation.
> >
> > I am not sure what do you mean by "libfuse separates it into two calls,
> > create and lookup separately". I guess you are referring to lo_create()
> > in example/passthrough_ll.c which first creates and opens file and
> > then looks it up and replies.
> >
> >         fd = openat(lo_fd(req, parent), name,
> >                     (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
> >         err = lo_do_lookup(req, parent, name, &e);
> >         fuse_reply_create(req, &e, fi);
> >
> > I am looking at your proposal for atomic_create implementation here.
> >
> > https://github.com/libfuse/libfuse/pull/673/commits/88cd25b2857f2bb213d01afbcfd666787d1e6893#diff-a36385ec8fb753d6f4492a5f0d3c6a5750bd370b50df6ef0610efdcd3f8880ffR787
> >
> > It is doing exactly same thing as lo_create(), except one difference that
> > it is checking first if file exists. It essentially is doing this.
> >
> > A. newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
> > B. fd = openat(lo_fd(req, parent), name,
> >                     (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
> > C. err = lo_do_lookup(req, parent, name, &e);
> > D. fuse_reply_create(req, &e, fi);
> >
> > So what do you mean by libfuse makes it two calls.
> >
> > And I think above implementation is racy. What if filesystem is
> > shared and another client creates the file between calls to
> > A and B. You will think you created the file but some other
> > client created it.
> >
> > So if intent is to know whether we created the file or not, then
> > you should probably do openat() with O_EXCL flag. If that succeeds
> > you know you created the file. If it fails with -EEXIST, then you
> > know file is already there. That's what virtiofs does.
> >
> > Anyway, coming back to the point. IMHO, from server perspective,
> > there is no atomicity difference between FUSE_CREATE and
> > FUSE_ATOMIC_CREATE. Only difference seems to be to send addditional
> > information back to the client to tell it whether file was created
> > or not.
> >
> > In fact for shared filesystem this is probably a problem. What if
> > guest's cache is stale and it does not know about the file. A client
> > B creates the file and we think we did not create the file. And we
> > will return with FOPEN_FILE_CREATED = 0. And in that case client
> > will not call fuse_dir_changed(). But that seems wrong in case
> > of shared filesystems. I am concerned about virtiofs which can
> > be shared between different guests.
> >
> > Miklos, WDYT?
> >
> > May be it is not a huge concern. If one guest drops a file, other guest
> > will not invalidate its dir attrs till timeout happens. Case of shared
> > filesystem is very tricky with fuse. And sometimes it is not clear
> > to me what kind of coherency matters.
> >
> > So in this case say I am booted with cache=auto, if guest B drops a file
> > bar/foo.txt and guest A does open(bar/foo.txt, O_CREAT), then should
> > guest A invalidate the attrs of bar/ right away or it will be invalidated
> > anyway after a second.
> >
> > Anyway.., my core point is that difference between FUSE_CREATE and
> > FUSE_ATOMIC_CREATE is just one flag FOPEN_FILE_CREATED which tells
> > client whether file was actually created or not. And that is used
> > to determine whether to invalidate parent dir attributes or not. It
> > does not have anything extra in terms of ATOMICITY as far as I can
> > see and that's what confuses me.
> 
> You checked the wrong code. pasthrough_ll is not production code, we
> do not use it at all, just using it to test these patches here. There
> is a main patch which implements atomic create in libfuse for low
> level api(). It is in same pull request, check it here
> https://github.com/libfuse/libfuse/pull/673/commits/f86fe92bef7bb529ef1617e077d69a39eb26bc9f
> 
> What this FUSE_ATOMIC_CREATE(fuse_lib_atomic_create()) does in libfuse
> is it make a single call into fuse_operations where file server would
> be doing all 'lookup + create + open' in one call itself  whereas
> FUSE_CREATE gets separated into two calls 'create + lookup' which
> eventually are two rpcs to the user space file server for a single
> operation.  Now you can see FUSE_CREATE  is not atomic from file
> systems point of view(i.e  from user space file server perspective)
> but from fuse kernel point of view only. We can make existing libfuse
> code i.e fuse_lib_create() to call new atomic create but then the
> interface becomes messy and it might start a trend to twist libfuse.
> We want to keep libfuse APIs neat and clean. And do not break existing
> systems. No flag from fuse kernel to libfuse to decide which code to
> traverse.  Therefore a new call for all this work.
> 
> >
> >
> >
> > > This FUSE_ATOMIC_CREATE does all these ops in a single call to FUSE
> > > implementations.  We do not want to break any existing FUSE
> > > implementations, therefore it is being introduced as a new feature. I
> > > forgot to include links to libfuse patches as well. That would have
> > > made it much clearer.  Here is the link to libfuse patch for this call
> > > https://github.com/libfuse/libfuse/pull/673.
> > >
> > > >
> > > > I see FOPEN_FILE_CREATED is being used to avoid calling
> > > > fuse_dir_changed(). That sounds like a separate optimization and probably
> > > > should be in a separate patch.
> > >
> > > FUSE_ATOMIC_CREATE needs to send back info about if  file was actually
> > > created or not (This is suggestion from Miklos) to correctly convey if
> > > the parent dir is really changing or not. I included this as part of
> > > this patch itself instead of having it as a separate patch.
> >
> > This needs little more thought w.r.t shared filesystems.
> >
> > >
> > > > IOW, I think this patch should be broken in to multiple pieces. First
> > > > piece seems to be avoiding lookup() and given the way it is implemented,
> > > > looks like we can avoid lookup() even by using existing FUSE_CREATE
> > > > command. We don't necessarily need FUSE_ATOMIC_CREATE. Is that right?
> > >
> > > Its not only about changing fuse kernel code but USER SPACE
> > > implementations also. If we change the you are suggesting we would be
> > > required to twist many things at libfuse and FUSE low level API. So to
> > > keep things simple and not to break any existing implementations we
> > > have kept it as new call (we now pass `struct stat` to USER SPACE FUSE
> > > to filled in).
> > >
> > > > And once that is done, a separate patch should probably should explain
> > > > the problem and say fuse_dir_changed() call can be avoided if we knew
> > > > if file was actually created or it was already existing there. And that's
> > > > when one need to introduce a new command. Given this is just an extension
> > > > of existing FUSE_CREATE command and returns additiona info about
> > > > FOPEN_FILE_CREATED, we probably should simply call it FUSE_CREATE_EXT
> > > > and explain how this operation is different from FUSE_CREATE.
> > >
> > > As explained above, we are not doing this way as we have kept in mind
> > > all existing libfuse APIs as well.
> > >
> >
> > I am not seeing what will break in existing passthrough_ll.c if I
> > simply used FUSE_CREATE for a file which already exists.
> >
> >         fd = openat(lo_fd(req, parent), name,
> >                     (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
> >
> > This call should still succeed even if file already exists. (until and
> > unless called it with O_EXCL and in that case failing is the correct
> > behavior).
> 
> It's not passthrogh_ll but low level libfuse API. Passthrough_ll is
> just another user of low level API. It has been used just to test
> these patches. We do not use passthrough_ll at all for any other
> purpose.
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-06 14:12               ` Vivek Goyal
@ 2022-05-06 16:41                 ` Bernd Schubert
  2022-05-06 17:07                   ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2022-05-06 16:41 UTC (permalink / raw)
  To: Vivek Goyal, Dharmendra Hans
  Cc: Miklos Szeredi, linux-fsdevel, fuse-devel, linux-kernel,
	Dharmendra Singh



On 5/6/22 16:12, Vivek Goyal wrote:

[...]

> On Fri, May 06, 2022 at 11:04:05AM +0530, Dharmendra Hans wrote:

> 
> Ok, looks like your fuse file server is talking to a another file
> server on network and that's why you are mentioning two network trips.
> 
> Let us differentiate between two things first.
> 
> A. FUSE protocol semantics
> B. Implementation of FUSE protocl by libfuse.
> 
> I think I am stressing on A and you are stressing on B. I just want
> to see what's the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE
> from fuse protocol point of view. Again look at from kernel's point of
> view and don't worry about libfuse is going to implement it.
> Implementations can vary.

Agreed, I don't think we need to bring in network for the kernel to 
libfuse API.

> 
>  From kernel's perspective FUSE_CREATE is supposed to create + open a
> file. It is possible file already exists. Look at include/fuse_lowlevel.h
> description for create().
> 
>          /**
>           * Create and open a file
>           *
>           * If the file does not exist, first create it with the specified
>           * mode, and then open it.
>           */
> 
> I notice that fuse is offering a high level API as well as low level
> API. I primarily know about low level API. To me these are just two
> different implementation but things don't change how kernel sends
> fuse messages and what it expects from server in return.
> 
> Now with FUSE_ATOMIC_CREATE, from kernel's perspective, only difference
> is that in reply message file server will also indicate if file was
> actually created or not. Is that right?
> 
> And I am focussing on this FUSE API apsect. I am least concerned at
> this point of time who libfuse decides to actually implement FUSE_CREATE
> or FUSE_ATOMIC_CREATE etc. You might make a single call in libfuse
> server (instead of two) and that's performance optimization in libfuse.
> Kernel does not care how many calls did you make in file server to
> implement FUSE_CREATE or FUSE_ATOMIC_CREATE. All it cares is that
> create and open the file.
> 
> So while you might do things in more atomic manner in file server and
> cut down on network traffic, kernel fuse API does not care. All it cares
> about is create + open a file.
> 
> Anyway, from kernel's perspective, I think you should be able to
> just use FUSE_CREATE and still be do "lookup + create + open".
> FUSE_ATOMIC_CREATE is just allows one additional optimization so
> that you know whether to invalidate parent dir's attrs or not.
> 
> In fact kernel is not putting any atomicity requirements as well on
> file server. And that's why I think this new command should probably
> be called FUSE_CREATE_EXT because it just sends back additional
> info.
> 
> All the atomicity stuff you have been describing is that you are
> trying to do some optimizations in libfuse implementation to implement
> FUSE_ATOMIC_CREATE so that you send less number of commands over
> network. That's a good idea but fuse kernel API does not require you
> do these atomically, AFAICS.
> 
> Given I know little bit of fuse low level API, If I were to implement
> this in virtiofs/passthrough_ll.c, I probably will do following.
> 
> A. Check if caller provided O_EXCL flag.
> B. openat(O_CREAT | O_EXCL)
> C. If success, we created the file. Set file_created = 1.
> 
> D. If error and error != -EEXIST, send error back to client.
> E. If error and error == -EEXIST, if caller did provide O_EXCL flag,
>     return error.
> F. openat() returned -EEXIST and caller did not provide O_EXCL flag,
>     that means file already exists.  Set file_created = 0.
> G. Do lookup() etc to create internal lo_inode and stat() of file.
> H. Send response back to client using fuse_reply_create().
>     
> This is one sample implementation for fuse lowlevel API. There could
> be other ways to implement. But all that is libfuse + filesystem
> specific and kernel does not care how many operations you use to
> complete and what's the atomicity etc. Of course less number of
> operations you do better it is.
> 
> Anyway, I think I have said enough on this topic. IMHO, FUSE_CREATE
> descritpion (fuse_lowlevel.h) already mentions that "If the file does not
> exist, first create it with the specified mode and then open it". That
> means intent of protocol is that file could already be there as well.
> So I think we probably should implement this optimization (in kernel)
> using FUSE_CREATE command and then add FUSE_CREATE_EXT to add optimization
> about knowing whether file was actually created or not.
> 
> W.r.t libfuse optimizations, I am not sure why can't you do optimizations
> with FUSE_CREATE and why do you need FUSE_CREATE_EXT necessarily. If
> are you worried that some existing filesystems will break, I think
> you can create an internal helper say fuse_create_atomic() and then
> use that if filesystem offers it. IOW, libfuse will have two
> ways to implement FUSE_CREATE. And if filesystem offers a new way which
> cuts down on network traffic, libfuse uses more efficient method. We
> should not have to change kernel FUSE API just because libfuse can
> do create + open operation more efficiently.

Ah right, I like this. As I had written before, the first patch version 
was using FUSE_CREATE and I was worried to break something. Yes, it 
should be possible split into lookup+create on the libfuse side. That 
being said, libfuse will need to know which version it is - there might 
be an old kernel sending the non-optimized version - libfuse should not 
do another lookup then. Now there is 'fi.flags = arg->flags', but these 
are already taken by open/fcntl flags - I would not feel comfortable to 
overload these. At best, struct fuse_create_in currently had a padding 
field, we could convert these to something like 'ext_fuse_open_flags' 
and then use it for fuse internal things. Difficulty here is that I 
don't know if all kernel implementations zero the struct (BSD, MacOS), 
so I guess we would need to negotiate at startup/init time and would 
need another main feature flag? And with that I'm not be sure anymore if 
the result would be actually more simple than what we have right now for 
the first patch.


Thanks,
Bernd


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-06 16:41                 ` Bernd Schubert
@ 2022-05-06 17:07                   ` Vivek Goyal
  2022-05-06 18:45                     ` Bernd Schubert
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2022-05-06 17:07 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Dharmendra Hans, Miklos Szeredi, linux-fsdevel, fuse-devel,
	linux-kernel, Dharmendra Singh

On Fri, May 06, 2022 at 06:41:17PM +0200, Bernd Schubert wrote:
> 
> 
> On 5/6/22 16:12, Vivek Goyal wrote:
> 
> [...]
> 
> > On Fri, May 06, 2022 at 11:04:05AM +0530, Dharmendra Hans wrote:
> 
> > 
> > Ok, looks like your fuse file server is talking to a another file
> > server on network and that's why you are mentioning two network trips.
> > 
> > Let us differentiate between two things first.
> > 
> > A. FUSE protocol semantics
> > B. Implementation of FUSE protocl by libfuse.
> > 
> > I think I am stressing on A and you are stressing on B. I just want
> > to see what's the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE
> > from fuse protocol point of view. Again look at from kernel's point of
> > view and don't worry about libfuse is going to implement it.
> > Implementations can vary.
> 
> Agreed, I don't think we need to bring in network for the kernel to libfuse
> API.
> 
> > 
> >  From kernel's perspective FUSE_CREATE is supposed to create + open a
> > file. It is possible file already exists. Look at include/fuse_lowlevel.h
> > description for create().
> > 
> >          /**
> >           * Create and open a file
> >           *
> >           * If the file does not exist, first create it with the specified
> >           * mode, and then open it.
> >           */
> > 
> > I notice that fuse is offering a high level API as well as low level
> > API. I primarily know about low level API. To me these are just two
> > different implementation but things don't change how kernel sends
> > fuse messages and what it expects from server in return.
> > 
> > Now with FUSE_ATOMIC_CREATE, from kernel's perspective, only difference
> > is that in reply message file server will also indicate if file was
> > actually created or not. Is that right?
> > 
> > And I am focussing on this FUSE API apsect. I am least concerned at
> > this point of time who libfuse decides to actually implement FUSE_CREATE
> > or FUSE_ATOMIC_CREATE etc. You might make a single call in libfuse
> > server (instead of two) and that's performance optimization in libfuse.
> > Kernel does not care how many calls did you make in file server to
> > implement FUSE_CREATE or FUSE_ATOMIC_CREATE. All it cares is that
> > create and open the file.
> > 
> > So while you might do things in more atomic manner in file server and
> > cut down on network traffic, kernel fuse API does not care. All it cares
> > about is create + open a file.
> > 
> > Anyway, from kernel's perspective, I think you should be able to
> > just use FUSE_CREATE and still be do "lookup + create + open".
> > FUSE_ATOMIC_CREATE is just allows one additional optimization so
> > that you know whether to invalidate parent dir's attrs or not.
> > 
> > In fact kernel is not putting any atomicity requirements as well on
> > file server. And that's why I think this new command should probably
> > be called FUSE_CREATE_EXT because it just sends back additional
> > info.
> > 
> > All the atomicity stuff you have been describing is that you are
> > trying to do some optimizations in libfuse implementation to implement
> > FUSE_ATOMIC_CREATE so that you send less number of commands over
> > network. That's a good idea but fuse kernel API does not require you
> > do these atomically, AFAICS.
> > 
> > Given I know little bit of fuse low level API, If I were to implement
> > this in virtiofs/passthrough_ll.c, I probably will do following.
> > 
> > A. Check if caller provided O_EXCL flag.
> > B. openat(O_CREAT | O_EXCL)
> > C. If success, we created the file. Set file_created = 1.
> > 
> > D. If error and error != -EEXIST, send error back to client.
> > E. If error and error == -EEXIST, if caller did provide O_EXCL flag,
> >     return error.
> > F. openat() returned -EEXIST and caller did not provide O_EXCL flag,
> >     that means file already exists.  Set file_created = 0.
> > G. Do lookup() etc to create internal lo_inode and stat() of file.
> > H. Send response back to client using fuse_reply_create().
> > This is one sample implementation for fuse lowlevel API. There could
> > be other ways to implement. But all that is libfuse + filesystem
> > specific and kernel does not care how many operations you use to
> > complete and what's the atomicity etc. Of course less number of
> > operations you do better it is.
> > 
> > Anyway, I think I have said enough on this topic. IMHO, FUSE_CREATE
> > descritpion (fuse_lowlevel.h) already mentions that "If the file does not
> > exist, first create it with the specified mode and then open it". That
> > means intent of protocol is that file could already be there as well.
> > So I think we probably should implement this optimization (in kernel)
> > using FUSE_CREATE command and then add FUSE_CREATE_EXT to add optimization
> > about knowing whether file was actually created or not.
> > 
> > W.r.t libfuse optimizations, I am not sure why can't you do optimizations
> > with FUSE_CREATE and why do you need FUSE_CREATE_EXT necessarily. If
> > are you worried that some existing filesystems will break, I think
> > you can create an internal helper say fuse_create_atomic() and then
> > use that if filesystem offers it. IOW, libfuse will have two
> > ways to implement FUSE_CREATE. And if filesystem offers a new way which
> > cuts down on network traffic, libfuse uses more efficient method. We
> > should not have to change kernel FUSE API just because libfuse can
> > do create + open operation more efficiently.
> 
> Ah right, I like this. As I had written before, the first patch version was
> using FUSE_CREATE and I was worried to break something. Yes, it should be
> possible split into lookup+create on the libfuse side. That being said,
> libfuse will need to know which version it is - there might be an old kernel
> sending the non-optimized version - libfuse should not do another lookup
> then.

I am confused about one thing. For FUSE_CREATE command, how does it
matter whether kernel has done lookup() before sending FUSE_CREATE. All
FUSE_CREATE seems to say that crate a file (if it does not exist already)
and then open it and return file handle as well as inode attributes. It
does not say anything about whether a LOOKUP has already been done 
by kernel or not.

It looks like you are assuming that if FUSE_CREATE is coming, that
means client has already done FUSE_LOOKUP. So there is something we
are not on same page about.

I looked at fuse_lowlevel API and passthrough_ll.c and there is no
assumption whether FUSE_LOOKUP has already been called by client
before calling FUSE_CREATE. Similarly, I looked at virtiofs code
and I can't see any such assumption there as well.

https://github.com/qemu/qemu/blob/master/tools/virtiofsd/passthrough_ll.c

So I am sort of lost. May be you can help me understsand this.

> Now there is 'fi.flags = arg->flags', but these are already taken by
> open/fcntl flags - I would not feel comfortable to overload these. At best,
> struct fuse_create_in currently had a padding field, we could convert these
> to something like 'ext_fuse_open_flags' and then use it for fuse internal
> things. Difficulty here is that I don't know if all kernel implementations
> zero the struct (BSD, MacOS), so I guess we would need to negotiate at
> startup/init time and would need another main feature flag? And with that
> I'm not be sure anymore if the result would be actually more simple than
> what we have right now for the first patch.

If FUSE_CREATE indeed has a dependency on FUSE_LOOKUP have been called
before that, then I agree that we can't implement new semantics with
FUSE_CREATE and we will have to introduce a new op say
FUSE_ATOMIC_CREATE/FUSE_LOOKUP_CREATE/FUSE_CREATE_EXT.

But looking at fuse API, I don't see FUSE_CREATE ever guaranteeing that
a FUSE_LOOKUP has been done before this.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-06 17:07                   ` Vivek Goyal
@ 2022-05-06 18:45                     ` Bernd Schubert
  2022-05-07 10:42                       ` Jean-Pierre André
  0 siblings, 1 reply; 37+ messages in thread
From: Bernd Schubert @ 2022-05-06 18:45 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Dharmendra Hans, Miklos Szeredi, linux-fsdevel, fuse-devel,
	linux-kernel, Dharmendra Singh



On 5/6/22 19:07, Vivek Goyal wrote:
> On Fri, May 06, 2022 at 06:41:17PM +0200, Bernd Schubert wrote:
>>
>>
>> On 5/6/22 16:12, Vivek Goyal wrote:
>>
>> [...]
>>
>>> On Fri, May 06, 2022 at 11:04:05AM +0530, Dharmendra Hans wrote:
>>
>>>
>>> Ok, looks like your fuse file server is talking to a another file
>>> server on network and that's why you are mentioning two network trips.
>>>
>>> Let us differentiate between two things first.
>>>
>>> A. FUSE protocol semantics
>>> B. Implementation of FUSE protocl by libfuse.
>>>
>>> I think I am stressing on A and you are stressing on B. I just want
>>> to see what's the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE
>>> from fuse protocol point of view. Again look at from kernel's point of
>>> view and don't worry about libfuse is going to implement it.
>>> Implementations can vary.
>>
>> Agreed, I don't think we need to bring in network for the kernel to libfuse
>> API.
>>
>>>
>>>   From kernel's perspective FUSE_CREATE is supposed to create + open a
>>> file. It is possible file already exists. Look at include/fuse_lowlevel.h
>>> description for create().
>>>
>>>           /**
>>>            * Create and open a file
>>>            *
>>>            * If the file does not exist, first create it with the specified
>>>            * mode, and then open it.
>>>            */
>>>
>>> I notice that fuse is offering a high level API as well as low level
>>> API. I primarily know about low level API. To me these are just two
>>> different implementation but things don't change how kernel sends
>>> fuse messages and what it expects from server in return.
>>>
>>> Now with FUSE_ATOMIC_CREATE, from kernel's perspective, only difference
>>> is that in reply message file server will also indicate if file was
>>> actually created or not. Is that right?
>>>
>>> And I am focussing on this FUSE API apsect. I am least concerned at
>>> this point of time who libfuse decides to actually implement FUSE_CREATE
>>> or FUSE_ATOMIC_CREATE etc. You might make a single call in libfuse
>>> server (instead of two) and that's performance optimization in libfuse.
>>> Kernel does not care how many calls did you make in file server to
>>> implement FUSE_CREATE or FUSE_ATOMIC_CREATE. All it cares is that
>>> create and open the file.
>>>
>>> So while you might do things in more atomic manner in file server and
>>> cut down on network traffic, kernel fuse API does not care. All it cares
>>> about is create + open a file.
>>>
>>> Anyway, from kernel's perspective, I think you should be able to
>>> just use FUSE_CREATE and still be do "lookup + create + open".
>>> FUSE_ATOMIC_CREATE is just allows one additional optimization so
>>> that you know whether to invalidate parent dir's attrs or not.
>>>
>>> In fact kernel is not putting any atomicity requirements as well on
>>> file server. And that's why I think this new command should probably
>>> be called FUSE_CREATE_EXT because it just sends back additional
>>> info.
>>>
>>> All the atomicity stuff you have been describing is that you are
>>> trying to do some optimizations in libfuse implementation to implement
>>> FUSE_ATOMIC_CREATE so that you send less number of commands over
>>> network. That's a good idea but fuse kernel API does not require you
>>> do these atomically, AFAICS.
>>>
>>> Given I know little bit of fuse low level API, If I were to implement
>>> this in virtiofs/passthrough_ll.c, I probably will do following.
>>>
>>> A. Check if caller provided O_EXCL flag.
>>> B. openat(O_CREAT | O_EXCL)
>>> C. If success, we created the file. Set file_created = 1.
>>>
>>> D. If error and error != -EEXIST, send error back to client.
>>> E. If error and error == -EEXIST, if caller did provide O_EXCL flag,
>>>      return error.
>>> F. openat() returned -EEXIST and caller did not provide O_EXCL flag,
>>>      that means file already exists.  Set file_created = 0.
>>> G. Do lookup() etc to create internal lo_inode and stat() of file.
>>> H. Send response back to client using fuse_reply_create().
>>> This is one sample implementation for fuse lowlevel API. There could
>>> be other ways to implement. But all that is libfuse + filesystem
>>> specific and kernel does not care how many operations you use to
>>> complete and what's the atomicity etc. Of course less number of
>>> operations you do better it is.
>>>
>>> Anyway, I think I have said enough on this topic. IMHO, FUSE_CREATE
>>> descritpion (fuse_lowlevel.h) already mentions that "If the file does not
>>> exist, first create it with the specified mode and then open it". That
>>> means intent of protocol is that file could already be there as well.
>>> So I think we probably should implement this optimization (in kernel)
>>> using FUSE_CREATE command and then add FUSE_CREATE_EXT to add optimization
>>> about knowing whether file was actually created or not.
>>>
>>> W.r.t libfuse optimizations, I am not sure why can't you do optimizations
>>> with FUSE_CREATE and why do you need FUSE_CREATE_EXT necessarily. If
>>> are you worried that some existing filesystems will break, I think
>>> you can create an internal helper say fuse_create_atomic() and then
>>> use that if filesystem offers it. IOW, libfuse will have two
>>> ways to implement FUSE_CREATE. And if filesystem offers a new way which
>>> cuts down on network traffic, libfuse uses more efficient method. We
>>> should not have to change kernel FUSE API just because libfuse can
>>> do create + open operation more efficiently.
>>
>> Ah right, I like this. As I had written before, the first patch version was
>> using FUSE_CREATE and I was worried to break something. Yes, it should be
>> possible split into lookup+create on the libfuse side. That being said,
>> libfuse will need to know which version it is - there might be an old kernel
>> sending the non-optimized version - libfuse should not do another lookup
>> then.
> 
> I am confused about one thing. For FUSE_CREATE command, how does it
> matter whether kernel has done lookup() before sending FUSE_CREATE. All
> FUSE_CREATE seems to say that crate a file (if it does not exist already)
> and then open it and return file handle as well as inode attributes. It
> does not say anything about whether a LOOKUP has already been done
> by kernel or not.
> 
> It looks like you are assuming that if FUSE_CREATE is coming, that
> means client has already done FUSE_LOOKUP. So there is something we
> are not on same page about.
> 
> I looked at fuse_lowlevel API and passthrough_ll.c and there is no
> assumption whether FUSE_LOOKUP has already been called by client
> before calling FUSE_CREATE. Similarly, I looked at virtiofs code
> and I can't see any such assumption there as well.

The current linux kernel code does this right now, skipping the lookup 
just changes behavior.  Personally I would see it as bug if the 
userspace implementation does not handle EEXIST for FUSE_CREATE. 
Implementation developer and especially users might see it differently 
if a kernel update breaks/changes things out of the sudden. 
passthrough_ll.c is not the issue here, it handles it correctly, but 
what about the XYZ other file systems out there - do you want to check 
them one by one, including closed source ones? And wouldn't even a 
single broken application count as regression?

> 
> https://github.com/qemu/qemu/blob/master/tools/virtiofsd/passthrough_ll.c
> 
> So I am sort of lost. May be you can help me understsand this.

I guess it would be more interesting to look at different file systems 
that are not overlay based. Like ntfs-3g - I have not looked at the code 
yet, but especially disk based file system didn't have a reason so far 
to handle EEXIST. And

> 
>> Now there is 'fi.flags = arg->flags', but these are already taken by
>> open/fcntl flags - I would not feel comfortable to overload these. At best,
>> struct fuse_create_in currently had a padding field, we could convert these
>> to something like 'ext_fuse_open_flags' and then use it for fuse internal
>> things. Difficulty here is that I don't know if all kernel implementations
>> zero the struct (BSD, MacOS), so I guess we would need to negotiate at
>> startup/init time and would need another main feature flag? And with that
>> I'm not be sure anymore if the result would be actually more simple than
>> what we have right now for the first patch.
> 
> If FUSE_CREATE indeed has a dependency on FUSE_LOOKUP have been called
> before that, then I agree that we can't implement new semantics with
> FUSE_CREATE and we will have to introduce a new op say
> FUSE_ATOMIC_CREATE/FUSE_LOOKUP_CREATE/FUSE_CREATE_EXT.
> 
> But looking at fuse API, I don't see FUSE_CREATE ever guaranteeing that
> a FUSE_LOOKUP has been done before this.

It is not in written document, but in the existing (linux) kernel behavior.


You can look up "regressions due to 64-bit ext4 directory cookies" - 
patches from me had been once already the reason for accidental breakage 
and back that time I did not even have the slightest chance to predict 
it, as glusterfs was relying on max 32bit telldir results and using the 
other 32bit for its own purposes. Kernel behavior changed and 
application broke, even though the application was relying on non-posix 
behavior. In case of fuse there is no document like posix, but just API 
headers and current behavior...

Thanks,
Bernd





^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-06 18:45                     ` Bernd Schubert
@ 2022-05-07 10:42                       ` Jean-Pierre André
  2022-05-11 10:08                         ` Bernd Schubert
  0 siblings, 1 reply; 37+ messages in thread
From: Jean-Pierre André @ 2022-05-07 10:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, fuse-devel, linux-fsdevel, linux-kernel

Bernd Schubert wrote on 5/6/22 8:45 PM:
> 
> 
> On 5/6/22 19:07, Vivek Goyal wrote:
>> On Fri, May 06, 2022 at 06:41:17PM +0200, Bernd Schubert wrote:
>>>
>>>
>>> On 5/6/22 16:12, Vivek Goyal wrote:
>>>
>>> [...]
>>>
>>>> On Fri, May 06, 2022 at 11:04:05AM +0530, Dharmendra Hans wrote:
>>>
>>>>
>>>> Ok, looks like your fuse file server is talking to a another file
>>>> server on network and that's why you are mentioning two network trips.
>>>>
>>>> Let us differentiate between two things first.
>>>>
>>>> A. FUSE protocol semantics
>>>> B. Implementation of FUSE protocl by libfuse.
>>>>
>>>> I think I am stressing on A and you are stressing on B. I just want
>>>> to see what's the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE
>>>> from fuse protocol point of view. Again look at from kernel's point of
>>>> view and don't worry about libfuse is going to implement it.
>>>> Implementations can vary.
>>>
>>> Agreed, I don't think we need to bring in network for the kernel to 
>>> libfuse
>>> API.
>>>
>>>>
>>>>   From kernel's perspective FUSE_CREATE is supposed to create + open a
>>>> file. It is possible file already exists. Look at 
>>>> include/fuse_lowlevel.h
>>>> description for create().
>>>>
>>>>           /**
>>>>            * Create and open a file
>>>>            *
>>>>            * If the file does not exist, first create it with the 
>>>> specified
>>>>            * mode, and then open it.
>>>>            */
>>>>
>>>> I notice that fuse is offering a high level API as well as low level
>>>> API. I primarily know about low level API. To me these are just two
>>>> different implementation but things don't change how kernel sends
>>>> fuse messages and what it expects from server in return.
>>>>
>>>> Now with FUSE_ATOMIC_CREATE, from kernel's perspective, only difference
>>>> is that in reply message file server will also indicate if file was
>>>> actually created or not. Is that right?
>>>>
>>>> And I am focussing on this FUSE API apsect. I am least concerned at
>>>> this point of time who libfuse decides to actually implement 
>>>> FUSE_CREATE
>>>> or FUSE_ATOMIC_CREATE etc. You might make a single call in libfuse
>>>> server (instead of two) and that's performance optimization in libfuse.
>>>> Kernel does not care how many calls did you make in file server to
>>>> implement FUSE_CREATE or FUSE_ATOMIC_CREATE. All it cares is that
>>>> create and open the file.
>>>>
>>>> So while you might do things in more atomic manner in file server and
>>>> cut down on network traffic, kernel fuse API does not care. All it 
>>>> cares
>>>> about is create + open a file.
>>>>
>>>> Anyway, from kernel's perspective, I think you should be able to
>>>> just use FUSE_CREATE and still be do "lookup + create + open".
>>>> FUSE_ATOMIC_CREATE is just allows one additional optimization so
>>>> that you know whether to invalidate parent dir's attrs or not.
>>>>
>>>> In fact kernel is not putting any atomicity requirements as well on
>>>> file server. And that's why I think this new command should probably
>>>> be called FUSE_CREATE_EXT because it just sends back additional
>>>> info.
>>>>
>>>> All the atomicity stuff you have been describing is that you are
>>>> trying to do some optimizations in libfuse implementation to implement
>>>> FUSE_ATOMIC_CREATE so that you send less number of commands over
>>>> network. That's a good idea but fuse kernel API does not require you
>>>> do these atomically, AFAICS.
>>>>
>>>> Given I know little bit of fuse low level API, If I were to implement
>>>> this in virtiofs/passthrough_ll.c, I probably will do following.
>>>>
>>>> A. Check if caller provided O_EXCL flag.
>>>> B. openat(O_CREAT | O_EXCL)
>>>> C. If success, we created the file. Set file_created = 1.
>>>>
>>>> D. If error and error != -EEXIST, send error back to client.
>>>> E. If error and error == -EEXIST, if caller did provide O_EXCL flag,
>>>>      return error.
>>>> F. openat() returned -EEXIST and caller did not provide O_EXCL flag,
>>>>      that means file already exists.  Set file_created = 0.
>>>> G. Do lookup() etc to create internal lo_inode and stat() of file.
>>>> H. Send response back to client using fuse_reply_create().
>>>> This is one sample implementation for fuse lowlevel API. There could
>>>> be other ways to implement. But all that is libfuse + filesystem
>>>> specific and kernel does not care how many operations you use to
>>>> complete and what's the atomicity etc. Of course less number of
>>>> operations you do better it is.
>>>>
>>>> Anyway, I think I have said enough on this topic. IMHO, FUSE_CREATE
>>>> descritpion (fuse_lowlevel.h) already mentions that "If the file 
>>>> does not
>>>> exist, first create it with the specified mode and then open it". That
>>>> means intent of protocol is that file could already be there as well.
>>>> So I think we probably should implement this optimization (in kernel)
>>>> using FUSE_CREATE command and then add FUSE_CREATE_EXT to add 
>>>> optimization
>>>> about knowing whether file was actually created or not.
>>>>
>>>> W.r.t libfuse optimizations, I am not sure why can't you do 
>>>> optimizations
>>>> with FUSE_CREATE and why do you need FUSE_CREATE_EXT necessarily. If
>>>> are you worried that some existing filesystems will break, I think
>>>> you can create an internal helper say fuse_create_atomic() and then
>>>> use that if filesystem offers it. IOW, libfuse will have two
>>>> ways to implement FUSE_CREATE. And if filesystem offers a new way which
>>>> cuts down on network traffic, libfuse uses more efficient method. We
>>>> should not have to change kernel FUSE API just because libfuse can
>>>> do create + open operation more efficiently.
>>>
>>> Ah right, I like this. As I had written before, the first patch 
>>> version was
>>> using FUSE_CREATE and I was worried to break something. Yes, it 
>>> should be
>>> possible split into lookup+create on the libfuse side. That being said,
>>> libfuse will need to know which version it is - there might be an old 
>>> kernel
>>> sending the non-optimized version - libfuse should not do another lookup
>>> then.
>>
>> I am confused about one thing. For FUSE_CREATE command, how does it
>> matter whether kernel has done lookup() before sending FUSE_CREATE. All
>> FUSE_CREATE seems to say that crate a file (if it does not exist already)
>> and then open it and return file handle as well as inode attributes. It
>> does not say anything about whether a LOOKUP has already been done
>> by kernel or not.
>>
>> It looks like you are assuming that if FUSE_CREATE is coming, that
>> means client has already done FUSE_LOOKUP. So there is something we
>> are not on same page about.
>>
>> I looked at fuse_lowlevel API and passthrough_ll.c and there is no
>> assumption whether FUSE_LOOKUP has already been called by client
>> before calling FUSE_CREATE. Similarly, I looked at virtiofs code
>> and I can't see any such assumption there as well.
> 
> The current linux kernel code does this right now, skipping the lookup 
> just changes behavior.  Personally I would see it as bug if the 
> userspace implementation does not handle EEXIST for FUSE_CREATE. 
> Implementation developer and especially users might see it differently 
> if a kernel update breaks/changes things out of the sudden. 
> passthrough_ll.c is not the issue here, it handles it correctly, but 
> what about the XYZ other file systems out there - do you want to check 
> them one by one, including closed source ones? And wouldn't even a 
> single broken application count as regression?
> 
>>
>> https://github.com/qemu/qemu/blob/master/tools/virtiofsd/passthrough_ll.c
>>
>> So I am sort of lost. May be you can help me understsand this.
> 
> I guess it would be more interesting to look at different file systems 
> that are not overlay based. Like ntfs-3g - I have not looked at the code 
> yet, but especially disk based file system didn't have a reason so far 
> to handle EEXIST. And

AFAIK ntfs-3g proper does not keep a context across calls and does
not know what LOOKUP was preparing a CREATE. However this may have
consequences in the high level of libfuse for managing the file tree.

The kernel apparently issues a LOOKUP to decide whether issuing a
CREATE (create+open) or an OPEN. If it sent blindly a CREATE,
ntfs-3g would return EEXIST if the name was already present in
the directory.

For a test, can you suggest a way to force ignore of such lookup
within libfuse, without applying your kernel patches ? Is there a
way to detect the purpose of a lookup ? (A possible way is to
hardcode a directory inode within which the lookups return ENOENT).

> 
>>
>>> Now there is 'fi.flags = arg->flags', but these are already taken by
>>> open/fcntl flags - I would not feel comfortable to overload these. At 
>>> best,
>>> struct fuse_create_in currently had a padding field, we could convert 
>>> these
>>> to something like 'ext_fuse_open_flags' and then use it for fuse 
>>> internal
>>> things. Difficulty here is that I don't know if all kernel 
>>> implementations
>>> zero the struct (BSD, MacOS), so I guess we would need to negotiate at
>>> startup/init time and would need another main feature flag? And with 
>>> that
>>> I'm not be sure anymore if the result would be actually more simple than
>>> what we have right now for the first patch.
>>
>> If FUSE_CREATE indeed has a dependency on FUSE_LOOKUP have been called
>> before that, then I agree that we can't implement new semantics with
>> FUSE_CREATE and we will have to introduce a new op say
>> FUSE_ATOMIC_CREATE/FUSE_LOOKUP_CREATE/FUSE_CREATE_EXT.
>>
>> But looking at fuse API, I don't see FUSE_CREATE ever guaranteeing that
>> a FUSE_LOOKUP has been done before this.
> 
> It is not in written document, but in the existing (linux) kernel behavior.
> 
> 
> You can look up "regressions due to 64-bit ext4 directory cookies" - 
> patches from me had been once already the reason for accidental breakage 
> and back that time I did not even have the slightest chance to predict 
> it, as glusterfs was relying on max 32bit telldir results and using the 
> other 32bit for its own purposes. Kernel behavior changed and 
> application broke, even though the application was relying on non-posix 
> behavior. In case of fuse there is no document like posix, but just API 
> headers and current behavior...
> 
> Thanks,
> Bernd
> 
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-05 19:59         ` Vivek Goyal
@ 2022-05-11  9:40           ` Miklos Szeredi
  2022-05-11  9:59             ` Bernd Schubert
  2022-05-11 17:21             ` Vivek Goyal
  0 siblings, 2 replies; 37+ messages in thread
From: Miklos Szeredi @ 2022-05-11  9:40 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Bernd Schubert, Dharmendra Hans, linux-fsdevel, fuse-devel, linux-kernel

On Thu, 5 May 2022 at 21:59, Vivek Goyal <vgoyal@redhat.com> wrote:

> Oh, I have no issues with the intent. I will like to see cut in network
> traffic too (if we can do this without introducing problems). My primary
> interest is that this kind of change should benefit virtiofs as well.

One issue with that appears to be checking permissions.   AFAIU this
patchset only enables the optimization if default_permissions is
turned off (i.e. all permission checking is done by the server).  But
virtiofs uses the default_permissions model.

I'm not quite sure about this limitation, guessing that it's related
to the fact that the permissions may be stale at the time of checking?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-11  9:40           ` Miklos Szeredi
@ 2022-05-11  9:59             ` Bernd Schubert
  2022-05-11 17:21             ` Vivek Goyal
  1 sibling, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2022-05-11  9:59 UTC (permalink / raw)
  To: Miklos Szeredi, Vivek Goyal
  Cc: Dharmendra Hans, linux-fsdevel, fuse-devel, linux-kernel



On 5/11/22 11:40, Miklos Szeredi wrote:
> On Thu, 5 May 2022 at 21:59, Vivek Goyal <vgoyal@redhat.com> wrote:
> 
>> Oh, I have no issues with the intent. I will like to see cut in network
>> traffic too (if we can do this without introducing problems). My primary
>> interest is that this kind of change should benefit virtiofs as well.
> 
> One issue with that appears to be checking permissions.   AFAIU this
> patchset only enables the optimization if default_permissions is
> turned off (i.e. all permission checking is done by the server).  But
> virtiofs uses the default_permissions model.
> 
> I'm not quite sure about this limitation, guessing that it's related
> to the fact that the permissions may be stale at the time of checking?

Yes exactly, I had actually pointed this out in one of the mails.

<quote myself from 2022-04-05 >
Adding in a vfs ->open_revalidate might have the advantage that we could 
also support 'default_permissions' - ->open_revalidate  needs to 
additionally check the retrieved file permissions and and needs to call 
into generic_permissions for that. Right now that is not easily 
feasible, without adding some code dup to convert flags in MAY_* flags - 
a vfs change would be needed here to pass the right flags.
</quote>


Avoiding lookup for create should work without default_permissions, though.


With the current patches this comment should be added in 
fuse_dentry_revalidate() to clarify things (somehow it got lost in the 
(internal) review process).

/* Default permissions actually also would not need revalidate, if 
atomic_open() could verify attributes after opening the file. But the 
MAY_ flags are missing and the vfs build_open_flags() to recreate these 
flags not exported. Thus, default_permissions() cannot be called from 
here - vfs updates would be required */


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
  2022-05-07 10:42                       ` Jean-Pierre André
@ 2022-05-11 10:08                         ` Bernd Schubert
  0 siblings, 0 replies; 37+ messages in thread
From: Bernd Schubert @ 2022-05-11 10:08 UTC (permalink / raw)
  To: Jean-Pierre André, linux-fsdevel, Vivek Goyal, Miklos Szeredi
  Cc: fuse-devel, linux-kernel



On 5/7/22 12:42, Jean-Pierre André wrote:
> Bernd Schubert wrote on 5/6/22 8:45 PM:
>>
>>
>> On 5/6/22 19:07, Vivek Goyal wrote:
>>> I looked at fuse_lowlevel API and passthrough_ll.c and there is no
>>> assumption whether FUSE_LOOKUP has already been called by client
>>> before calling FUSE_CREATE. Similarly, I looked at virtiofs code
>>> and I can't see any such assumption there as well.
>>
>> The current linux kernel code does this right now, skipping the lookup 
>> just changes behavior.  Personally I would see it as bug if the 
>> userspace implementation does not handle EEXIST for FUSE_CREATE. 
>> Implementation developer and especially users might see it differently 
>> if a kernel update breaks/changes things out of the sudden. 
>> passthrough_ll.c is not the issue here, it handles it correctly, but 
>> what about the XYZ other file systems out there - do you want to check 
>> them one by one, including closed source ones? And wouldn't even a 
>> single broken application count as regression?
>>
>>>
>>> https://github.com/qemu/qemu/blob/master/tools/virtiofsd/passthrough_ll.c 
>>>
>>>
>>> So I am sort of lost. May be you can help me understsand this.
>>
>> I guess it would be more interesting to look at different file systems 
>> that are not overlay based. Like ntfs-3g - I have not looked at the 
>> code yet, but especially disk based file system didn't have a reason 
>> so far to handle EEXIST. And
> 
> AFAIK ntfs-3g proper does not keep a context across calls and does
> not know what LOOKUP was preparing a CREATE. However this may have
> consequences in the high level of libfuse for managing the file tree.

I don't think high level is effected. I'm really just scared to break

> 
> The kernel apparently issues a LOOKUP to decide whether issuing a
> CREATE (create+open) or an OPEN. If it sent blindly a CREATE,
> ntfs-3g would return EEXIST if the name was already present in
> the directory.

Ok, so ntfs-3g is ok - which leaves N-1 file systems to check...

> 
> For a test, can you suggest a way to force ignore of such lookup
> within libfuse, without applying your kernel patches ? Is there a
> way to detect the purpose of a lookup ? (A possible way is to
> hardcode a directory inode within which the lookups return ENOENT).


What I mean is that we need to check the code or test N file systems - 
if ntfs-3g handles it, N-1 are left...

I we all agree on that there is no issue in skipping the lookup - fine 
with me - it slightly simplifies the patches.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-11  9:40           ` Miklos Szeredi
  2022-05-11  9:59             ` Bernd Schubert
@ 2022-05-11 17:21             ` Vivek Goyal
  2022-05-11 19:30               ` Vivek Goyal
  1 sibling, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2022-05-11 17:21 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Dharmendra Hans, linux-fsdevel, fuse-devel, linux-kernel

On Wed, May 11, 2022 at 11:40:59AM +0200, Miklos Szeredi wrote:
> On Thu, 5 May 2022 at 21:59, Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > Oh, I have no issues with the intent. I will like to see cut in network
> > traffic too (if we can do this without introducing problems). My primary
> > interest is that this kind of change should benefit virtiofs as well.
> 
> One issue with that appears to be checking permissions.   AFAIU this
> patchset only enables the optimization if default_permissions is
> turned off (i.e. all permission checking is done by the server).  But
> virtiofs uses the default_permissions model.

IIUC, only 3rd patch mentions that default_permission should be turned
off. IOW, first patch where lookup + create + open is a single operation
and second patch which does "lookup + open" in a single operation does
not seem to require that default_permissions are not in effect.

So if first two patches work fine, I think virtiofs should benefit too.
(IMHO, 3rd patch is too hacky anyway)

W.r.t permission checks, looks like may_open() will finally be called
after ->atomic_open(). So even if we open the file, we should still be
able to check whether we have permissions to open the file or not
after the fact.

fs/namei.c

path_openat()
{
	open_last_lookups()  <--- This calls ->atomic_open()
	do_open()  <--- This calls may_open()
}

Thanks
Vivek

> 
> I'm not quite sure about this limitation, guessing that it's related
> to the fact that the permissions may be stale at the time of checking?
> 
> Thanks,
> Miklos
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-11 17:21             ` Vivek Goyal
@ 2022-05-11 19:30               ` Vivek Goyal
  2022-05-12  8:16                 ` Dharmendra Hans
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2022-05-11 19:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Bernd Schubert, Dharmendra Hans, linux-fsdevel, fuse-devel, linux-kernel

On Wed, May 11, 2022 at 01:21:13PM -0400, Vivek Goyal wrote:
> On Wed, May 11, 2022 at 11:40:59AM +0200, Miklos Szeredi wrote:
> > On Thu, 5 May 2022 at 21:59, Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > Oh, I have no issues with the intent. I will like to see cut in network
> > > traffic too (if we can do this without introducing problems). My primary
> > > interest is that this kind of change should benefit virtiofs as well.
> > 
> > One issue with that appears to be checking permissions.   AFAIU this
> > patchset only enables the optimization if default_permissions is
> > turned off (i.e. all permission checking is done by the server).  But
> > virtiofs uses the default_permissions model.
> 
> IIUC, only 3rd patch mentions that default_permission should be turned
> off. IOW, first patch where lookup + create + open is a single operation
> and second patch which does "lookup + open" in a single operation does
> not seem to require that default_permissions are not in effect.
> 
> So if first two patches work fine, I think virtiofs should benefit too.
> (IMHO, 3rd patch is too hacky anyway)
> 
> W.r.t permission checks, looks like may_open() will finally be called
> after ->atomic_open(). So even if we open the file, we should still be
> able to check whether we have permissions to open the file or not
> after the fact.
> 
> fs/namei.c
> 
> path_openat()
> {
> 	open_last_lookups()  <--- This calls ->atomic_open()
> 	do_open()  <--- This calls may_open()
> }

Actually I am not sure about it. I was playing with 

open(foo.txt, O_CREAT | O_RDWR, O_IRUSR)

This succeeds if file is newly created but if file already existed, this
fails with -EACCESS

So man 2 open says following. Thanks to Andy Price for pointing me to it.

    Note that mode applies only to future accesses of the newly cre‐
    ated  file;  the  open()  call that creates a read-only file may
    well return a read/write file descriptor.


Now I am wondering how will it look like with first patch. Assume file
already exists on the server (But there is no negative dentry present)
and I do following. And assume file only has read permission for user
and I am trying to open it read-write.

open(foo.txt, O_CREAT | O_RDWR, O_IRUSR)

In normal circumstances, user will expect -EACCESS as file is read-only
and user is trying to open it read-write.

I am wondering how will it look like with this first patch.

Current fuse ->atomic_open() looks up the dentry and does not open
the file if dentry is positive.

New implementation will skip lookup and open the file anyway and
set file->f_mode |= FMODE_CREATED; (First patch in series)

So first of all this seems wrong. I thought FMODE_CREATED should be
set only if file was newly created. Is that a correct understanding.

And I am looking at do_open() code. It does bunch of things based
on FMODE_CREATED flag. One of the things it does is reset acc_mode =0

        if (file->f_mode & FMODE_CREATED) {
                /* Don't check for write permission, don't truncate */
                open_flag &= ~O_TRUNC;
                acc_mode = 0;
	}
	error = may_open(mnt_userns, &nd->path, acc_mode, open_flag);

I suspect this is the code which allows opening a newly created read-only
file as O_RDWR. (Though I am not 100% sure).

I suspect with first patch this will be broken. We will set FMODE_CREATED
even if file already existed and VFS will assume a new file has been
created and do bunch of things which is wrong.

So looks like fuse ->atomic_open() should set FMODE_CREATED only if
it really created the file.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-11 19:30               ` Vivek Goyal
@ 2022-05-12  8:16                 ` Dharmendra Hans
  2022-05-12 15:24                   ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: Dharmendra Hans @ 2022-05-12  8:16 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Bernd Schubert, linux-fsdevel, fuse-devel, linux-kernel

On Thu, May 12, 2022 at 1:00 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, May 11, 2022 at 01:21:13PM -0400, Vivek Goyal wrote:
> > On Wed, May 11, 2022 at 11:40:59AM +0200, Miklos Szeredi wrote:
> > > On Thu, 5 May 2022 at 21:59, Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > > Oh, I have no issues with the intent. I will like to see cut in network
> > > > traffic too (if we can do this without introducing problems). My primary
> > > > interest is that this kind of change should benefit virtiofs as well.
> > >
> > > One issue with that appears to be checking permissions.   AFAIU this
> > > patchset only enables the optimization if default_permissions is
> > > turned off (i.e. all permission checking is done by the server).  But
> > > virtiofs uses the default_permissions model.
> >
> > IIUC, only 3rd patch mentions that default_permission should be turned
> > off. IOW, first patch where lookup + create + open is a single operation
> > and second patch which does "lookup + open" in a single operation does
> > not seem to require that default_permissions are not in effect.
> >
> > So if first two patches work fine, I think virtiofs should benefit too.
> > (IMHO, 3rd patch is too hacky anyway)
> >
> > W.r.t permission checks, looks like may_open() will finally be called
> > after ->atomic_open(). So even if we open the file, we should still be
> > able to check whether we have permissions to open the file or not
> > after the fact.
> >
> > fs/namei.c
> >
> > path_openat()
> > {
> >       open_last_lookups()  <--- This calls ->atomic_open()
> >       do_open()  <--- This calls may_open()
> > }
>
> Actually I am not sure about it. I was playing with
>
> open(foo.txt, O_CREAT | O_RDWR, O_IRUSR)
>
> This succeeds if file is newly created but if file already existed, this
> fails with -EACCESS
>
> So man 2 open says following. Thanks to Andy Price for pointing me to it.
>
>     Note that mode applies only to future accesses of the newly cre‐
>     ated  file;  the  open()  call that creates a read-only file may
>     well return a read/write file descriptor.
>
>
> Now I am wondering how will it look like with first patch. Assume file
> already exists on the server (But there is no negative dentry present)
> and I do following. And assume file only has read permission for user
> and I am trying to open it read-write.
>
> open(foo.txt, O_CREAT | O_RDWR, O_IRUSR)
>
> In normal circumstances, user will expect -EACCESS as file is read-only
> and user is trying to open it read-write.
>
> I am wondering how will it look like with this first patch.
>
> Current fuse ->atomic_open() looks up the dentry and does not open
> the file if dentry is positive.
>
> New implementation will skip lookup and open the file anyway and
> set file->f_mode |= FMODE_CREATED; (First patch in series)
>
> So first of all this seems wrong. I thought FMODE_CREATED should be
> set only if file was newly created. Is that a correct understanding.

You are right. we should mark in f_mode only if the file was actually created.
Thanks for catching this. Appreciate your efforts:)

>
> And I am looking at do_open() code. It does bunch of things based
> on FMODE_CREATED flag. One of the things it does is reset acc_mode =0
>
>         if (file->f_mode & FMODE_CREATED) {
>                 /* Don't check for write permission, don't truncate */
>                 open_flag &= ~O_TRUNC;
>                 acc_mode = 0;
>         }
>         error = may_open(mnt_userns, &nd->path, acc_mode, open_flag);
>
> I suspect this is the code which allows opening a newly created read-only
> file as O_RDWR. (Though I am not 100% sure).

Correct. I could see it.

>
> I suspect with first patch this will be broken. We will set FMODE_CREATED
> even if file already existed and VFS will assume a new file has been
> created and do bunch of things which is wrong.
>
> So looks like fuse ->atomic_open() should set FMODE_CREATED only if
> it really created the file.

This looks like an obvious bug with new patches. But If I do not miss
anything then its a bug on distributed file systems with current code
as well.
It could happen this way(Taking your example which is perfect to trace
this on distributed systems):
Lets start with File is non-existent yet on the file system. There
comes the first client which does a lookup on the file, it does not
find the file. Meanwhile another client created the file on the File
system. Now  when this first client goes to create the file, before
going down it sets FMODE_CREATED on the file handle and then goes down
the lower layers. It comes back with inode but f->mode as
FMODE_CREATED which is incorrect. This mode then results in acc_mode
being set to zero which then allows access to the file as O_RDWR.

I think Miklos proposed to return the flag from user space if the file
was actually created, this would solve two problems 1) this access
problem and code execution going the wrong path 2) correct update if
parent dir changed or not.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create
  2022-05-12  8:16                 ` Dharmendra Hans
@ 2022-05-12 15:24                   ` Vivek Goyal
  0 siblings, 0 replies; 37+ messages in thread
From: Vivek Goyal @ 2022-05-12 15:24 UTC (permalink / raw)
  To: Dharmendra Hans
  Cc: Miklos Szeredi, Bernd Schubert, linux-fsdevel, fuse-devel, linux-kernel

On Thu, May 12, 2022 at 01:46:12PM +0530, Dharmendra Hans wrote:
> On Thu, May 12, 2022 at 1:00 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, May 11, 2022 at 01:21:13PM -0400, Vivek Goyal wrote:
> > > On Wed, May 11, 2022 at 11:40:59AM +0200, Miklos Szeredi wrote:
> > > > On Thu, 5 May 2022 at 21:59, Vivek Goyal <vgoyal@redhat.com> wrote:
> > > >
> > > > > Oh, I have no issues with the intent. I will like to see cut in network
> > > > > traffic too (if we can do this without introducing problems). My primary
> > > > > interest is that this kind of change should benefit virtiofs as well.
> > > >
> > > > One issue with that appears to be checking permissions.   AFAIU this
> > > > patchset only enables the optimization if default_permissions is
> > > > turned off (i.e. all permission checking is done by the server).  But
> > > > virtiofs uses the default_permissions model.
> > >
> > > IIUC, only 3rd patch mentions that default_permission should be turned
> > > off. IOW, first patch where lookup + create + open is a single operation
> > > and second patch which does "lookup + open" in a single operation does
> > > not seem to require that default_permissions are not in effect.
> > >
> > > So if first two patches work fine, I think virtiofs should benefit too.
> > > (IMHO, 3rd patch is too hacky anyway)
> > >
> > > W.r.t permission checks, looks like may_open() will finally be called
> > > after ->atomic_open(). So even if we open the file, we should still be
> > > able to check whether we have permissions to open the file or not
> > > after the fact.
> > >
> > > fs/namei.c
> > >
> > > path_openat()
> > > {
> > >       open_last_lookups()  <--- This calls ->atomic_open()
> > >       do_open()  <--- This calls may_open()
> > > }
> >
> > Actually I am not sure about it. I was playing with
> >
> > open(foo.txt, O_CREAT | O_RDWR, O_IRUSR)
> >
> > This succeeds if file is newly created but if file already existed, this
> > fails with -EACCESS
> >
> > So man 2 open says following. Thanks to Andy Price for pointing me to it.
> >
> >     Note that mode applies only to future accesses of the newly cre‐
> >     ated  file;  the  open()  call that creates a read-only file may
> >     well return a read/write file descriptor.
> >
> >
> > Now I am wondering how will it look like with first patch. Assume file
> > already exists on the server (But there is no negative dentry present)
> > and I do following. And assume file only has read permission for user
> > and I am trying to open it read-write.
> >
> > open(foo.txt, O_CREAT | O_RDWR, O_IRUSR)
> >
> > In normal circumstances, user will expect -EACCESS as file is read-only
> > and user is trying to open it read-write.
> >
> > I am wondering how will it look like with this first patch.
> >
> > Current fuse ->atomic_open() looks up the dentry and does not open
> > the file if dentry is positive.
> >
> > New implementation will skip lookup and open the file anyway and
> > set file->f_mode |= FMODE_CREATED; (First patch in series)
> >
> > So first of all this seems wrong. I thought FMODE_CREATED should be
> > set only if file was newly created. Is that a correct understanding.
> 
> You are right. we should mark in f_mode only if the file was actually created.
> Thanks for catching this. Appreciate your efforts:)
> 
> >
> > And I am looking at do_open() code. It does bunch of things based
> > on FMODE_CREATED flag. One of the things it does is reset acc_mode =0
> >
> >         if (file->f_mode & FMODE_CREATED) {
> >                 /* Don't check for write permission, don't truncate */
> >                 open_flag &= ~O_TRUNC;
> >                 acc_mode = 0;
> >         }
> >         error = may_open(mnt_userns, &nd->path, acc_mode, open_flag);
> >
> > I suspect this is the code which allows opening a newly created read-only
> > file as O_RDWR. (Though I am not 100% sure).
> 
> Correct. I could see it.
> 
> >
> > I suspect with first patch this will be broken. We will set FMODE_CREATED
> > even if file already existed and VFS will assume a new file has been
> > created and do bunch of things which is wrong.
> >
> > So looks like fuse ->atomic_open() should set FMODE_CREATED only if
> > it really created the file.
> 
> This looks like an obvious bug with new patches. But If I do not miss
> anything then its a bug on distributed file systems with current code
> as well.
> It could happen this way(Taking your example which is perfect to trace
> this on distributed systems):
> Lets start with File is non-existent yet on the file system. There
> comes the first client which does a lookup on the file, it does not
> find the file. Meanwhile another client created the file on the File
> system. Now  when this first client goes to create the file, before
> going down it sets FMODE_CREATED on the file handle and then goes down
> the lower layers. It comes back with inode but f->mode as
> FMODE_CREATED which is incorrect. This mode then results in acc_mode
> being set to zero which then allows access to the file as O_RDWR.

I think with current code (FUSE_CREATE), it is a race with shared
filesystems where multiple clients might be sharing this directory.

We are looking up dentry first and issue FUSE_CREATE only if dentry
is negative. Now it is possible that between lookup() + FUSE_CREATE,
another client dropped in same file in the directory. But one could
argue, that's something not detectable by this client. So from user's
perspective, this client created the file.

If we have a negative dentry, then we will not do the lookup and
assumption is that we created file (even if another client created
it between previous lookup and this creation).

So agreed, this is a race but might not be very harmful one because
looks like we are providing weaker coherency with FUSE as opposed
to local filesystems. In fact this case fuse server running on a
directory which is shared by multiple clients bothers me a lot. There
seem to be many corner cases where it is not clear what will happen
if another client is doing operation.

> 
> I think Miklos proposed to return the flag from user space if the file
> was actually created, this would solve two problems 1) this access
> problem and code execution going the wrong path 2) correct update if
> parent dir changed or not.

Agreed that we could use new command FUSE_ATOMIC_CREATE/FUSE_CREATE_EXT
to figure out if file was newly created or not and then set FMODE_CREATED
accordingly.

W.r.t dir change, I am assuming we are invalidating dir attributes just
because if we created file, it could change dir's mtime/ctime. So yes,
FUSE_ATOMIC_CREATE/FUSE_CREATE_EXT could return this info whether file
was actually crated or not and invalidate dir's attribute accordingly.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2022-05-12 15:25 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-02 10:25 [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Dharmendra Singh
2022-05-02 10:25 ` [PATCH v4 1/3] FUSE: Implement atomic lookup + create Dharmendra Singh
2022-05-03 12:43   ` Vivek Goyal
2022-05-03 14:13   ` Vivek Goyal
2022-05-03 19:53   ` Vivek Goyal
2022-05-03 20:48     ` Bernd Schubert
2022-05-04  4:26     ` Dharmendra Hans
2022-05-04 14:47       ` Vivek Goyal
2022-05-04 15:46         ` Bernd Schubert
2022-05-04 17:31           ` Vivek Goyal
2022-05-05  4:51         ` Dharmendra Hans
2022-05-05 14:26           ` Vivek Goyal
2022-05-06  5:34             ` Dharmendra Hans
2022-05-06 14:12               ` Vivek Goyal
2022-05-06 16:41                 ` Bernd Schubert
2022-05-06 17:07                   ` Vivek Goyal
2022-05-06 18:45                     ` Bernd Schubert
2022-05-07 10:42                       ` Jean-Pierre André
2022-05-11 10:08                         ` Bernd Schubert
2022-05-02 10:25 ` [PATCH v4 2/3] FUSE: Implement atomic lookup + open Dharmendra Singh
2022-05-04 18:20   ` Vivek Goyal
2022-05-05  6:39     ` Dharmendra Hans
2022-05-02 10:25 ` [PATCH v4 3/3] FUSE: Avoid lookup in d_revalidate() Dharmendra Singh
2022-05-04 20:39   ` Vivek Goyal
2022-05-04 21:05     ` Bernd Schubert
2022-05-05  5:49     ` Dharmendra Hans
2022-05-04 19:18 ` [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Vivek Goyal
2022-05-05  6:12   ` Dharmendra Hans
2022-05-05 12:54     ` Vivek Goyal
2022-05-05 15:13       ` Bernd Schubert
2022-05-05 19:59         ` Vivek Goyal
2022-05-11  9:40           ` Miklos Szeredi
2022-05-11  9:59             ` Bernd Schubert
2022-05-11 17:21             ` Vivek Goyal
2022-05-11 19:30               ` Vivek Goyal
2022-05-12  8:16                 ` Dharmendra Hans
2022-05-12 15:24                   ` Vivek Goyal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).