All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] fuse: support cache revalidation in writeback_cache mode
@ 2022-03-25 13:21 Jiachen Zhang
  2022-04-25 12:41 ` Miklos Szeredi
  0 siblings, 1 reply; 9+ messages in thread
From: Jiachen Zhang @ 2022-03-25 13:21 UTC (permalink / raw)
  To: miklos, linux-fsdevel; +Cc: linux-kernel, xieyongji, Jiachen Zhang

Hi all,

This RFC patch implements attr cache and data cache revalidation for
fuse writeback_cache mode in kernel. Looking forward to any suggestions
or comments on this feature.

BACKGROUND

FUSE writeback_cache mode significantly improves write performance. However,
the control between performance and consistency among clients is somehow too
coarse-grained. That is, if a file is re-opened without FOPEN_KEEP_CACHE,
all cached pages of the file will be invalidated in fuse_finish_open().
While getting a good consistency among clients by re-opening files, the
unconditional page cache invalidation on file opening may lead to
performance issues.

This patch tries to implement a more relaxed cache invalidation strategy
that behaves like NFS's close-to-open (CTO) consistency [1]. In the NFSv4
CTO consistency implementation, every file in the server is associated with
a counter called change attribute (change_attr) [1]. The change_attr counter
is monotonically increased every time the file is modified. Every client
should also keep the latest change_attr it gets from the server. If a
client finds the change_attr of a file has been changed by another client,
the attr and data cache of that file will be marked as invalid and will be
evicted from kernel later. The CTO consistency also requires flushing all
dirty pages to server on file closing, which is already implemented by FUSE
in fuse_flush().

DESIGN & IMPLEMENTATION

This patch adds change_attr fileds to fuse_entry_out and fuse_open_out, and
fuse_write_out. The change_attr in fuse_entry_out is to initialize the
fi->change_attr. The change_attrs in fuse_open_out and fuse_write_out are
used by cache revalidation and fi->change_attr updating.

If the change_attr kept in kernel fuse_inode is different from the
change_attr returned from server, attr cache and page cache will be
invalidated. As for the implementation, this patch uses fuse_invalidate_attr()
and a newly added fi->wb_attr_cache_valid flag for attr invalidation.
There is also a new FUSE inode state (FUSE_I_CACHE_INVALID) for marking the
page cache of a file as invalid. The page cache marked as invalid will be
invalidated later in a fuse_finish_open() call.

The reason we need 2 change_attrs (pre_change_attr and post_change_attr)
in fuse_write_out is the FUSE_WRITE request itself will increase the server
change_attr, by returning the change_attr before and after server
change_attr increment, kernel can avoid false cache invalidation
caused by self-writings. This is so-called weak-cache-consistency (WCC)
in the NFS world [1].

To avoid false cache invalidation caused by out-of-order FUSE request
replying. This patch also adds change_attrs lists for delayed cache
revalidation. Change_attrs returned from server will be added to the
lists. Later, a single thread workqueue worker will sort the change_attrs
in the revalidation list and revalidate cache based on the sorted
change_attrs. In the implementation of this patch, we use 3 change_attr
lists to avoid contention between the revalidation workqueue thread and
threads that add change_atts to the list. The cache revalidation can
only be delayed in FUSE_WRITE cases, but no later than the file is closed
in fuse_flush. FUSE_OPEN cases will perform cache revalidation immediately.

To enable cache revalidation in FUSE writeback_cache mode, this patch adds
2 new FUSE_INIT flags, FUSE_WB_CTO and FUSE_WB_CTO_WCC. If we only set
FUSE_WB_CTO, kernel will do open revalidation. If FUSE_WB_CTO_WCC is set
with FUSE_WB_CTO, kernel will also perform WCC revalidation with pre and
post change_attrs returned within fuse_write_out.

[1] https://linux.die.net/man/5/nfs

Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>

Thanks,
Jiachen
---
 fs/fuse/dir.c             |   9 +-
 fs/fuse/file.c            | 253 +++++++++++++++++++++++++++++++++++++-
 fs/fuse/fuse_i.h          |  67 +++++++++-
 fs/fuse/inode.c           |  43 +++++--
 fs/fuse/readdir.c         |   2 +-
 include/uapi/linux/fuse.h |  10 +-
 6 files changed, 363 insertions(+), 21 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 656e921f3506..914624579fe6 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -396,7 +396,7 @@ int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name
 
 	*inode = fuse_iget(sb, outarg->nodeid, outarg->generation,
 			   &outarg->attr, entry_attr_timeout(outarg),
-			   attr_version);
+			   attr_version, outarg->change_attr);
 	err = -ENOMEM;
 	if (!*inode) {
 		fuse_queue_forget(fm->fc, forget, outarg->nodeid, 1);
@@ -604,7 +604,8 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
 	ff->nodeid = outentry.nodeid;
 	ff->open_flags = outopen.open_flags;
 	inode = fuse_iget(dir->i_sb, outentry.nodeid, outentry.generation,
-			  &outentry.attr, entry_attr_timeout(&outentry), 0);
+			  &outentry.attr, entry_attr_timeout(&outentry), 0,
+			  outentry.change_attr);
 	if (!inode) {
 		flags &= ~(O_CREAT | O_EXCL | O_TRUNC);
 		fuse_sync_release(NULL, ff, flags);
@@ -736,7 +737,8 @@ static int create_new_entry(struct fuse_mount *fm, struct fuse_args *args,
 		goto out_put_forget_req;
 
 	inode = fuse_iget(dir->i_sb, outarg.nodeid, outarg.generation,
-			  &outarg.attr, entry_attr_timeout(&outarg), 0);
+			  &outarg.attr, entry_attr_timeout(&outarg), 0,
+			  outarg.change_attr);
 	if (!inode) {
 		fuse_queue_forget(fm->fc, forget, outarg.nodeid, 1);
 		return -ENOMEM;
@@ -859,6 +861,7 @@ static void fuse_entry_unlinked(struct dentry *entry)
 
 	spin_lock(&fi->lock);
 	fi->attr_version = atomic64_inc_return(&fc->attr_version);
+	fi->wb_attr_cache_valid = true;
 	/*
 	 * If i_nlink == 0 then unlink doesn't make sense, yet this can
 	 * happen if userspace filesystem is careless.  It would be
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 0fc150c1c50b..c7c8b84052bb 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -18,6 +18,7 @@
 #include <linux/falloc.h>
 #include <linux/uio.h>
 #include <linux/fs.h>
+#include <linux/list_sort.h>
 
 static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
 			  unsigned int open_flags, int opcode,
@@ -124,12 +125,70 @@ static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
 	}
 }
 
+static void fuse_revalidation_queue_work(struct fuse_inode *fi, bool drain)
+{
+	struct fuse_conn *fc = get_fuse_conn(&fi->inode);
+
+	if (!fc->reval_wq)
+		return;
+
+	if (drain) {
+		queue_work(fc->reval_wq, &fi->reval_work_drain);
+		flush_work(&fi->reval_work_drain);
+	} else {
+		queue_work(fc->reval_wq, &fi->reval_work);
+	}
+}
+
+static void fuse_add_change_attr(struct fuse_inode *fi,
+			       struct list_head *entry, int from, int to,
+			       bool dispatch, bool queue_work, bool drain)
+{
+	struct timespec64 reval_interval = { 1, 0 };
+
+	spin_lock(&fi->reval_lock);
+	if (entry) {
+		list_add_tail(entry, &fi->wcc_attrs[from]);
+		fi->wcc_attrs_ctr[from]++;
+	}
+
+	if (dispatch) {
+		list_splice_tail_init(&fi->wcc_attrs[from], &fi->wcc_attrs[to]);
+		fi->wcc_attrs_ctr[to] += fi->wcc_attrs_ctr[from];
+		fi->wcc_attrs_ctr[from] = 0;
+	}
+
+	if (queue_work) {
+		fi->next_reval_time = get_jiffies_64()
+				      + timespec64_to_jiffies(&reval_interval);
+		spin_unlock(&fi->reval_lock);
+		fuse_revalidation_queue_work(fi, drain);
+	} else {
+		spin_unlock(&fi->reval_lock);
+	}
+}
+
+static void fuse_open_revalidate(struct inode *inode, u64 change_attr)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	struct fuse_wcc_attr *fwa = kzalloc(sizeof(*fwa), GFP_NOFS);
+
+	/* Fake up a pre_change_attr */
+	fwa->pre_change_attr = change_attr;
+	fwa->post_change_attr = change_attr;
+
+	fuse_add_change_attr(fi, &fwa->entry, FUSE_WCC_LIST_RECEIVING,
+			   FUSE_WCC_LIST_PENDING, true, true, true);
+}
+
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
-				 unsigned int open_flags, bool isdir)
+				 struct file *file, bool isdir)
 {
 	struct fuse_conn *fc = fm->fc;
 	struct fuse_file *ff;
 	int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
+	unsigned int open_flags = file->f_flags;
+	struct inode *inode = file_inode(file);
 
 	ff = fuse_file_alloc(fm);
 	if (!ff)
@@ -147,6 +206,8 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 			ff->fh = outarg.fh;
 			ff->open_flags = outarg.open_flags;
 
+			if (fc->wb_cto)
+				fuse_open_revalidate(inode, outarg.change_attr);
 		} else if (err != -ENOSYS) {
 			fuse_file_free(ff);
 			return ERR_PTR(err);
@@ -169,7 +230,7 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 int fuse_do_open(struct fuse_mount *fm, u64 nodeid, struct file *file,
 		 bool isdir)
 {
-	struct fuse_file *ff = fuse_file_open(fm, nodeid, file->f_flags, isdir);
+	struct fuse_file *ff = fuse_file_open(fm, nodeid, file, isdir);
 
 	if (!IS_ERR(ff))
 		file->private_data = ff;
@@ -197,6 +258,7 @@ void fuse_finish_open(struct inode *inode, struct file *file)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
 
 	if (ff->open_flags & FOPEN_STREAM)
 		stream_open(inode, file);
@@ -204,16 +266,16 @@ void fuse_finish_open(struct inode *inode, struct file *file)
 		nonseekable_open(inode, file);
 
 	if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) {
-		struct fuse_inode *fi = get_fuse_inode(inode);
-
 		spin_lock(&fi->lock);
 		fi->attr_version = atomic64_inc_return(&fc->attr_version);
+		fi->wb_attr_cache_valid = true;
 		i_size_write(inode, 0);
 		spin_unlock(&fi->lock);
 		truncate_pagecache(inode, 0);
 		file_update_time(file);
 		fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
-	} else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) {
+	} else if (test_and_clear_bit(FUSE_I_CACHE_INVALID, &fi->state) ||
+		   (!fc->wb_cto && !(ff->open_flags & FOPEN_KEEP_CACHE))) {
 		invalidate_inode_pages2(inode->i_mapping);
 	}
 
@@ -464,10 +526,139 @@ static void fuse_sync_writes(struct inode *inode)
 	fuse_release_nowrite(inode);
 }
 
+void fuse_update_change_attr(struct inode *inode, u64 change_attr)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	fi->change_attr = change_attr;
+}
+
+static void fuse_update_change_attr_wcc(struct inode *inode,
+					 u64 pre_change_attr,
+					 u64 post_change_attr)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	if (fi->change_attr == pre_change_attr)
+		fuse_update_change_attr(inode, post_change_attr);
+}
+
+static int compare_change_attr(void *priv, const struct list_head *a,
+			       const struct list_head *b)
+{
+	struct fuse_wcc_attr *fwa1;
+	struct fuse_wcc_attr *fwa2;
+
+	fwa1 = container_of(a, struct fuse_wcc_attr, entry);
+	fwa2 = container_of(b, struct fuse_wcc_attr, entry);
+	if (fwa1->pre_change_attr < fwa2->pre_change_attr)
+		return -1;
+	else if (fwa1->pre_change_attr > fwa2->pre_change_attr)
+		return 1;
+	else
+		return 0;
+}
+
+static void fuse_revalidate_cache(struct inode *inode, u64 change_attr)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	/* Mark attr and data cache as invalid if change_attr is changed. */
+	if (fi->change_attr != change_attr) {
+		fuse_invalidate_attr(inode);
+		fi->wb_attr_cache_valid = false;
+		set_bit(FUSE_I_CACHE_INVALID, &fi->state);
+	}
+}
+
+static void fuse_do_revalidation(struct inode *inode, bool drain)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	struct fuse_wcc_attr *fwa1, *fwa2, *fwa_last;
+	int n_reval, remain;
+	struct list_head *reval_list;
+
+	fuse_add_change_attr(fi, NULL, FUSE_WCC_LIST_PENDING,
+			   FUSE_WCC_LIST_REVALIDATING, true, false, false);
+
+	reval_list = &fi->wcc_attrs[FUSE_WCC_LIST_REVALIDATING];
+
+	/*
+	 * The reason we keep wcc attrs in a list is to sort them here,
+	 * we can reduce false negative data cache invalidations this
+	 * way.
+	 */
+	list_sort(NULL, reval_list, compare_change_attr);
+
+	if (drain)
+		n_reval = fi->wcc_attrs_ctr[FUSE_WCC_LIST_REVALIDATING];
+	else
+		n_reval = fi->wcc_attrs_ctr[FUSE_WCC_LIST_REVALIDATING] / 2;
+
+	if (n_reval == 0)
+		return;
+
+	remain = n_reval;
+	list_for_each_entry(fwa2, reval_list, entry) {
+		if (remain == n_reval)
+			fwa1 = NULL;
+		else
+			fwa1 = list_prev_entry(fwa2, entry);
+
+		if (--remain == 0) {
+			if (fwa1)
+				fuse_update_change_attr(inode, fwa1->post_change_attr);
+			goto revalidate;
+		}
+
+		if (fwa1 && (fwa1->post_change_attr != fwa2->pre_change_attr)) {
+			set_bit(FUSE_I_CACHE_INVALID, &fi->state);
+			fwa_last = list_last_entry(reval_list,
+					struct fuse_wcc_attr, entry);
+			fuse_update_change_attr(inode, fwa_last->post_change_attr);
+			n_reval = fi->wcc_attrs_ctr[FUSE_WCC_LIST_REVALIDATING];
+			goto out;
+		}
+	}
+
+revalidate:
+	fuse_update_change_attr_wcc(inode, fwa2->pre_change_attr,
+				     fwa2->post_change_attr);
+	fuse_revalidate_cache(inode, fwa2->post_change_attr);
+	fuse_update_change_attr(inode, fwa2->post_change_attr);
+out:
+	remain = n_reval;
+
+	list_for_each_entry_safe(fwa1, fwa2, reval_list, entry) {
+		if (remain-- == 0)
+			break;
+		list_del(&fwa1->entry);
+		kfree(fwa1);
+	}
+	fi->wcc_attrs_ctr[FUSE_WCC_LIST_REVALIDATING] -= n_reval;
+}
+
+static void fuse_revalidation_work(struct work_struct *work)
+{
+	struct fuse_inode *fi = container_of(work, struct fuse_inode,
+					     reval_work);
+
+	fuse_do_revalidation(&fi->inode, false);
+}
+
+static void fuse_revalidation_work_drain(struct work_struct *work)
+{
+	struct fuse_inode *fi = container_of(work, struct fuse_inode,
+					     reval_work_drain);
+
+	fuse_do_revalidation(&fi->inode, true);
+}
+
 static int fuse_flush(struct file *file, fl_owner_t id)
 {
 	struct inode *inode = file_inode(file);
 	struct fuse_mount *fm = get_fuse_mount(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_file *ff = file->private_data;
 	struct fuse_flush_in inarg;
 	FUSE_ARGS(args);
@@ -487,6 +678,10 @@ static int fuse_flush(struct file *file, fl_owner_t id)
 	fuse_sync_writes(inode);
 	inode_unlock(inode);
 
+	if (fm->fc->wb_cto)
+		fuse_add_change_attr(fi, NULL, FUSE_WCC_LIST_RECEIVING,
+				   FUSE_WCC_LIST_PENDING, true, true, true);
+
 	err = filemap_check_errors(file->f_mapping);
 	if (err)
 		return err;
@@ -680,6 +875,7 @@ static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
 
 			spin_lock(&fi->lock);
 			fi->attr_version = atomic64_inc_return(&fc->attr_version);
+			fi->wb_attr_cache_valid = true;
 			spin_unlock(&fi->lock);
 		}
 
@@ -792,6 +988,7 @@ static void fuse_read_update_size(struct inode *inode, loff_t size,
 	if (attr_ver >= fi->attr_version && size < inode->i_size &&
 	    !test_bit(FUSE_I_SIZE_UNSTABLE, &fi->state)) {
 		fi->attr_version = atomic64_inc_return(&fc->attr_version);
+		fi->wb_attr_cache_valid = true;
 		i_size_write(inode, size);
 	}
 	spin_unlock(&fi->lock);
@@ -1076,6 +1273,7 @@ bool fuse_write_update_attr(struct inode *inode, loff_t pos, ssize_t written)
 
 	spin_lock(&fi->lock);
 	fi->attr_version = atomic64_inc_return(&fc->attr_version);
+	fi->wb_attr_cache_valid = true;
 	if (written > 0 && pos > inode->i_size) {
 		i_size_write(inode, pos);
 		ret = true;
@@ -1748,6 +1946,22 @@ static struct fuse_writepage_args *fuse_insert_writeback(struct rb_root *root,
 	return NULL;
 }
 
+static void fuse_wcc_revalidate(struct inode *inode,
+				      u64 pre_change_attr,
+				      u64 post_change_attr)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	struct fuse_wcc_attr *fwa = kzalloc(sizeof(*fwa), GFP_NOFS);
+	bool is_timeout = time_before64(fi->next_reval_time, get_jiffies_64());
+
+	fwa->pre_change_attr = pre_change_attr;
+	fwa->post_change_attr = post_change_attr;
+
+	fuse_add_change_attr(fi, &fwa->entry, FUSE_WCC_LIST_RECEIVING,
+			     FUSE_WCC_LIST_PENDING,
+			     is_timeout, is_timeout, false);
+}
+
 static void tree_insert(struct rb_root *root, struct fuse_writepage_args *wpa)
 {
 	WARN_ON(fuse_insert_writeback(root, wpa));
@@ -1762,6 +1976,11 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_conn *fc = get_fuse_conn(inode);
 
+	if (fc->wb_cto_wcc)
+		fuse_wcc_revalidate(inode,
+				    wpa->ia.write.out.pre_change_attr,
+				    wpa->ia.write.out.post_change_attr);
+
 	mapping_set_error(inode->i_mapping, error);
 	/*
 	 * A writeback finished and this might have updated mtime/ctime on
@@ -1838,6 +2057,7 @@ static struct fuse_file *fuse_write_file_get(struct fuse_inode *fi)
 int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
+	struct fuse_mount *fm = get_fuse_mount(inode);
 	struct fuse_file *ff;
 	int err;
 
@@ -1852,6 +2072,10 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 	 */
 	WARN_ON(wbc->for_reclaim);
 
+	/* In the case of CTO consistency, server updates cmtime by itself. */
+	if (fm->fc->wb_cto)
+		return 0;
+
 	ff = __fuse_write_file_get(fi);
 	err = fuse_flush_times(inode, ff);
 	if (ff)
@@ -3186,3 +3410,22 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
 		fuse_dax_inode_init(inode, flags);
 }
+
+void fuse_init_cto(struct inode *inode)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	int i;
+	struct timespec64 reval_interval = { 1, 0 };
+
+	fi->wb_attr_cache_valid = true;
+	for (i = 0; i < FUSE_WCC_LIST_NUM; i++) {
+		fi->wcc_attrs_ctr[i] = 0;
+		INIT_LIST_HEAD(&fi->wcc_attrs[i]);
+	}
+	spin_lock_init(&fi->reval_lock);
+	fi->next_reval_time = get_jiffies_64()
+			      + timespec64_to_jiffies(&reval_interval);
+
+	INIT_WORK(&fi->reval_work, fuse_revalidation_work);
+	INIT_WORK(&fi->reval_work_drain, fuse_revalidation_work_drain);
+}
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index eac4984cc753..1093287475b4 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -63,6 +63,24 @@ struct fuse_forget_link {
 	struct fuse_forget_link *next;
 };
 
+/** List node for cache revalidation in writeback mode */
+struct fuse_wcc_attr {
+	struct list_head entry;
+	uint64_t pre_change_attr;
+	uint64_t post_change_attr;
+};
+
+/**
+ * List types for delayed cache revalidation, in case of out-of-order
+ * change_attr arriving.
+ */
+enum fuse_wcc_attr_list_types {
+	FUSE_WCC_LIST_RECEIVING,
+	FUSE_WCC_LIST_PENDING,
+	FUSE_WCC_LIST_REVALIDATING,
+	FUSE_WCC_LIST_NUM
+};
+
 /** FUSE inode */
 struct fuse_inode {
 	/** Inode data */
@@ -155,6 +173,33 @@ struct fuse_inode {
 	 */
 	struct fuse_inode_dax *dax;
 #endif
+	/**
+	 * Counter returned from server, used for cache revalidation in
+	 * writeback mode.
+	 */
+	u64 change_attr;
+
+	/**
+	 * Whether local cmtime and size cache are still valid in writeback
+	 * mode.
+	 */
+	bool wb_attr_cache_valid;
+
+	/**
+	 * In case of out-of-order write request replies, delay WCC cache
+	 * revalidation by storing change_attrs in a list on receiving them.
+	 * In the per-second revalidation workqueue worker function, the
+	 * change_attrs will be sorted according to pre_change_attr before
+	 * they are used for cache revalidateion.
+	 */
+	struct list_head wcc_attrs[FUSE_WCC_LIST_NUM];
+	u64 wcc_attrs_ctr[FUSE_WCC_LIST_NUM];
+	u64 next_reval_time;
+	struct work_struct reval_work;
+	struct work_struct reval_work_drain;
+
+	/** Lock to protect cache revalidation lists */
+	spinlock_t reval_lock;
 };
 
 /** FUSE inode state bits */
@@ -167,6 +212,8 @@ enum {
 	FUSE_I_SIZE_UNSTABLE,
 	/* Bad inode */
 	FUSE_I_BAD,
+	/* Data cache is invalid in writeback mode */
+	FUSE_I_CACHE_INVALID,
 };
 
 struct fuse_conn;
@@ -642,6 +689,12 @@ struct fuse_conn {
 	/** write-back cache policy (default is write-through) */
 	unsigned writeback_cache:1;
 
+	/** Enable close-to-open (CTO) consistency in write-back mode */
+	unsigned wb_cto:1;
+
+	/** Enable CTO weak cache consistency (WCC) revalidation */
+	unsigned wb_cto_wcc:1;
+
 	/** allow parallel lookups and readdir (default is serialized) */
 	unsigned parallel_dirops:1;
 
@@ -833,6 +886,9 @@ struct fuse_conn {
 
 	/* New writepages go into this bucket */
 	struct fuse_sync_bucket __rcu *curr_bucket;
+
+	/** Workqueue for writeback cache revalidateion (CTO consistency). */
+	struct workqueue_struct *reval_wq;
 };
 
 /*
@@ -956,7 +1012,7 @@ extern const struct dentry_operations fuse_root_dentry_operations;
  */
 struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 			int generation, struct fuse_attr *attr,
-			u64 attr_valid, u64 attr_version);
+			u64 attr_valid, u64 attr_version, u64 change_attr);
 
 int fuse_lookup_name(struct super_block *sb, u64 nodeid, const struct qstr *name,
 		     struct fuse_entry_out *outarg, struct inode **inode);
@@ -1046,6 +1102,11 @@ void fuse_init_dir(struct inode *inode);
  */
 void fuse_init_symlink(struct inode *inode);
 
+/**
+ * Initialize close-to-open related structures
+ */
+void fuse_init_cto(struct inode *inode);
+
 /**
  * Change attributes of an inode
  */
@@ -1185,6 +1246,8 @@ void fuse_update_ctime(struct inode *inode);
 
 int fuse_update_attributes(struct inode *inode, struct file *file, u32 mask);
 
+void fuse_update_change_attr(struct inode *inode, u64 change_attr);
+
 void fuse_flush_writepages(struct inode *inode);
 
 void fuse_set_nowrite(struct inode *inode);
@@ -1312,7 +1375,7 @@ int fuse_fileattr_set(struct user_namespace *mnt_userns,
 /* file.c */
 
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
-				 unsigned int open_flags, bool isdir);
+				 struct file *file, bool isdir);
 void fuse_file_release(struct inode *inode, struct fuse_file *ff,
 		       unsigned int open_flags, fl_owner_t id, bool isdir);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 9ee36aa73251..a1ddf9ae17ba 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -222,8 +222,10 @@ void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 u32 fuse_get_cache_mask(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
 
-	if (!fc->writeback_cache || !S_ISREG(inode->i_mode))
+	if (!fc->writeback_cache || !S_ISREG(inode->i_mode) ||
+	    !fi->wb_attr_cache_valid)
 		return 0;
 
 	return STATX_MTIME | STATX_CTIME | STATX_SIZE;
@@ -297,7 +299,7 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 				inval = true;
 		}
 
-		if (inval)
+		if (inval && !test_bit(FUSE_I_CACHE_INVALID, &fi->state))
 			invalidate_inode_pages2(inode->i_mapping);
 	}
 
@@ -307,6 +309,8 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 
 static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
 {
+	struct fuse_conn *fc = get_fuse_conn(inode);
+
 	inode->i_mode = attr->mode & S_IFMT;
 	inode->i_size = attr->size;
 	inode->i_mtime.tv_sec  = attr->mtime;
@@ -327,6 +331,9 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
 				   new_decode_dev(attr->rdev));
 	} else
 		BUG();
+
+	if (fc->wb_cto && (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)))
+		fuse_init_cto(inode);
 }
 
 static int fuse_inode_eq(struct inode *inode, void *_nodeidp)
@@ -347,7 +354,7 @@ static int fuse_inode_set(struct inode *inode, void *_nodeidp)
 
 struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 			int generation, struct fuse_attr *attr,
-			u64 attr_valid, u64 attr_version)
+			u64 attr_valid, u64 attr_version, u64 change_attr)
 {
 	struct inode *inode;
 	struct fuse_inode *fi;
@@ -367,7 +374,9 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 			return NULL;
 
 		fuse_init_inode(inode, attr);
-		get_fuse_inode(inode)->nodeid = nodeid;
+		fuse_update_change_attr(inode, change_attr);
+		fi = get_fuse_inode(inode);
+		fi->nodeid = nodeid;
 		inode->i_flags |= S_AUTOMOUNT;
 		goto done;
 	}
@@ -377,12 +386,14 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 	if (!inode)
 		return NULL;
 
+	fi = get_fuse_inode(inode);
 	if ((inode->i_state & I_NEW)) {
 		inode->i_flags |= S_NOATIME;
 		if (!fc->writeback_cache || !S_ISREG(attr->mode))
 			inode->i_flags |= S_NOCMTIME;
 		inode->i_generation = generation;
 		fuse_init_inode(inode, attr);
+		fuse_update_change_attr(inode, change_attr);
 		unlock_new_inode(inode);
 	} else if (fuse_stale_inode(inode, generation, attr)) {
 		/* nodeid was reused, any I/O on the old inode should fail */
@@ -390,8 +401,8 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 		iput(inode);
 		goto retry;
 	}
+
 done:
-	fi = get_fuse_inode(inode);
 	spin_lock(&fi->lock);
 	fi->nlookup++;
 	spin_unlock(&fi->lock);
@@ -437,6 +448,7 @@ int fuse_reverse_inval_inode(struct fuse_conn *fc, u64 nodeid,
 	fi = get_fuse_inode(inode);
 	spin_lock(&fi->lock);
 	fi->attr_version = atomic64_inc_return(&fc->attr_version);
+	fi->wb_attr_cache_valid = true;
 	spin_unlock(&fi->lock);
 
 	fuse_invalidate_attr(inode);
@@ -839,6 +851,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 	INIT_LIST_HEAD(&fc->mounts);
 	list_add(&fm->fc_entry, &fc->mounts);
 	fm->fc = fc;
+	fc->wb_cto = 0;
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
@@ -879,7 +892,7 @@ static struct inode *fuse_get_root_inode(struct super_block *sb, unsigned mode)
 	attr.mode = mode;
 	attr.ino = FUSE_ROOT_ID;
 	attr.nlink = 1;
-	return fuse_iget(sb, 1, 0, &attr, 0, 0);
+	return fuse_iget(sb, 1, 0, &attr, 0, 0, 0);
 }
 
 struct fuse_inode_handle {
@@ -1151,8 +1164,18 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 			}
 			if (flags & FUSE_ASYNC_DIO)
 				fc->async_dio = 1;
-			if (flags & FUSE_WRITEBACK_CACHE)
+			if (flags & FUSE_WRITEBACK_CACHE) {
 				fc->writeback_cache = 1;
+				if (arg->flags & FUSE_WB_CTO) {
+					fc->wb_cto = 1;
+					if (arg->flags & FUSE_WB_CTO_WCC)
+						fc->wb_cto_wcc = 1;
+					fuse_init_cto(fm->sb->s_root->d_inode);
+				}
+				fc->reval_wq = create_singlethread_workqueue("fuse-reval-wq");
+				if (!fc->reval_wq)
+					pr_warn("Failed to allocate fuse revalidateion workqueue!\n");
+			}
 			if (flags & FUSE_PARALLEL_DIROPS)
 				fc->parallel_dirops = 1;
 			if (flags & FUSE_HANDLE_KILLPRIV)
@@ -1234,7 +1257,7 @@ void fuse_send_init(struct fuse_mount *fm)
 		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
 		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
-		FUSE_SECURITY_CTX;
+		FUSE_SECURITY_CTX | FUSE_WB_CTO | FUSE_WB_CTO_WCC;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
@@ -1440,7 +1463,7 @@ static int fuse_fill_super_submount(struct super_block *sb,
 		return -ENOMEM;
 
 	fuse_fill_attr_from_inode(&root_attr, parent_fi);
-	root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0);
+	root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0, parent_fi->change_attr);
 	/*
 	 * This inode is just a duplicate, so it is not looked up and
 	 * its nlookup should not be incremented.  fuse_iget() does
@@ -1773,6 +1796,8 @@ void fuse_conn_destroy(struct fuse_mount *fm)
 		list_del(&fc->entry);
 		fuse_ctl_remove_conn(fc);
 		mutex_unlock(&fuse_mutex);
+		if (fc->reval_wq)
+			destroy_workqueue(fc->reval_wq);
 	}
 }
 EXPORT_SYMBOL_GPL(fuse_conn_destroy);
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index b4e565711045..bfd7d6ec3097 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -231,7 +231,7 @@ static int fuse_direntplus_link(struct file *file,
 	} else {
 		inode = fuse_iget(dir->i_sb, o->nodeid, o->generation,
 				  &o->attr, entry_attr_timeout(o),
-				  attr_version);
+				  attr_version, o->change_attr);
 		if (!inode)
 			inode = ERR_PTR(-ENOMEM);
 
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index d6ccee961891..bd12eda29871 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -194,6 +194,8 @@
  *  - add FUSE_SECURITY_CTX init flag
  *  - add security context to create, mkdir, symlink, and mknod requests
  *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
+ *  7.37
+ *  - add attr to fuse_open_out and fuse_write_out
  */
 
 #ifndef _LINUX_FUSE_H
@@ -229,7 +231,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 36
+#define FUSE_KERNEL_MINOR_VERSION 37
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -389,6 +391,8 @@ struct fuse_file_lock {
 /* bits 32..63 get shifted down 32 bits into the flags2 field */
 #define FUSE_SECURITY_CTX	(1ULL << 32)
 #define FUSE_HAS_INODE_DAX	(1ULL << 33)
+#define FUSE_WB_CTO		(1ULL << 34)
+#define FUSE_WB_CTO_WCC		(1ULL << 35)
 
 /**
  * CUSE INIT request/reply flags
@@ -570,6 +574,7 @@ struct fuse_entry_out {
 	uint32_t	entry_valid_nsec;
 	uint32_t	attr_valid_nsec;
 	struct fuse_attr attr;
+	uint64_t	change_attr;
 };
 
 struct fuse_forget_in {
@@ -664,6 +669,7 @@ struct fuse_open_out {
 	uint64_t	fh;
 	uint32_t	open_flags;
 	uint32_t	padding;
+	uint64_t	change_attr;
 };
 
 struct fuse_release_in {
@@ -705,6 +711,8 @@ struct fuse_write_in {
 struct fuse_write_out {
 	uint32_t	size;
 	uint32_t	padding;
+	uint64_t	pre_change_attr;
+	uint64_t	post_change_attr;
 };
 
 #define FUSE_COMPAT_STATFS_SIZE 48
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH] fuse: support cache revalidation in writeback_cache mode
  2022-03-25 13:21 [RFC PATCH] fuse: support cache revalidation in writeback_cache mode Jiachen Zhang
@ 2022-04-25 12:41 ` Miklos Szeredi
  2022-04-25 13:32   ` Jiachen Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2022-04-25 12:41 UTC (permalink / raw)
  To: Jiachen Zhang; +Cc: linux-fsdevel, linux-kernel, Xie Yongji

On Fri, 25 Mar 2022 at 14:23, Jiachen Zhang
<zhangjiachen.jaycee@bytedance.com> wrote:
>
> Hi all,
>
> This RFC patch implements attr cache and data cache revalidation for
> fuse writeback_cache mode in kernel. Looking forward to any suggestions
> or comments on this feature.

Quick question before going into the details:  could the cache
revalidation be done in the userspace filesystem instead, which would
set/clear FOPEN_KEEP_CACHE based on the result of the revalidation?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [RFC PATCH] fuse: support cache revalidation in writeback_cache mode
  2022-04-25 12:41 ` Miklos Szeredi
@ 2022-04-25 13:32   ` Jiachen Zhang
  2022-04-25 13:41     ` Miklos Szeredi
  0 siblings, 1 reply; 9+ messages in thread
From: Jiachen Zhang @ 2022-04-25 13:32 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, Xie Yongji

On Mon, Apr 25, 2022 at 8:41 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 25 Mar 2022 at 14:23, Jiachen Zhang
> <zhangjiachen.jaycee@bytedance.com> wrote:
> >
> > Hi all,
> >
> > This RFC patch implements attr cache and data cache revalidation for
> > fuse writeback_cache mode in kernel. Looking forward to any suggestions
> > or comments on this feature.
>
> Quick question before going into the details:  could the cache
> revalidation be done in the userspace filesystem instead, which would
> set/clear FOPEN_KEEP_CACHE based on the result of the revalidation?
>
> Thanks,
> Miklos

Hi, Miklos,

Thanks for replying. Yes, I believe we can also perform the
revalidation in userspace, and we can invalidate the data cache with
FOPEN_KEEP_CACHE cleared. But for now, there is no way we can
invalidate attr cache (c/mtime and size)  in writeback mode.

If it is more concise to implement most of the logic in userspace, do
you think we should add a flag for attr cache just like what
FOPEN_KEEP_CACHE does for data cache?

Thanks,
Jiachen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [RFC PATCH] fuse: support cache revalidation in writeback_cache mode
  2022-04-25 13:32   ` Jiachen Zhang
@ 2022-04-25 13:41     ` Miklos Szeredi
  2022-04-25 13:52       ` Jiachen Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: Miklos Szeredi @ 2022-04-25 13:41 UTC (permalink / raw)
  To: Jiachen Zhang; +Cc: linux-fsdevel, linux-kernel, Xie Yongji

On Mon, 25 Apr 2022 at 15:33, Jiachen Zhang
<zhangjiachen.jaycee@bytedance.com> wrote:
>
> On Mon, Apr 25, 2022 at 8:41 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Fri, 25 Mar 2022 at 14:23, Jiachen Zhang
> > <zhangjiachen.jaycee@bytedance.com> wrote:
> > >
> > > Hi all,
> > >
> > > This RFC patch implements attr cache and data cache revalidation for
> > > fuse writeback_cache mode in kernel. Looking forward to any suggestions
> > > or comments on this feature.
> >
> > Quick question before going into the details:  could the cache
> > revalidation be done in the userspace filesystem instead, which would
> > set/clear FOPEN_KEEP_CACHE based on the result of the revalidation?
> >
> > Thanks,
> > Miklos
>
> Hi, Miklos,
>
> Thanks for replying. Yes, I believe we can also perform the
> revalidation in userspace, and we can invalidate the data cache with
> FOPEN_KEEP_CACHE cleared. But for now, there is no way we can
> invalidate attr cache (c/mtime and size)  in writeback mode.

Can you please describe the use case for invalidating the attr cache?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: Re: [RFC PATCH] fuse: support cache revalidation in writeback_cache mode
  2022-04-25 13:41     ` Miklos Szeredi
@ 2022-04-25 13:52       ` Jiachen Zhang
  2022-04-26 13:09         ` Miklos Szeredi
  0 siblings, 1 reply; 9+ messages in thread
From: Jiachen Zhang @ 2022-04-25 13:52 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, Xie Yongji

On Mon, Apr 25, 2022 at 9:42 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Mon, 25 Apr 2022 at 15:33, Jiachen Zhang
> <zhangjiachen.jaycee@bytedance.com> wrote:
> >
> > On Mon, Apr 25, 2022 at 8:41 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Fri, 25 Mar 2022 at 14:23, Jiachen Zhang
> > > <zhangjiachen.jaycee@bytedance.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > This RFC patch implements attr cache and data cache revalidation for
> > > > fuse writeback_cache mode in kernel. Looking forward to any suggestions
> > > > or comments on this feature.
> > >
> > > Quick question before going into the details:  could the cache
> > > revalidation be done in the userspace filesystem instead, which would
> > > set/clear FOPEN_KEEP_CACHE based on the result of the revalidation?
> > >
> > > Thanks,
> > > Miklos
> >
> > Hi, Miklos,
> >
> > Thanks for replying. Yes, I believe we can also perform the
> > revalidation in userspace, and we can invalidate the data cache with
> > FOPEN_KEEP_CACHE cleared. But for now, there is no way we can
> > invalidate attr cache (c/mtime and size)  in writeback mode.
>
> Can you please describe the use case for invalidating the attr cache?
>

Some users may want both the high performance of writeback mode and a
little bit more consistency among FUSE mounts. In the current
writeback mode implementation, users of one FUSE mount can never see
the file expansion done by other FUSE mounts.

Thanks,
Jiachen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: Re: [RFC PATCH] fuse: support cache revalidation in writeback_cache mode
  2022-04-25 13:52       ` Jiachen Zhang
@ 2022-04-26 13:09         ` Miklos Szeredi
  2022-06-08 10:49           ` [External] " Jiachen Zhang
  2022-06-27 14:45           ` Vivek Goyal
  0 siblings, 2 replies; 9+ messages in thread
From: Miklos Szeredi @ 2022-04-26 13:09 UTC (permalink / raw)
  To: Jiachen Zhang; +Cc: linux-fsdevel, linux-kernel, Xie Yongji

On Mon, Apr 25, 2022 at 09:52:44PM +0800, Jiachen Zhang wrote:

> Some users may want both the high performance of writeback mode and a
> little bit more consistency among FUSE mounts. In the current
> writeback mode implementation, users of one FUSE mount can never see
> the file expansion done by other FUSE mounts.

Okay.

Here's a preliminary patch that you could try.

Thanks,
Miklos

---
 fs/fuse/dir.c             |   35 ++++++++++++++++++++++-------------
 fs/fuse/file.c            |   17 +++++++++++++++--
 fs/fuse/fuse_i.h          |   14 +++++++++++++-
 fs/fuse/inode.c           |   32 +++++++++++++++++++++++++++-----
 include/uapi/linux/fuse.h |    5 +++++
 5 files changed, 82 insertions(+), 21 deletions(-)

--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -194,6 +194,7 @@
  *  - add FUSE_SECURITY_CTX init flag
  *  - add security context to create, mkdir, symlink, and mknod requests
  *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
+ *  - add FUSE_WRITEBACK_CACHE_V2 init flag
  */
 
 #ifndef _LINUX_FUSE_H
@@ -353,6 +354,9 @@ struct fuse_file_lock {
  * FUSE_SECURITY_CTX:	add security context to create, mkdir, symlink, and
  *			mknod
  * FUSE_HAS_INODE_DAX:  use per inode DAX
+ * FUSE_WRITEBACK_CACHE_V2:
+ *			- allow time/size to be refreshed if no pending write
+ * 			- time/size not cached for falocate/copy_file_range
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -389,6 +393,7 @@ struct fuse_file_lock {
 /* bits 32..63 get shifted down 32 bits into the flags2 field */
 #define FUSE_SECURITY_CTX	(1ULL << 32)
 #define FUSE_HAS_INODE_DAX	(1ULL << 33)
+#define FUSE_WRITEBACK_CACHE_V2	(1ULL << 34)
 
 /**
  * CUSE INIT request/reply flags
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -222,19 +222,37 @@ void fuse_change_attributes_common(struc
 u32 fuse_get_cache_mask(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_inode *fi = get_fuse_inode(inode);
 
 	if (!fc->writeback_cache || !S_ISREG(inode->i_mode))
 		return 0;
 
+	/*
+	 * In writeback_cache_v2 mode if all the following conditions are met,
+	 * then allow the attributes to be refreshed:
+	 *
+	 * - inode is not dirty (I_DIRTY_INODE)
+	 * - inode is not in the process of being written (I_SYNC)
+	 * - inode has no dirty pages (I_DIRTY_PAGES)
+	 * - inode does not have any page writeback in progress
+	 *
+	 * Note: checking PAGECACHE_TAG_WRITEBACK is not sufficient in fuse,
+	 * since inode can appear to have no PageWriteback pages, yet still have
+	 * outstanding write request.
+	 */
+	if (fc->writeback_cache_v2 && !(inode->i_state & (I_DIRTY | I_SYNC)) &&
+	    RB_EMPTY_ROOT(&fi->writepages))
+		return 0;
+
 	return STATX_MTIME | STATX_CTIME | STATX_SIZE;
 }
 
-void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
-			    u64 attr_valid, u64 attr_version)
+void fuse_change_attributes_mask(struct inode *inode, struct fuse_attr *attr,
+				 u64 attr_valid, u64 attr_version,
+				 u32 cache_mask)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_inode *fi = get_fuse_inode(inode);
-	u32 cache_mask;
 	loff_t oldsize;
 	struct timespec64 old_mtime;
 
@@ -244,7 +262,7 @@ void fuse_change_attributes(struct inode
 	 * may update i_size.  In these cases trust the cached value in the
 	 * inode.
 	 */
-	cache_mask = fuse_get_cache_mask(inode);
+	cache_mask |= fuse_get_cache_mask(inode);
 	if (cache_mask & STATX_SIZE)
 		attr->size = i_size_read(inode);
 
@@ -1153,6 +1171,10 @@ static void process_init_reply(struct fu
 				fc->async_dio = 1;
 			if (flags & FUSE_WRITEBACK_CACHE)
 				fc->writeback_cache = 1;
+			if (flags & FUSE_WRITEBACK_CACHE_V2) {
+				fc->writeback_cache = 1;
+				fc->writeback_cache_v2 = 1;
+			}
 			if (flags & FUSE_PARALLEL_DIROPS)
 				fc->parallel_dirops = 1;
 			if (flags & FUSE_HANDLE_KILLPRIV)
@@ -1234,7 +1256,7 @@ void fuse_send_init(struct fuse_mount *f
 		FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
 		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
-		FUSE_SECURITY_CTX;
+		FUSE_SECURITY_CTX | FUSE_WRITEBACK_CACHE_V2;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -213,6 +213,7 @@ static int fuse_dentry_revalidate(struct
 		FUSE_ARGS(args);
 		struct fuse_forget_link *forget;
 		u64 attr_version;
+		u32 cache_mask;
 
 		/* For negative dentries, always do a fresh lookup */
 		if (!inode)
@@ -230,6 +231,7 @@ static int fuse_dentry_revalidate(struct
 			goto out;
 
 		attr_version = fuse_get_attr_version(fm->fc);
+		cache_mask = fuse_get_cache_mask(inode);
 
 		parent = dget_parent(entry);
 		fuse_lookup_init(fm->fc, &args, get_node_id(d_inode(parent)),
@@ -259,9 +261,9 @@ static int fuse_dentry_revalidate(struct
 			goto invalid;
 
 		forget_all_cached_acls(inode);
-		fuse_change_attributes(inode, &outarg.attr,
-				       entry_attr_timeout(&outarg),
-				       attr_version);
+		fuse_change_attributes_mask(inode, &outarg.attr,
+					    entry_attr_timeout(&outarg),
+					    attr_version, cache_mask);
 		fuse_change_entry_timeout(entry, &outarg);
 	} else if (inode) {
 		fi = get_fuse_inode(inode);
@@ -836,16 +838,23 @@ static int fuse_symlink(struct user_name
 
 void fuse_flush_time_update(struct inode *inode)
 {
-	int err = sync_inode_metadata(inode, 1);
+	struct fuse_conn *fc = get_fuse_conn(inode);
+	int err;
 
-	mapping_set_error(inode->i_mapping, err);
+	if (!fc->writeback_cache_v2) {
+		err = sync_inode_metadata(inode, 1);
+		mapping_set_error(inode->i_mapping, err);
+	}
 }
 
 static void fuse_update_ctime_in_cache(struct inode *inode)
 {
 	if (!IS_NOCMTIME(inode)) {
+		struct fuse_conn *fc = get_fuse_conn(inode);
+
 		inode->i_ctime = current_time(inode);
-		mark_inode_dirty_sync(inode);
+		if (!fc->writeback_cache_v2)
+			mark_inode_dirty_sync(inode);
 		fuse_flush_time_update(inode);
 	}
 }
@@ -1065,7 +1074,7 @@ static void fuse_fillattr(struct inode *
 }
 
 static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
-			   struct file *file)
+			   struct file *file, u32 cache_mask)
 {
 	int err;
 	struct fuse_getattr_in inarg;
@@ -1100,9 +1109,9 @@ static int fuse_do_getattr(struct inode
 			fuse_make_bad(inode);
 			err = -EIO;
 		} else {
-			fuse_change_attributes(inode, &outarg.attr,
-					       attr_timeout(&outarg),
-					       attr_version);
+			fuse_change_attributes_mask(inode, &outarg.attr,
+						    attr_timeout(&outarg),
+						    attr_version, cache_mask);
 			if (stat)
 				fuse_fillattr(inode, &outarg.attr, stat);
 		}
@@ -1131,7 +1140,7 @@ static int fuse_update_get_attr(struct i
 
 	if (sync) {
 		forget_all_cached_acls(inode);
-		err = fuse_do_getattr(inode, stat, file);
+		err = fuse_do_getattr(inode, stat, file, cache_mask);
 	} else if (stat) {
 		generic_fillattr(&init_user_ns, inode, stat);
 		stat->mode = fi->orig_i_mode;
@@ -1277,7 +1286,7 @@ static int fuse_perm_getattr(struct inod
 		return -ECHILD;
 
 	forget_all_cached_acls(inode);
-	return fuse_do_getattr(inode, NULL, NULL);
+	return fuse_do_getattr(inode, NULL, NULL, 0);
 }
 
 /*
@@ -1833,7 +1842,7 @@ static int fuse_setattr(struct user_name
 			 * ia_mode calculation may have used stale i_mode.
 			 * Refresh and recalculate.
 			 */
-			ret = fuse_do_getattr(inode, NULL, file);
+			ret = fuse_do_getattr(inode, NULL, file, 0);
 			if (ret)
 				return ret;
 
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2949,6 +2949,19 @@ static int fuse_writeback_range(struct i
 	return err;
 }
 
+static void fuse_update_time(struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	struct fuse_conn *fc = get_fuse_conn(inode);
+
+	if (!IS_NOCMTIME(inode)) {
+		if (fc->writeback_cache_v2)
+			inode->i_mtime = inode->i_ctime = current_time(inode);
+		else
+			file_update_time(file);
+	}
+}
+
 static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 				loff_t length)
 {
@@ -3021,7 +3034,7 @@ static long fuse_file_fallocate(struct f
 	/* we could have extended the file */
 	if (!(mode & FALLOC_FL_KEEP_SIZE)) {
 		if (fuse_write_update_attr(inode, offset + length, length))
-			file_update_time(file);
+			fuse_update_time(file);
 	}
 
 	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE))
@@ -3135,7 +3148,7 @@ static ssize_t __fuse_copy_file_range(st
 				   ALIGN_DOWN(pos_out, PAGE_SIZE),
 				   ALIGN(pos_out + outarg.size, PAGE_SIZE) - 1);
 
-	file_update_time(file_out);
+	fuse_update_time(file_out);
 	fuse_write_update_attr(inode_out, pos_out + outarg.size, outarg.size);
 
 	err = outarg.size;
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -654,6 +654,9 @@ struct fuse_conn {
 	/* show legacy mount options */
 	unsigned int legacy_opts_show:1;
 
+	/* Improved writeback cache policy */
+	unsigned writeback_cache_v2:1;
+
 	/*
 	 * fs kills suid/sgid/cap on write/chown/trunc. suid is killed on
 	 * write/trunc only if caller did not have CAP_FSETID.  sgid is killed
@@ -1049,8 +1052,17 @@ void fuse_init_symlink(struct inode *ino
 /**
  * Change attributes of an inode
  */
+void fuse_change_attributes_mask(struct inode *inode, struct fuse_attr *attr,
+				 u64 attr_valid, u64 attr_version,
+				 u32 cache_mask);
+
+static inline
 void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
-			    u64 attr_valid, u64 attr_version);
+			    u64 attr_valid, u64 attr_version)
+{
+	return fuse_change_attributes_mask(inode, attr,
+					   attr_valid, attr_version, 0);
+}
 
 void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
 				   u64 attr_valid, u32 cache_mask);

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [External] Re: Re: Re: [RFC PATCH] fuse: support cache revalidation in writeback_cache mode
  2022-04-26 13:09         ` Miklos Szeredi
@ 2022-06-08 10:49           ` Jiachen Zhang
  2022-06-27 14:45           ` Vivek Goyal
  1 sibling, 0 replies; 9+ messages in thread
From: Jiachen Zhang @ 2022-06-08 10:49 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, linux-kernel, Xie Yongji

On Tue, Apr 26, 2022 at 9:09 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Mon, Apr 25, 2022 at 09:52:44PM +0800, Jiachen Zhang wrote:
>
> > Some users may want both the high performance of writeback mode and a
> > little bit more consistency among FUSE mounts. In the current
> > writeback mode implementation, users of one FUSE mount can never see
> > the file expansion done by other FUSE mounts.
>
> Okay.
>
> Here's a preliminary patch that you could try.
>
> Thanks,
> Miklos
>

Hi, Miklos,

Thanks for the patch, and sorry for the late reply. I have already
tried on an older
kernel version based on the same idea of your patch, and it works fine for some
simple manual tests. I still have some questions about this patch.

> ---
>  fs/fuse/dir.c             |   35 ++++++++++++++++++++++-------------
>  fs/fuse/file.c            |   17 +++++++++++++++--
>  fs/fuse/fuse_i.h          |   14 +++++++++++++-
>  fs/fuse/inode.c           |   32 +++++++++++++++++++++++++++-----
>  include/uapi/linux/fuse.h |    5 +++++
>  5 files changed, 82 insertions(+), 21 deletions(-)
>
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -194,6 +194,7 @@
>   *  - add FUSE_SECURITY_CTX init flag
>   *  - add security context to create, mkdir, symlink, and mknod requests
>   *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
> + *  - add FUSE_WRITEBACK_CACHE_V2 init flag
>   */
>
>  #ifndef _LINUX_FUSE_H
> @@ -353,6 +354,9 @@ struct fuse_file_lock {
>   * FUSE_SECURITY_CTX:  add security context to create, mkdir, symlink, and
>   *                     mknod
>   * FUSE_HAS_INODE_DAX:  use per inode DAX
> + * FUSE_WRITEBACK_CACHE_V2:
> + *                     - allow time/size to be refreshed if no pending write
> + *                     - time/size not cached for falocate/copy_file_range
>   */
>  #define FUSE_ASYNC_READ                (1 << 0)
>  #define FUSE_POSIX_LOCKS       (1 << 1)
> @@ -389,6 +393,7 @@ struct fuse_file_lock {
>  /* bits 32..63 get shifted down 32 bits into the flags2 field */
>  #define FUSE_SECURITY_CTX      (1ULL << 32)
>  #define FUSE_HAS_INODE_DAX     (1ULL << 33)
> +#define FUSE_WRITEBACK_CACHE_V2        (1ULL << 34)
>
>  /**
>   * CUSE INIT request/reply flags
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -222,19 +222,37 @@ void fuse_change_attributes_common(struc
>  u32 fuse_get_cache_mask(struct inode *inode)
>  {
>         struct fuse_conn *fc = get_fuse_conn(inode);
> +       struct fuse_inode *fi = get_fuse_inode(inode);
>
>         if (!fc->writeback_cache || !S_ISREG(inode->i_mode))
>                 return 0;
>
> +       /*
> +        * In writeback_cache_v2 mode if all the following conditions are met,
> +        * then allow the attributes to be refreshed:
> +        *
> +        * - inode is not dirty (I_DIRTY_INODE)
> +        * - inode is not in the process of being written (I_SYNC)
> +        * - inode has no dirty pages (I_DIRTY_PAGES)
> +        * - inode does not have any page writeback in progress
> +        *
> +        * Note: checking PAGECACHE_TAG_WRITEBACK is not sufficient in fuse,
> +        * since inode can appear to have no PageWriteback pages, yet still have
> +        * outstanding write request.
> +        */
> +       if (fc->writeback_cache_v2 && !(inode->i_state & (I_DIRTY | I_SYNC)) &&
> +           RB_EMPTY_ROOT(&fi->writepages))
> +               return 0;
> +

Do we need to lock the inode between this cleanness checking and attr updating?
There may be writes between the two procedures and we should not update the
attributes in such cases.

>         return STATX_MTIME | STATX_CTIME | STATX_SIZE;
>  }
>
> -void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
> -                           u64 attr_valid, u64 attr_version)
> +void fuse_change_attributes_mask(struct inode *inode, struct fuse_attr *attr,
> +                                u64 attr_valid, u64 attr_version,
> +                                u32 cache_mask)
>  {
>         struct fuse_conn *fc = get_fuse_conn(inode);
>         struct fuse_inode *fi = get_fuse_inode(inode);
> -       u32 cache_mask;
>         loff_t oldsize;
>         struct timespec64 old_mtime;
>
> @@ -244,7 +262,7 @@ void fuse_change_attributes(struct inode
>          * may update i_size.  In these cases trust the cached value in the
>          * inode.
>          */
> -       cache_mask = fuse_get_cache_mask(inode);
> +       cache_mask |= fuse_get_cache_mask(inode);

Could we get the cache_mask only based on the fuse_get_cache_mask()
 in fuse_change_attributes()? Why the fuse_get_cache_mask() should
be called multiple times in fuse_dentry_revalidate() and
fuse_do_setattr() cases?

>         if (cache_mask & STATX_SIZE)
>                 attr->size = i_size_read(inode);
>
> @@ -1153,6 +1171,10 @@ static void process_init_reply(struct fu
>                                 fc->async_dio = 1;
>                         if (flags & FUSE_WRITEBACK_CACHE)
>                                 fc->writeback_cache = 1;
> +                       if (flags & FUSE_WRITEBACK_CACHE_V2) {
> +                               fc->writeback_cache = 1;
> +                               fc->writeback_cache_v2 = 1;
> +                       }
>                         if (flags & FUSE_PARALLEL_DIROPS)
>                                 fc->parallel_dirops = 1;
>                         if (flags & FUSE_HANDLE_KILLPRIV)
> @@ -1234,7 +1256,7 @@ void fuse_send_init(struct fuse_mount *f
>                 FUSE_ABORT_ERROR | FUSE_MAX_PAGES | FUSE_CACHE_SYMLINKS |
>                 FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
>                 FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
> -               FUSE_SECURITY_CTX;
> +               FUSE_SECURITY_CTX | FUSE_WRITEBACK_CACHE_V2;
>  #ifdef CONFIG_FUSE_DAX
>         if (fm->fc->dax)
>                 flags |= FUSE_MAP_ALIGNMENT;
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -213,6 +213,7 @@ static int fuse_dentry_revalidate(struct
>                 FUSE_ARGS(args);
>                 struct fuse_forget_link *forget;
>                 u64 attr_version;
> +               u32 cache_mask;
>
>                 /* For negative dentries, always do a fresh lookup */
>                 if (!inode)
> @@ -230,6 +231,7 @@ static int fuse_dentry_revalidate(struct
>                         goto out;
>
>                 attr_version = fuse_get_attr_version(fm->fc);
> +               cache_mask = fuse_get_cache_mask(inode);
>
>                 parent = dget_parent(entry);
>                 fuse_lookup_init(fm->fc, &args, get_node_id(d_inode(parent)),
> @@ -259,9 +261,9 @@ static int fuse_dentry_revalidate(struct
>                         goto invalid;
>
>                 forget_all_cached_acls(inode);
> -               fuse_change_attributes(inode, &outarg.attr,
> -                                      entry_attr_timeout(&outarg),
> -                                      attr_version);
> +               fuse_change_attributes_mask(inode, &outarg.attr,
> +                                           entry_attr_timeout(&outarg),
> +                                           attr_version, cache_mask);
>                 fuse_change_entry_timeout(entry, &outarg);
>         } else if (inode) {
>                 fi = get_fuse_inode(inode);
> @@ -836,16 +838,23 @@ static int fuse_symlink(struct user_name
>
>  void fuse_flush_time_update(struct inode *inode)
>  {
> -       int err = sync_inode_metadata(inode, 1);
> +       struct fuse_conn *fc = get_fuse_conn(inode);
> +       int err;
>
> -       mapping_set_error(inode->i_mapping, err);
> +       if (!fc->writeback_cache_v2) {
> +               err = sync_inode_metadata(inode, 1);
> +               mapping_set_error(inode->i_mapping, err);
> +       }
>  }
>
>  static void fuse_update_ctime_in_cache(struct inode *inode)
>  {
>         if (!IS_NOCMTIME(inode)) {
> +               struct fuse_conn *fc = get_fuse_conn(inode);
> +
>                 inode->i_ctime = current_time(inode);
> -               mark_inode_dirty_sync(inode);
> +               if (!fc->writeback_cache_v2)
> +                       mark_inode_dirty_sync(inode);
>                 fuse_flush_time_update(inode);
>         }
>  }
> @@ -1065,7 +1074,7 @@ static void fuse_fillattr(struct inode *
>  }
>
>  static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
> -                          struct file *file)
> +                          struct file *file, u32 cache_mask)
>  {
>         int err;
>         struct fuse_getattr_in inarg;
> @@ -1100,9 +1109,9 @@ static int fuse_do_getattr(struct inode
>                         fuse_make_bad(inode);
>                         err = -EIO;
>                 } else {
> -                       fuse_change_attributes(inode, &outarg.attr,
> -                                              attr_timeout(&outarg),
> -                                              attr_version);
> +                       fuse_change_attributes_mask(inode, &outarg.attr,
> +                                                   attr_timeout(&outarg),
> +                                                   attr_version, cache_mask);
>                         if (stat)
>                                 fuse_fillattr(inode, &outarg.attr, stat);
>                 }
> @@ -1131,7 +1140,7 @@ static int fuse_update_get_attr(struct i
>
>         if (sync) {
>                 forget_all_cached_acls(inode);
> -               err = fuse_do_getattr(inode, stat, file);
> +               err = fuse_do_getattr(inode, stat, file, cache_mask);
>         } else if (stat) {
>                 generic_fillattr(&init_user_ns, inode, stat);
>                 stat->mode = fi->orig_i_mode;
> @@ -1277,7 +1286,7 @@ static int fuse_perm_getattr(struct inod
>                 return -ECHILD;
>
>         forget_all_cached_acls(inode);
> -       return fuse_do_getattr(inode, NULL, NULL);
> +       return fuse_do_getattr(inode, NULL, NULL, 0);
>  }
>
>  /*
> @@ -1833,7 +1842,7 @@ static int fuse_setattr(struct user_name
>                          * ia_mode calculation may have used stale i_mode.
>                          * Refresh and recalculate.
>                          */
> -                       ret = fuse_do_getattr(inode, NULL, file);
> +                       ret = fuse_do_getattr(inode, NULL, file, 0);
>                         if (ret)
>                                 return ret;
>
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -2949,6 +2949,19 @@ static int fuse_writeback_range(struct i
>         return err;
>  }
>
> +static void fuse_update_time(struct file *file)
> +{
> +       struct inode *inode = file_inode(file);
> +       struct fuse_conn *fc = get_fuse_conn(inode);
> +
> +       if (!IS_NOCMTIME(inode)) {
> +               if (fc->writeback_cache_v2)
> +                       inode->i_mtime = inode->i_ctime = current_time(inode);
> +               else
> +                       file_update_time(file);
> +       }
> +}
> +
>  static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
>                                 loff_t length)
>  {
> @@ -3021,7 +3034,7 @@ static long fuse_file_fallocate(struct f
>         /* we could have extended the file */
>         if (!(mode & FALLOC_FL_KEEP_SIZE)) {
>                 if (fuse_write_update_attr(inode, offset + length, length))
> -                       file_update_time(file);
> +                       fuse_update_time(file);
>         }
>
>         if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE))
> @@ -3135,7 +3148,7 @@ static ssize_t __fuse_copy_file_range(st
>                                    ALIGN_DOWN(pos_out, PAGE_SIZE),
>                                    ALIGN(pos_out + outarg.size, PAGE_SIZE) - 1);
>
> -       file_update_time(file_out);
> +       fuse_update_time(file_out);

To avoid setting I_DIRTY_SYNC in writeback_cache_v2 mode, maybe we should
replace all file_update_time() in fuse by fuse_update_time().

There are also file_update_time()s in fuse_page_mkwrite(),
fuse_cache_write_iter(),
and fuse_finish_open().


>         fuse_write_update_attr(inode_out, pos_out + outarg.size, outarg.size);
>
>         err = outarg.size;
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -654,6 +654,9 @@ struct fuse_conn {
>         /* show legacy mount options */
>         unsigned int legacy_opts_show:1;
>
> +       /* Improved writeback cache policy */
> +       unsigned writeback_cache_v2:1;
> +
>         /*
>          * fs kills suid/sgid/cap on write/chown/trunc. suid is killed on
>          * write/trunc only if caller did not have CAP_FSETID.  sgid is killed
> @@ -1049,8 +1052,17 @@ void fuse_init_symlink(struct inode *ino
>  /**
>   * Change attributes of an inode
>   */
> +void fuse_change_attributes_mask(struct inode *inode, struct fuse_attr *attr,
> +                                u64 attr_valid, u64 attr_version,
> +                                u32 cache_mask);
> +
> +static inline
>  void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
> -                           u64 attr_valid, u64 attr_version);
> +                           u64 attr_valid, u64 attr_version)
> +{
> +       return fuse_change_attributes_mask(inode, attr,
> +                                          attr_valid, attr_version, 0);
> +}
>
>  void fuse_change_attributes_common(struct inode *inode, struct fuse_attr *attr,
>                                    u64 attr_valid, u32 cache_mask);

Also, maybe we can add a FOPEN_INVAL_ATTR flag [1] to invalidate the
attr cache on
file open. With the writeback v2 patch and FOPEN_INVAL_ATTR, the close-to-open
consistency revalidations can be implemented in user-space fuse daemons.

[1] https://lore.kernel.org/linux-fsdevel/20220608104202.19461-1-zhangjiachen.jaycee@bytedance.com/

Thanks,
Jiachen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: Re: [RFC PATCH] fuse: support cache revalidation in writeback_cache mode
  2022-04-26 13:09         ` Miklos Szeredi
  2022-06-08 10:49           ` [External] " Jiachen Zhang
@ 2022-06-27 14:45           ` Vivek Goyal
  2022-06-28 10:46             ` Jiachen Zhang
  1 sibling, 1 reply; 9+ messages in thread
From: Vivek Goyal @ 2022-06-27 14:45 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jiachen Zhang, linux-fsdevel, linux-kernel, Xie Yongji

On Tue, Apr 26, 2022 at 03:09:05PM +0200, Miklos Szeredi wrote:
> On Mon, Apr 25, 2022 at 09:52:44PM +0800, Jiachen Zhang wrote:
> 
> > Some users may want both the high performance of writeback mode and a
> > little bit more consistency among FUSE mounts. In the current
> > writeback mode implementation, users of one FUSE mount can never see
> > the file expansion done by other FUSE mounts.
> 
> Okay.
> 
> Here's a preliminary patch that you could try.
> 
> Thanks,
> Miklos
> 
> ---
>  fs/fuse/dir.c             |   35 ++++++++++++++++++++++-------------
>  fs/fuse/file.c            |   17 +++++++++++++++--
>  fs/fuse/fuse_i.h          |   14 +++++++++++++-
>  fs/fuse/inode.c           |   32 +++++++++++++++++++++++++++-----
>  include/uapi/linux/fuse.h |    5 +++++
>  5 files changed, 82 insertions(+), 21 deletions(-)
> 
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -194,6 +194,7 @@
>   *  - add FUSE_SECURITY_CTX init flag
>   *  - add security context to create, mkdir, symlink, and mknod requests
>   *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
> + *  - add FUSE_WRITEBACK_CACHE_V2 init flag
>   */
>  
>  #ifndef _LINUX_FUSE_H
> @@ -353,6 +354,9 @@ struct fuse_file_lock {
>   * FUSE_SECURITY_CTX:	add security context to create, mkdir, symlink, and
>   *			mknod
>   * FUSE_HAS_INODE_DAX:  use per inode DAX
> + * FUSE_WRITEBACK_CACHE_V2:
> + *			- allow time/size to be refreshed if no pending write
> + * 			- time/size not cached for falocate/copy_file_range
>   */
>  #define FUSE_ASYNC_READ		(1 << 0)
>  #define FUSE_POSIX_LOCKS	(1 << 1)
> @@ -389,6 +393,7 @@ struct fuse_file_lock {
>  /* bits 32..63 get shifted down 32 bits into the flags2 field */
>  #define FUSE_SECURITY_CTX	(1ULL << 32)
>  #define FUSE_HAS_INODE_DAX	(1ULL << 33)
> +#define FUSE_WRITEBACK_CACHE_V2	(1ULL << 34)
>  
>  /**
>   * CUSE INIT request/reply flags
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -222,19 +222,37 @@ void fuse_change_attributes_common(struc
>  u32 fuse_get_cache_mask(struct inode *inode)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
> +	struct fuse_inode *fi = get_fuse_inode(inode);
>  
>  	if (!fc->writeback_cache || !S_ISREG(inode->i_mode))
>  		return 0;
>  
> +	/*
> +	 * In writeback_cache_v2 mode if all the following conditions are met,
> +	 * then allow the attributes to be refreshed:
> +	 *
> +	 * - inode is not dirty (I_DIRTY_INODE)
> +	 * - inode is not in the process of being written (I_SYNC)
> +	 * - inode has no dirty pages (I_DIRTY_PAGES)
> +	 * - inode does not have any page writeback in progress
> +	 *
> +	 * Note: checking PAGECACHE_TAG_WRITEBACK is not sufficient in fuse,
> +	 * since inode can appear to have no PageWriteback pages, yet still have
> +	 * outstanding write request.
> +	 */

Hi,

I started following this thread just now after Jiachen pointed me to
previous conversations. Without going into too much details.

Based on above description, so we will update mtime/ctime/i_size only
if inode does not have dirty pages or nothing is in progress. So that
means sometime we will update it and other times we will ignore it. 

Do I understand it correctly. I am wondering how that is useful to
applications. 

I thought that other remote filesystems might have leasing for this so
that one client can acquire the lease and cache changes and when lease
is broken, this client pushes out all the changes and other client gets
the lease.

Given we don't have any lease mechanism, we probably need to define the
semantics more clearly and we should probably document it as well.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: Re: Re: [RFC PATCH] fuse: support cache revalidation in writeback_cache mode
  2022-06-27 14:45           ` Vivek Goyal
@ 2022-06-28 10:46             ` Jiachen Zhang
  0 siblings, 0 replies; 9+ messages in thread
From: Jiachen Zhang @ 2022-06-28 10:46 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, linux-fsdevel, linux-kernel, Xie Yongji

On Mon, Jun 27, 2022 at 10:45 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Tue, Apr 26, 2022 at 03:09:05PM +0200, Miklos Szeredi wrote:
> > On Mon, Apr 25, 2022 at 09:52:44PM +0800, Jiachen Zhang wrote:
> >
> > > Some users may want both the high performance of writeback mode and a
> > > little bit more consistency among FUSE mounts. In the current
> > > writeback mode implementation, users of one FUSE mount can never see
> > > the file expansion done by other FUSE mounts.
> >
> > Okay.
> >
> > Here's a preliminary patch that you could try.
> >
> > Thanks,
> > Miklos
> >
> > ---
> >  fs/fuse/dir.c             |   35 ++++++++++++++++++++++-------------
> >  fs/fuse/file.c            |   17 +++++++++++++++--
> >  fs/fuse/fuse_i.h          |   14 +++++++++++++-
> >  fs/fuse/inode.c           |   32 +++++++++++++++++++++++++++-----
> >  include/uapi/linux/fuse.h |    5 +++++
> >  5 files changed, 82 insertions(+), 21 deletions(-)
> >
> > --- a/include/uapi/linux/fuse.h
> > +++ b/include/uapi/linux/fuse.h
> > @@ -194,6 +194,7 @@
> >   *  - add FUSE_SECURITY_CTX init flag
> >   *  - add security context to create, mkdir, symlink, and mknod requests
> >   *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
> > + *  - add FUSE_WRITEBACK_CACHE_V2 init flag
> >   */
> >
> >  #ifndef _LINUX_FUSE_H
> > @@ -353,6 +354,9 @@ struct fuse_file_lock {
> >   * FUSE_SECURITY_CTX:        add security context to create, mkdir, symlink, and
> >   *                   mknod
> >   * FUSE_HAS_INODE_DAX:  use per inode DAX
> > + * FUSE_WRITEBACK_CACHE_V2:
> > + *                   - allow time/size to be refreshed if no pending write
> > + *                   - time/size not cached for falocate/copy_file_range
> >   */
> >  #define FUSE_ASYNC_READ              (1 << 0)
> >  #define FUSE_POSIX_LOCKS     (1 << 1)
> > @@ -389,6 +393,7 @@ struct fuse_file_lock {
> >  /* bits 32..63 get shifted down 32 bits into the flags2 field */
> >  #define FUSE_SECURITY_CTX    (1ULL << 32)
> >  #define FUSE_HAS_INODE_DAX   (1ULL << 33)
> > +#define FUSE_WRITEBACK_CACHE_V2      (1ULL << 34)
> >
> >  /**
> >   * CUSE INIT request/reply flags
> > --- a/fs/fuse/inode.c
> > +++ b/fs/fuse/inode.c
> > @@ -222,19 +222,37 @@ void fuse_change_attributes_common(struc
> >  u32 fuse_get_cache_mask(struct inode *inode)
> >  {
> >       struct fuse_conn *fc = get_fuse_conn(inode);
> > +     struct fuse_inode *fi = get_fuse_inode(inode);
> >
> >       if (!fc->writeback_cache || !S_ISREG(inode->i_mode))
> >               return 0;
> >
> > +     /*
> > +      * In writeback_cache_v2 mode if all the following conditions are met,
> > +      * then allow the attributes to be refreshed:
> > +      *
> > +      * - inode is not dirty (I_DIRTY_INODE)
> > +      * - inode is not in the process of being written (I_SYNC)
> > +      * - inode has no dirty pages (I_DIRTY_PAGES)
> > +      * - inode does not have any page writeback in progress
> > +      *
> > +      * Note: checking PAGECACHE_TAG_WRITEBACK is not sufficient in fuse,
> > +      * since inode can appear to have no PageWriteback pages, yet still have
> > +      * outstanding write request.
> > +      */
>
> Hi,
>
> I started following this thread just now after Jiachen pointed me to
> previous conversations. Without going into too much details.
>
> Based on above description, so we will update mtime/ctime/i_size only
> if inode does not have dirty pages or nothing is in progress. So that
> means sometime we will update it and other times we will ignore it.
>
> Do I understand it correctly. I am wondering how that is useful to
> applications.
>
> I thought that other remote filesystems might have leasing for this so
> that one client can acquire the lease and cache changes and when lease
> is broken, this client pushes out all the changes and other client gets
> the lease.
>
> Given we don't have any lease mechanism, we probably need to define the
> semantics more clearly and we should probably document it as well.
>

Hi Vivek,

I agree we should define or document the semantics properly. For now,
it seems that Miklos' writeback_mode_v2 is making best-effort updating
when pages are not dirty and a set of new attributes are returned from
FUSE server.

Thanks,
Jiachen

> Thanks
> Vivek
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-06-28 10:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-25 13:21 [RFC PATCH] fuse: support cache revalidation in writeback_cache mode Jiachen Zhang
2022-04-25 12:41 ` Miklos Szeredi
2022-04-25 13:32   ` Jiachen Zhang
2022-04-25 13:41     ` Miklos Szeredi
2022-04-25 13:52       ` Jiachen Zhang
2022-04-26 13:09         ` Miklos Szeredi
2022-06-08 10:49           ` [External] " Jiachen Zhang
2022-06-27 14:45           ` Vivek Goyal
2022-06-28 10:46             ` Jiachen Zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.