All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/5] fuse,virtiofs: support per-file DAX
@ 2021-09-23  9:25 Jeffle Xu
  2021-09-23  9:25 ` [PATCH v5 1/5] fuse: add fuse_should_enable_dax() helper Jeffle Xu
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Jeffle Xu @ 2021-09-23  9:25 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

This patchset adds support of per-file DAX for virtiofs, which is
inspired by Ira Weiny's work on ext4[1] and xfs[2].

Any comment is welcome.

[1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
[2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")


[Purpose]
DAX may be limited in some specific situation. When the number of usable
DAX windows is under watermark, the recalim routine will be triggered to
reclaim some DAX windows. It may have a negative impact on the
performance, since some processes may need to wait for DAX windows to be
recalimed and reused then. To mitigate the performance degradation, the
overall DAX window need to be expanded larger.

However, simply expanding the DAX window may not be a good deal in some
scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
(512 * 64 bytes) memory footprint will be consumed for page descriptors
inside guest, which is greater than the memory footprint if it uses
guest page cache when DAX disabled. Thus it'd better disable DAX for
those files smaller than 32KB, to reduce the demand for DAX window and
thus avoid the unworthy memory overhead.

Per-file DAX feature is introduced to address this issue, by offering a
finer grained control for dax to users, trying to achieve a balance
between performance and memory overhead.


[Note]
When the per-file DAX hint changes while the file is still *opened*, it
is quite complicated and maybe fragile to dynamically change the DAX
state, since dynamic switching needs to switch a_ops atomiclly. Ira
Weiny had ever implemented a so called i_aops_sem lock [3] but
eventually gave up since the complexity of the implementation [4][5][6][7].

Hence mark the inode and corresponding dentries as DONE_CACHE once the
per-file DAX hint changes, so that the inode instance will be evicted
and freed as soon as possible once the file is closed and the last
reference to the inode is put. And then when the file gets reopened next
time, the new instantiated inode will reflect the new DAX state.

In summary, when the per-file DAX hint changes for an *opened* file, the
DAX state of the file won't be updated until this file is closed and
reopened later. This is also how ext4/xfs per-file DAX works.

[3] https://lore.kernel.org/lkml/20200227052442.22524-7-ira.weiny@intel.com/
[4] https://patchwork.kernel.org/project/xfs/cover/20200407182958.568475-1-ira.weiny@intel.com/
[5] https://lore.kernel.org/lkml/20200305155144.GA5598@lst.de/
[6] https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
[7] https://lore.kernel.org/lkml/20200403182904.GP80283@magnolia/

chanegs since v4:
- drop support for setting/clearing FS_DAX inside guest
- and thus drop the negotiation phase during FUSE_INIT

changes since v3:
- bug fix (patch 6): s/"IS_DAX(inode) != newdax"/"!!IS_DAX(inode) !=
  newdax"
- during FUSE_INIT, advertise capability for per-file DAX only when
  mounted as "-o dax=inode" (patch 4)

changes since v2:
- modify fuse_show_options() accordingly to make it compatible with
  new tri-state mount option (patch 2)
- extract FUSE protocol changes into one seperate patch (patch 3)
- FUSE server/client need to negotiate if they support per-file DAX
  (patch 4)
- extract DONT_CACHE logic into patch 6/7

v4: https://lore.kernel.org/linux-fsdevel/20210817022220.17574-1-jefflexu@linux.alibaba.com/
v3: https://www.spinics.net/lists/linux-fsdevel/msg200852.html
v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html


Jeffle Xu (5):
  fuse: add fuse_should_enable_dax() helper
  fuse: make DAX mount option a tri-state
  fuse: support per-file DAX
  fuse: enable per-file DAX
  fuse: mark inode DONT_CACHE when per-file DAX hint changes

 fs/fuse/dax.c             | 32 +++++++++++++++++++++++++++++---
 fs/fuse/file.c            |  4 ++--
 fs/fuse/fuse_i.h          | 19 +++++++++++++++----
 fs/fuse/inode.c           | 15 +++++++++++----
 fs/fuse/virtio_fs.c       | 16 ++++++++++++++--
 include/uapi/linux/fuse.h |  7 ++++++-
 6 files changed, 77 insertions(+), 16 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5 1/5] fuse: add fuse_should_enable_dax() helper
  2021-09-23  9:25 [PATCH v5 0/5] fuse,virtiofs: support per-file DAX Jeffle Xu
@ 2021-09-23  9:25 ` Jeffle Xu
  2021-09-23 18:58   ` Vivek Goyal
  2021-09-23  9:25 ` [PATCH v5 2/5] fuse: make DAX mount option a tri-state Jeffle Xu
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Jeffle Xu @ 2021-09-23  9:25 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

This is in prep for following per-file DAX checking.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 281d79f8b3d3..28db96ea23e2 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1330,11 +1330,19 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 	.invalidatepage	= noop_invalidatepage,
 };
 
-void fuse_dax_inode_init(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 
 	if (!fc->dax)
+		return false;
+
+	return true;
+}
+
+void fuse_dax_inode_init(struct inode *inode)
+{
+	if (!fuse_should_enable_dax(inode))
 		return;
 
 	inode->i_flags |= S_DAX;
-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-23  9:25 [PATCH v5 0/5] fuse,virtiofs: support per-file DAX Jeffle Xu
  2021-09-23  9:25 ` [PATCH v5 1/5] fuse: add fuse_should_enable_dax() helper Jeffle Xu
@ 2021-09-23  9:25 ` Jeffle Xu
  2021-09-23 19:02   ` Vivek Goyal
  2021-09-23  9:25 ` [PATCH v5 3/5] fuse: support per-file DAX Jeffle Xu
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: Jeffle Xu @ 2021-09-23  9:25 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

We add 'always', 'never', and 'inode' (default). '-o dax' continues to
operate the same which is equivalent to 'always'. To be consistemt with
ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
option is specified, the default behaviour is equal to 'inode'.

By the time this patch is applied, 'inode' mode is actually equal to
'always' mode, before the per-file DAX flag is introduced in the
following patch.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c       |  9 +++++++--
 fs/fuse/fuse_i.h    | 14 ++++++++++++--
 fs/fuse/inode.c     | 10 +++++++---
 fs/fuse/virtio_fs.c | 16 ++++++++++++++--
 4 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 28db96ea23e2..e87ddded38c7 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1282,11 +1282,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
 	return ret;
 }
 
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
+			struct dax_device *dax_dev)
 {
 	struct fuse_conn_dax *fcd;
 	int err;
 
+	fc->dax_mode = dax_mode;
+
 	if (!dax_dev)
 		return 0;
 
@@ -1333,8 +1336,10 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 static bool fuse_should_enable_dax(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	unsigned int dax_mode = fc->dax_mode;
 
-	if (!fc->dax)
+	/* If 'dax=always/inode', fc->dax couldn't be NULL */
+	if (dax_mode == FUSE_DAX_NEVER)
 		return false;
 
 	return true;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 319596df5dc6..5abf9749923f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -480,6 +480,12 @@ struct fuse_dev {
 	struct list_head entry;
 };
 
+enum fuse_dax_mode {
+	FUSE_DAX_INODE,
+	FUSE_DAX_ALWAYS,
+	FUSE_DAX_NEVER,
+};
+
 struct fuse_fs_context {
 	int fd;
 	struct file *file;
@@ -497,7 +503,7 @@ struct fuse_fs_context {
 	bool no_control:1;
 	bool no_force_umount:1;
 	bool legacy_opts_show:1;
-	bool dax:1;
+	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
 	const char *subtype;
@@ -802,6 +808,9 @@ struct fuse_conn {
 	struct list_head devices;
 
 #ifdef CONFIG_FUSE_DAX
+	/* dax mode: FUSE_DAX_* (always, never or per-file) */
+	enum fuse_dax_mode dax_mode;
+
 	/* Dax specific conn data, non-NULL if DAX is enabled */
 	struct fuse_conn_dax *dax;
 #endif
@@ -1255,7 +1264,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
 ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
 int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
 int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
+			struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 36cd03114b6d..b4b41683e97e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -742,8 +742,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
 	}
 #ifdef CONFIG_FUSE_DAX
-	if (fc->dax)
-		seq_puts(m, ",dax");
+	if (fc->dax_mode == FUSE_DAX_ALWAYS)
+		seq_puts(m, ",dax=always");
+	else if (fc->dax_mode == FUSE_DAX_NEVER)
+		seq_puts(m, ",dax=never");
+	else if (fc->dax_mode == FUSE_DAX_INODE)
+		seq_puts(m, ",dax=inode");
 #endif
 
 	return 0;
@@ -1493,7 +1497,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	sb->s_subtype = ctx->subtype;
 	ctx->subtype = NULL;
 	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
-		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
+		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
 		if (err)
 			goto err;
 	}
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 0ad89c6629d7..58cfbaeb4a7d 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -88,12 +88,21 @@ struct virtio_fs_req_work {
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 				 struct fuse_req *req, bool in_flight);
 
+static const struct constant_table dax_param_enums[] = {
+	{"inode",	FUSE_DAX_INODE },
+	{"always",	FUSE_DAX_ALWAYS },
+	{"never",	FUSE_DAX_NEVER },
+	{}
+};
+
 enum {
 	OPT_DAX,
+	OPT_DAX_ENUM,
 };
 
 static const struct fs_parameter_spec virtio_fs_parameters[] = {
 	fsparam_flag("dax", OPT_DAX),
+	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
 	{}
 };
 
@@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
 
 	switch (opt) {
 	case OPT_DAX:
-		ctx->dax = 1;
+		ctx->dax_mode = FUSE_DAX_ALWAYS;
+		break;
+	case OPT_DAX_ENUM:
+		ctx->dax_mode = result.uint_32;
 		break;
 	default:
 		return -EINVAL;
@@ -1326,7 +1338,7 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
 
 	/* virtiofs allocates and installs its own fuse devices */
 	ctx->fudptr = NULL;
-	if (ctx->dax) {
+	if (ctx->dax_mode != FUSE_DAX_NEVER) {
 		if (!fs->dax_dev) {
 			err = -EINVAL;
 			pr_err("virtio-fs: dax can't be enabled as filesystem"
-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5 3/5] fuse: support per-file DAX
  2021-09-23  9:25 [PATCH v5 0/5] fuse,virtiofs: support per-file DAX Jeffle Xu
  2021-09-23  9:25 ` [PATCH v5 1/5] fuse: add fuse_should_enable_dax() helper Jeffle Xu
  2021-09-23  9:25 ` [PATCH v5 2/5] fuse: make DAX mount option a tri-state Jeffle Xu
@ 2021-09-23  9:25 ` Jeffle Xu
  2021-09-23  9:25 ` [PATCH v5 4/5] fuse: enable " Jeffle Xu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jeffle Xu @ 2021-09-23  9:25 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

Expand the fuse protocol to support per-file DAX.

FUSE_ATTR_DAX flag is added indicating if DAX shall be enabled for
corresponding file when replying FUSE_LOOKUP request.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 include/uapi/linux/fuse.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 36ed092227fa..88dc24512124 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -184,6 +184,9 @@
  *
  *  7.34
  *  - add FUSE_SYNCFS
+ *
+ *  7.35
+ *  - add FUSE_ATTR_DAX
  */
 
 #ifndef _LINUX_FUSE_H
@@ -219,7 +222,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 34
+#define FUSE_KERNEL_MINOR_VERSION 35
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -449,8 +452,10 @@ struct fuse_file_lock {
  * fuse_attr flags
  *
  * FUSE_ATTR_SUBMOUNT: Object is a submount root
+ * FUSE_ATTR_DAX: Enable DAX for this file in per-file DAX mode
  */
 #define FUSE_ATTR_SUBMOUNT      (1 << 0)
+#define FUSE_ATTR_DAX		(1 << 1)
 
 /**
  * Open flags
-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5 4/5] fuse: enable per-file DAX
  2021-09-23  9:25 [PATCH v5 0/5] fuse,virtiofs: support per-file DAX Jeffle Xu
                   ` (2 preceding siblings ...)
  2021-09-23  9:25 ` [PATCH v5 3/5] fuse: support per-file DAX Jeffle Xu
@ 2021-09-23  9:25 ` Jeffle Xu
  2021-09-23  9:25 ` [PATCH v5 5/5] fuse: mark inode DONT_CACHE when per-file DAX hint changes Jeffle Xu
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Jeffle Xu @ 2021-09-23  9:25 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

DAX may be limited in some specific situation. When the number of usable
DAX windows is under watermark, the recalim routine will be triggered to
reclaim some DAX windows. It may have a negative impact on the
performance, since some processes may need to wait for DAX windows to be
recalimed and reused then. To mitigate the performance degradation, the
overall DAX window need to be expanded larger.

However, simply expanding the DAX window may not be a good deal in some
scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
(512 * 64 bytes) memory footprint will be consumed for page descriptors
inside guest, which is greater than the memory footprint if it uses
guest page cache when DAX disabled. Thus it'd better disable DAX for
those files smaller than 32KB, to reduce the demand for DAX window and
thus avoid the unworthy memory overhead.

Per-file DAX feature is introduced to address this issue, by offering a
finer grained control for dax to users, trying to achieve a balance
between performance and memory overhead.

The FUSE_ATTR_DAX flag in FUSE_LOOKUP reply is used to indicate whether
DAX should be enabled or not for corresponding file. Currently the state
whether DAX is enabled or not for the file is initialized only when
inode is instantiated.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c    | 12 ++++++++----
 fs/fuse/file.c   |  4 ++--
 fs/fuse/fuse_i.h |  4 ++--
 fs/fuse/inode.c  |  2 +-
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index e87ddded38c7..b0682ddd5d77 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1333,7 +1333,7 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 	.invalidatepage	= noop_invalidatepage,
 };
 
-static bool fuse_should_enable_dax(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	unsigned int dax_mode = fc->dax_mode;
@@ -1342,12 +1342,16 @@ static bool fuse_should_enable_dax(struct inode *inode)
 	if (dax_mode == FUSE_DAX_NEVER)
 		return false;
 
-	return true;
+	if (dax_mode == FUSE_DAX_ALWAYS)
+		return true;
+
+	/* dax_mode == FUSE_DAX_INODE */
+	return flags & FUSE_ATTR_DAX;
 }
 
-void fuse_dax_inode_init(struct inode *inode)
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 {
-	if (!fuse_should_enable_dax(inode))
+	if (!fuse_should_enable_dax(inode, flags))
 		return;
 
 	inode->i_flags |= S_DAX;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 11404f8c21c7..40c667a48cf6 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3163,7 +3163,7 @@ static const struct address_space_operations fuse_file_aops  = {
 	.write_end	= fuse_write_end,
 };
 
-void fuse_init_file_inode(struct inode *inode)
+void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
 
@@ -3177,5 +3177,5 @@ void fuse_init_file_inode(struct inode *inode)
 	fi->writepages = RB_ROOT;
 
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
-		fuse_dax_inode_init(inode);
+		fuse_dax_inode_init(inode, flags);
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 5abf9749923f..0270a41c31d7 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1016,7 +1016,7 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 /**
  * Initialize file operations on a regular file
  */
-void fuse_init_file_inode(struct inode *inode);
+void fuse_init_file_inode(struct inode *inode, unsigned int flags);
 
 /**
  * Initialize inode operations on regular files and special files
@@ -1268,7 +1268,7 @@ int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
 			struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
-void fuse_dax_inode_init(struct inode *inode);
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b4b41683e97e..48f466a65b76 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -280,7 +280,7 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
 	inode->i_ctime.tv_nsec = attr->ctimensec;
 	if (S_ISREG(inode->i_mode)) {
 		fuse_init_common(inode);
-		fuse_init_file_inode(inode);
+		fuse_init_file_inode(inode, attr->flags);
 	} else if (S_ISDIR(inode->i_mode))
 		fuse_init_dir(inode);
 	else if (S_ISLNK(inode->i_mode))
-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5 5/5] fuse: mark inode DONT_CACHE when per-file DAX hint changes
  2021-09-23  9:25 [PATCH v5 0/5] fuse,virtiofs: support per-file DAX Jeffle Xu
                   ` (3 preceding siblings ...)
  2021-09-23  9:25 ` [PATCH v5 4/5] fuse: enable " Jeffle Xu
@ 2021-09-23  9:25 ` Jeffle Xu
  2021-09-23  9:35 ` [virtiofsd PATCH v5 0/2] virtiofsd: support per-file DAX Jeffle Xu
  2021-09-23 18:57 ` [PATCH v5 0/5] fuse,virtiofs: " Vivek Goyal
  6 siblings, 0 replies; 26+ messages in thread
From: Jeffle Xu @ 2021-09-23  9:25 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

When the per-file DAX hint changes while the file is still *opened*, it
is quite complicated and maybe fragile to dynamically change the DAX
state.

Hence mark the inode and corresponding dentries as DONE_CACHE once the
per-file DAX hint changes, so that the inode instance will be evicted
and freed as soon as possible once the file is closed and the last
reference to the inode is put. And then when the file gets reopened next
time, the new instantiated inode will reflect the new DAX state.

In summary, when the per-file DAX hint changes for an *opened* file, the
DAX state of the file won't be updated until this file is closed and
reopened later.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c    | 9 +++++++++
 fs/fuse/fuse_i.h | 1 +
 fs/fuse/inode.c  | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index b0682ddd5d77..d90eaf1c5b48 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1358,6 +1358,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 	inode->i_data.a_ops = &fuse_dax_file_aops;
 }
 
+void fuse_dax_dontcache(struct inode *inode, unsigned int flags)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+
+	if (fc->dax_mode == FUSE_DAX_INODE &&
+	    (!!IS_DAX(inode) != !!(flags & FUSE_ATTR_DAX)))
+		d_mark_dontcache(inode);
+}
+
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
 {
 	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 0270a41c31d7..bb2c11e0311a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1270,6 +1270,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
+void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 48f466a65b76..aee462a25231 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -268,6 +268,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 		if (inval)
 			invalidate_inode_pages2(inode->i_mapping);
 	}
+
+	if (IS_ENABLED(CONFIG_FUSE_DAX))
+		fuse_dax_dontcache(inode, attr->flags);
 }
 
 static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [virtiofsd PATCH v5 0/2] virtiofsd: support per-file DAX
  2021-09-23  9:25 [PATCH v5 0/5] fuse,virtiofs: support per-file DAX Jeffle Xu
                   ` (4 preceding siblings ...)
  2021-09-23  9:25 ` [PATCH v5 5/5] fuse: mark inode DONT_CACHE when per-file DAX hint changes Jeffle Xu
@ 2021-09-23  9:35 ` Jeffle Xu
  2021-09-23  9:35   ` [virtiofsd PATCH v5 1/2] virtiofsd: add FUSE_ATTR_DAX to fuse protocol Jeffle Xu
  2021-09-23  9:35   ` [virtiofsd PATCH v5 2/2] virtiofsd: support per-file DAX Jeffle Xu
  2021-09-23 18:57 ` [PATCH v5 0/5] fuse,virtiofs: " Vivek Goyal
  6 siblings, 2 replies; 26+ messages in thread
From: Jeffle Xu @ 2021-09-23  9:35 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

changes since v4:
- decide whether DAX shall be enabled or not solely depending on file
  size (DAX is disabled for files smaller than 32KB)
- negotiation during FUSE_INIT is droped
- drop support for .ioctl() for passthrough

changes since v2/v3:
Patch 4 in v2 is incomplete by mistake and it will fail to be compiled.
I had ever sent a seperate patch 4 of v3. Now I send the whole complete
set in v4. Except for this, there's no other diferrence.


Jeffle Xu (2):
  virtiofsd: add FUSE_ATTR_DAX to fuse protocol
  virtiofsd: support per-file DAX

 include/standard-headers/linux/fuse.h |  1 +
 tools/virtiofsd/passthrough_ll.c      | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [virtiofsd PATCH v5 1/2] virtiofsd: add FUSE_ATTR_DAX to fuse protocol
  2021-09-23  9:35 ` [virtiofsd PATCH v5 0/2] virtiofsd: support per-file DAX Jeffle Xu
@ 2021-09-23  9:35   ` Jeffle Xu
  2021-09-23  9:35   ` [virtiofsd PATCH v5 2/2] virtiofsd: support per-file DAX Jeffle Xu
  1 sibling, 0 replies; 26+ messages in thread
From: Jeffle Xu @ 2021-09-23  9:35 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

... in prep for the following support for per-file DAX feature.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 include/standard-headers/linux/fuse.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index 950d7edb7e..a2c73f1469 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -440,6 +440,7 @@ struct fuse_file_lock {
  * FUSE_ATTR_SUBMOUNT: Object is a submount root
  */
 #define FUSE_ATTR_SUBMOUNT      (1 << 0)
+#define FUSE_ATTR_DAX		(1 << 1)
 
 /**
  * Open flags
-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [virtiofsd PATCH v5 2/2] virtiofsd: support per-file DAX
  2021-09-23  9:35 ` [virtiofsd PATCH v5 0/2] virtiofsd: support per-file DAX Jeffle Xu
  2021-09-23  9:35   ` [virtiofsd PATCH v5 1/2] virtiofsd: add FUSE_ATTR_DAX to fuse protocol Jeffle Xu
@ 2021-09-23  9:35   ` Jeffle Xu
  1 sibling, 0 replies; 26+ messages in thread
From: Jeffle Xu @ 2021-09-23  9:35 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

When DAX window is fully utilized and needs to be exapnded to avoid the
performance fluctuation triggered by DAX window recaliming, it may not
be wise to allocate DAX window for files with size smaller than some
specific point, considering from the perspective of reducing memory
overhead.

To maintain one DAX window chunk (e.g., 2MB in size), 32KB
(512 * 64 bytes) memory footprint will be consumed for page descriptors
inside guest. Thus it'd better disable DAX for those files smaller than
32KB, to reduce the demand for DAX window and thus avoid the unworthy
memory overhead.

Thus only flag the file with FUSE_ATTR_DAX when the file size is greater
than 32 KB. The guest will enable DAX only for those files flagged with
FUSE_ATTR_DAX, when virtiofs is mounted with '-o dax=inode'.

To be noted that both FUSE_LOOKUP and FUSE_READDIRPLUS are affected, and
will convey FUSE_ATTR_DAX flag to the guest.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 tools/virtiofsd/passthrough_ll.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index b76d878509..6893180c6b 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -53,12 +53,26 @@
 #include <sys/syscall.h>
 #include <sys/wait.h>
 #include <sys/xattr.h>
+#include <sys/user.h>
 #include <syslog.h>
 
 #include "qemu/cutils.h"
 #include "passthrough_helpers.h"
 #include "passthrough_seccomp.h"
 
+/*
+ * One page descriptor (64 bytes in size) needs to be maintained for every page
+ * in the DAX window chunk, i.e., there is certain guest memory overhead when
+ * DAX is enabled. Thus disable DAX for those files smaller than this certain
+ * memory overhead if virtiofs is mounted in per-file DAX mode, in which case
+ * the guest page cache will consume less memory when DAX is disabled.
+ */
+#define FUSE_DAX_SHIFT	21
+#define PAGE_DESC_SHIFT	6
+#define FUSE_PERFILE_DAX_SHIFT \
+    (FUSE_DAX_SHIFT - PAGE_SHIFT + PAGE_DESC_SHIFT)
+#define FUSE_PERFILE_DAX_POINT	(1 << FUSE_PERFILE_DAX_SHIFT)
+
 /* Keep track of inode posix locks for each owner. */
 struct lo_inode_plock {
     uint64_t lock_owner;
@@ -1023,6 +1037,11 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
+    if (S_ISREG(e->attr.st_mode) &&
+        (e->attr.st_size > FUSE_PERFILE_DAX_POINT)) {
+        e->attr_flags |= FUSE_ATTR_DAX;
+    }
+
     inode = lo_find(lo, &e->attr, mnt_id);
     if (inode) {
         close(newfd);
-- 
2.27.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/5] fuse,virtiofs: support per-file DAX
  2021-09-23  9:25 [PATCH v5 0/5] fuse,virtiofs: support per-file DAX Jeffle Xu
                   ` (5 preceding siblings ...)
  2021-09-23  9:35 ` [virtiofsd PATCH v5 0/2] virtiofsd: support per-file DAX Jeffle Xu
@ 2021-09-23 18:57 ` Vivek Goyal
  2021-10-27  3:42   ` JeffleXu
  6 siblings, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2021-09-23 18:57 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Thu, Sep 23, 2021 at 05:25:21PM +0800, Jeffle Xu wrote:
> This patchset adds support of per-file DAX for virtiofs, which is
> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> 
> Any comment is welcome.
> 
> [1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
> [2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")
> 
> 
> [Purpose]
> DAX may be limited in some specific situation. When the number of usable
> DAX windows is under watermark, the recalim routine will be triggered to
> reclaim some DAX windows. It may have a negative impact on the
> performance, since some processes may need to wait for DAX windows to be
> recalimed and reused then. To mitigate the performance degradation, the
> overall DAX window need to be expanded larger.
> 
> However, simply expanding the DAX window may not be a good deal in some
> scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
> (512 * 64 bytes) memory footprint will be consumed for page descriptors
> inside guest, which is greater than the memory footprint if it uses
> guest page cache when DAX disabled. Thus it'd better disable DAX for
> those files smaller than 32KB, to reduce the demand for DAX window and
> thus avoid the unworthy memory overhead.
> 
> Per-file DAX feature is introduced to address this issue, by offering a
> finer grained control for dax to users, trying to achieve a balance
> between performance and memory overhead.
> 
> 
> [Note]
> When the per-file DAX hint changes while the file is still *opened*, it
> is quite complicated and maybe fragile to dynamically change the DAX
> state, since dynamic switching needs to switch a_ops atomiclly. Ira
> Weiny had ever implemented a so called i_aops_sem lock [3] but
> eventually gave up since the complexity of the implementation [4][5][6][7].
> 
> Hence mark the inode and corresponding dentries as DONE_CACHE once the
> per-file DAX hint changes, so that the inode instance will be evicted
> and freed as soon as possible once the file is closed and the last
> reference to the inode is put. And then when the file gets reopened next
> time, the new instantiated inode will reflect the new DAX state.

If we don't cache inode (if no fd is open), will it not have negative
performance impact. When we cache inodes, we also have all the dax
mappings cached as well. So if a process opens the same file again,
it gets all the mappings already in place and it does not have
to call FUSE_SETUPMAPPING again.

Vivek

> 
> In summary, when the per-file DAX hint changes for an *opened* file, the
> DAX state of the file won't be updated until this file is closed and
> reopened later. This is also how ext4/xfs per-file DAX works.
> 
> [3] https://lore.kernel.org/lkml/20200227052442.22524-7-ira.weiny@intel.com/
> [4] https://patchwork.kernel.org/project/xfs/cover/20200407182958.568475-1-ira.weiny@intel.com/
> [5] https://lore.kernel.org/lkml/20200305155144.GA5598@lst.de/
> [6] https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> [7] https://lore.kernel.org/lkml/20200403182904.GP80283@magnolia/
> 
> chanegs since v4:
> - drop support for setting/clearing FS_DAX inside guest
> - and thus drop the negotiation phase during FUSE_INIT
> 
> changes since v3:
> - bug fix (patch 6): s/"IS_DAX(inode) != newdax"/"!!IS_DAX(inode) !=
>   newdax"
> - during FUSE_INIT, advertise capability for per-file DAX only when
>   mounted as "-o dax=inode" (patch 4)
> 
> changes since v2:
> - modify fuse_show_options() accordingly to make it compatible with
>   new tri-state mount option (patch 2)
> - extract FUSE protocol changes into one seperate patch (patch 3)
> - FUSE server/client need to negotiate if they support per-file DAX
>   (patch 4)
> - extract DONT_CACHE logic into patch 6/7
> 
> v4: https://lore.kernel.org/linux-fsdevel/20210817022220.17574-1-jefflexu@linux.alibaba.com/
> v3: https://www.spinics.net/lists/linux-fsdevel/msg200852.html
> v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
> v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html
> 
> 
> Jeffle Xu (5):
>   fuse: add fuse_should_enable_dax() helper
>   fuse: make DAX mount option a tri-state
>   fuse: support per-file DAX
>   fuse: enable per-file DAX
>   fuse: mark inode DONT_CACHE when per-file DAX hint changes
> 
>  fs/fuse/dax.c             | 32 +++++++++++++++++++++++++++++---
>  fs/fuse/file.c            |  4 ++--
>  fs/fuse/fuse_i.h          | 19 +++++++++++++++----
>  fs/fuse/inode.c           | 15 +++++++++++----
>  fs/fuse/virtio_fs.c       | 16 ++++++++++++++--
>  include/uapi/linux/fuse.h |  7 ++++++-
>  6 files changed, 77 insertions(+), 16 deletions(-)
> 
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/5] fuse: add fuse_should_enable_dax() helper
  2021-09-23  9:25 ` [PATCH v5 1/5] fuse: add fuse_should_enable_dax() helper Jeffle Xu
@ 2021-09-23 18:58   ` Vivek Goyal
  0 siblings, 0 replies; 26+ messages in thread
From: Vivek Goyal @ 2021-09-23 18:58 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Thu, Sep 23, 2021 at 05:25:22PM +0800, Jeffle Xu wrote:
> This is in prep for following per-file DAX checking.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 281d79f8b3d3..28db96ea23e2 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1330,11 +1330,19 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>  	.invalidatepage	= noop_invalidatepage,
>  };
>  
> -void fuse_dax_inode_init(struct inode *inode)
> +static bool fuse_should_enable_dax(struct inode *inode)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
>  
>  	if (!fc->dax)
> +		return false;
> +
> +	return true;
> +}

It probably is easier to read if we flip the logic.

{
	if (fc->dax)
		return true;
	return false;
}

> +
> +void fuse_dax_inode_init(struct inode *inode)
> +{
> +	if (!fuse_should_enable_dax(inode))
>  		return;
>  
>  	inode->i_flags |= S_DAX;
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-23  9:25 ` [PATCH v5 2/5] fuse: make DAX mount option a tri-state Jeffle Xu
@ 2021-09-23 19:02   ` Vivek Goyal
  2021-09-23 22:26     ` Dave Chinner
  0 siblings, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2021-09-23 19:02 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Thu, Sep 23, 2021 at 05:25:23PM +0800, Jeffle Xu wrote:
> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> operate the same which is equivalent to 'always'. To be consistemt with
> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> option is specified, the default behaviour is equal to 'inode'.

So will "-o dax=inode" be used for per file DAX where dax mode comes
from server?

I think we discussed this. It will be better to leave "-o dax=inode"
alone. It should be used when we are reading dax status from file
attr (like ext4 and xfs). 

And probably create separate option say "-o dax=server" where server
specifies which inode should use dax.

Otherwise it will be very confusing. People familiar with "-o dax=inode"
on ext4/xfs will expect file attr to work and that's not what we
are implementing, IIUC.

Vivek

>

> By the time this patch is applied, 'inode' mode is actually equal to
> 'always' mode, before the per-file DAX flag is introduced in the
> following patch.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c       |  9 +++++++--
>  fs/fuse/fuse_i.h    | 14 ++++++++++++--
>  fs/fuse/inode.c     | 10 +++++++---
>  fs/fuse/virtio_fs.c | 16 ++++++++++++++--
>  4 files changed, 40 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 28db96ea23e2..e87ddded38c7 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1282,11 +1282,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
>  	return ret;
>  }
>  
> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
> +			struct dax_device *dax_dev)
>  {
>  	struct fuse_conn_dax *fcd;
>  	int err;
>  
> +	fc->dax_mode = dax_mode;
> +
>  	if (!dax_dev)
>  		return 0;
>  
> @@ -1333,8 +1336,10 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>  static bool fuse_should_enable_dax(struct inode *inode)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
> +	unsigned int dax_mode = fc->dax_mode;
>  
> -	if (!fc->dax)
> +	/* If 'dax=always/inode', fc->dax couldn't be NULL */
> +	if (dax_mode == FUSE_DAX_NEVER)
>  		return false;
>  
>  	return true;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 319596df5dc6..5abf9749923f 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -480,6 +480,12 @@ struct fuse_dev {
>  	struct list_head entry;
>  };
>  
> +enum fuse_dax_mode {
> +	FUSE_DAX_INODE,
> +	FUSE_DAX_ALWAYS,
> +	FUSE_DAX_NEVER,
> +};
> +
>  struct fuse_fs_context {
>  	int fd;
>  	struct file *file;
> @@ -497,7 +503,7 @@ struct fuse_fs_context {
>  	bool no_control:1;
>  	bool no_force_umount:1;
>  	bool legacy_opts_show:1;
> -	bool dax:1;
> +	enum fuse_dax_mode dax_mode;
>  	unsigned int max_read;
>  	unsigned int blksize;
>  	const char *subtype;
> @@ -802,6 +808,9 @@ struct fuse_conn {
>  	struct list_head devices;
>  
>  #ifdef CONFIG_FUSE_DAX
> +	/* dax mode: FUSE_DAX_* (always, never or per-file) */
> +	enum fuse_dax_mode dax_mode;
> +
>  	/* Dax specific conn data, non-NULL if DAX is enabled */
>  	struct fuse_conn_dax *dax;
>  #endif
> @@ -1255,7 +1264,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
>  ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
>  int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
>  int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
> +			struct dax_device *dax_dev);
>  void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 36cd03114b6d..b4b41683e97e 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -742,8 +742,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
>  			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
>  	}
>  #ifdef CONFIG_FUSE_DAX
> -	if (fc->dax)
> -		seq_puts(m, ",dax");
> +	if (fc->dax_mode == FUSE_DAX_ALWAYS)
> +		seq_puts(m, ",dax=always");
> +	else if (fc->dax_mode == FUSE_DAX_NEVER)
> +		seq_puts(m, ",dax=never");
> +	else if (fc->dax_mode == FUSE_DAX_INODE)
> +		seq_puts(m, ",dax=inode");
>  #endif
>  
>  	return 0;
> @@ -1493,7 +1497,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  	sb->s_subtype = ctx->subtype;
>  	ctx->subtype = NULL;
>  	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
> -		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
> +		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
>  		if (err)
>  			goto err;
>  	}
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 0ad89c6629d7..58cfbaeb4a7d 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -88,12 +88,21 @@ struct virtio_fs_req_work {
>  static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
>  				 struct fuse_req *req, bool in_flight);
>  
> +static const struct constant_table dax_param_enums[] = {
> +	{"inode",	FUSE_DAX_INODE },
> +	{"always",	FUSE_DAX_ALWAYS },
> +	{"never",	FUSE_DAX_NEVER },
> +	{}
> +};
> +
>  enum {
>  	OPT_DAX,
> +	OPT_DAX_ENUM,
>  };
>  
>  static const struct fs_parameter_spec virtio_fs_parameters[] = {
>  	fsparam_flag("dax", OPT_DAX),
> +	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
>  	{}
>  };
>  
> @@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
>  
>  	switch (opt) {
>  	case OPT_DAX:
> -		ctx->dax = 1;
> +		ctx->dax_mode = FUSE_DAX_ALWAYS;
> +		break;
> +	case OPT_DAX_ENUM:
> +		ctx->dax_mode = result.uint_32;
>  		break;
>  	default:
>  		return -EINVAL;
> @@ -1326,7 +1338,7 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
>  
>  	/* virtiofs allocates and installs its own fuse devices */
>  	ctx->fudptr = NULL;
> -	if (ctx->dax) {
> +	if (ctx->dax_mode != FUSE_DAX_NEVER) {
>  		if (!fs->dax_dev) {
>  			err = -EINVAL;
>  			pr_err("virtio-fs: dax can't be enabled as filesystem"
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-23 19:02   ` Vivek Goyal
@ 2021-09-23 22:26     ` Dave Chinner
  2021-09-24  1:02       ` Vivek Goyal
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2021-09-23 22:26 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Jeffle Xu, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
> On Thu, Sep 23, 2021 at 05:25:23PM +0800, Jeffle Xu wrote:
> > We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> > operate the same which is equivalent to 'always'. To be consistemt with
> > ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> > option is specified, the default behaviour is equal to 'inode'.
> 
> So will "-o dax=inode" be used for per file DAX where dax mode comes
> from server?
> 
> I think we discussed this. It will be better to leave "-o dax=inode"
> alone. It should be used when we are reading dax status from file
> attr (like ext4 and xfs). 
> 
> And probably create separate option say "-o dax=server" where server
> specifies which inode should use dax.

That seems like a poor idea to me.

The server side already controls what the client side does by
controlling the inode attributes that the client side sees.  That
is, if the server is going to specify whether the client side data
access is going to use dax, then the server presents the client with
an inode that has the DAX attribute flag set on it.

In that case, turning off dax on the guest side should be
communicated to the fuse server so the server turns off the DAX flag
on the server side iff server side policy allows it.  When the guest
then purges it's local inode and reloads it from the server then it
will get an inode with the DAX flag set according to server side
policy.

Hence if the server side is configured with "dax=always" or
dax="never" semantics it means the client side inode flag state
cannot control DAX mode. That means, regardless of the client side
mount options, DAX is operating under always or never policy,
enforced by the server side by direct control of the client inode
DAX attribute flag. If dax=inode is in use on both sides, the the
server honours the requests of the client to set/clear the inode
flags and presents the inode flag according to the state the client
side has requested.

This policy state probably should be communicated to
the fuse client from the server at mount time so policy conflicts
can be be resolved at mount time (e.g. reject mount if policy
conflict occurs, default to guest overrides server or vice versa,
etc). This then means that that the client side mount policies will
default to server side policy when they set "dax=inode" but also
provide a local override for always or never local behaviour.

Hence, AFAICT, there is no need for a 'dax=server' option - this
seems to be exactly what 'dax=inode' behaviour means on the client
side - it behaves according to how the server side propagates the
DAX attribute to the client for each inode.

> Otherwise it will be very confusing. People familiar with "-o dax=inode"
> on ext4/xfs will expect file attr to work and that's not what we
> are implementing, IIUC.

The dax mount option behaviour is already confusing enough without
adding yet another weird, poorly documented, easily misunderstood
mode that behaves subtly different to existing modes.

Please try to make the virtiofs behaviour compatible with existing
modes - it's not that hard to make the client dax=inode behaviour be
controlled by the server side without any special client side mount
modes.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-23 22:26     ` Dave Chinner
@ 2021-09-24  1:02       ` Vivek Goyal
  2021-09-24  3:06         ` JeffleXu
  2021-09-27  0:21         ` Dave Chinner
  0 siblings, 2 replies; 26+ messages in thread
From: Vivek Goyal @ 2021-09-24  1:02 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jeffle Xu, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
> On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
> > On Thu, Sep 23, 2021 at 05:25:23PM +0800, Jeffle Xu wrote:
> > > We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> > > operate the same which is equivalent to 'always'. To be consistemt with
> > > ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> > > option is specified, the default behaviour is equal to 'inode'.
> > 
> > So will "-o dax=inode" be used for per file DAX where dax mode comes
> > from server?
> > 
> > I think we discussed this. It will be better to leave "-o dax=inode"
> > alone. It should be used when we are reading dax status from file
> > attr (like ext4 and xfs). 
> > 
> > And probably create separate option say "-o dax=server" where server
> > specifies which inode should use dax.
> 
> That seems like a poor idea to me.
> 
> The server side already controls what the client side does by
> controlling the inode attributes that the client side sees.  That
> is, if the server is going to specify whether the client side data
> access is going to use dax, then the server presents the client with
> an inode that has the DAX attribute flag set on it.

Hi Dave,

Currently in fuse/virtiofs, DAX is compltely controlled by client. Server
has no say in it. If client is mounted with "-o dax", dax is enabled on
all files otherwise dax is disabled on all files. One could think of
implementing an option on server so that server could deny mmap()
requests that come from client, but so far nobody asked for such
an option on server side.

When you say "inode that has DAX attribute flag set on it", are you
referring to "S_DAX (in inode->i_flags)" or persistent attr
"FS_XFLAG_DAX"?

As of now S_DAX on client side inode is set by fuse client whenever
client mounted filesystem with "-o dax". And I think you are assuming
that DAX attribute of inode is already coming from server. That's not
the case. In fact that seems to be the proposal. Provide capability
so that server can specify which inode should be using DAX and which
inode should not be.

> 
> In that case, turning off dax on the guest side should be
> communicated to the fuse server so the server turns off the DAX flag
> on the server side iff server side policy allows it.

Not sure what do you mean by server turns of DAX flag based on client
turning off DAX. Server does not have to do anything. If client decides
to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
to server and that's it.

> When the guest
> then purges it's local inode and reloads it from the server then it
> will get an inode with the DAX flag set according to server side
> policy.

So right now we don't have a mechanism for server to specify DAX flag.
And that's what this patch series is trying to do.

> 
> Hence if the server side is configured with "dax=always" or
> dax="never" semantics it means the client side inode flag state
> cannot control DAX mode. That means, regardless of the client side
> mount options, DAX is operating under always or never policy,

Hmm..., When you say "server side is configured with "dax=always", 
do you mean shared directory on host is mounted with "-o dax=always",
or you mean some virtiofs server specific option which enables
dax on all inodes from server side.

In general, DAX on host and DAX inside guest are completely independent.
Host filesystem could very well be mounted with dax or without dax and
that has no affect on guests's capability to be able to enable DAX or
not. 

> enforced by the server side by direct control of the client inode
> DAX attribute flag. If dax=inode is in use on both sides, the the
> server honours the requests of the client to set/clear the inode
> flags and presents the inode flag according to the state the client
> side has requested.
> 
> This policy state probably should be communicated to
> the fuse client from the server at mount time so policy conflicts
> can be be resolved at mount time (e.g. reject mount if policy
> conflict occurs, default to guest overrides server or vice versa,
> etc). This then means that that the client side mount policies will
> default to server side policy when they set "dax=inode" but also
> provide a local override for always or never local behaviour.
> 
> Hence, AFAICT, there is no need for a 'dax=server' option - this
> seems to be exactly what 'dax=inode' behaviour means on the client
> side - it behaves according to how the server side propagates the
> DAX attribute to the client for each inode.

Ok. So "-o dax=inode" in fuse will have a different meaning as opposed
to ext4/xfs. This will mean that server will pass DAX state of inode
when inode is instantiated and client should honor that. 

But what about FS_XFLAG_DAX flag then. Say host file system
does support this att and fuse/virtiofs allows querying and
setting this attribute (I don't think it is allowed now). So
will we not create a future conflict where in case of fuse/virtiofs
"-o dax=inode" means something different and it does look at
FS_XFLAG_DAX file attr.

> 
> > Otherwise it will be very confusing. People familiar with "-o dax=inode"
> > on ext4/xfs will expect file attr to work and that's not what we
> > are implementing, IIUC.
> 
> The dax mount option behaviour is already confusing enough without
> adding yet another weird, poorly documented, easily misunderstood
> mode that behaves subtly different to existing modes.
> 
> Please try to make the virtiofs behaviour compatible with existing
> modes - it's not that hard to make the client dax=inode behaviour be
> controlled by the server side without any special client side mount
> modes.

Given I want to keep the option of similar behavior for "dax=inode"
across ext4/xfs and virtiofs, I suggested "dax=server". Because I 
assumed that "dax=inode" means that dax is per inode property AND
this per inode property is specified by persistent file attr 
FS_XFLAG_DAX.

But fuse/virtiofs will not be specifying dax property of inode using
FS_XFLAG_DAX (atleast as of now). And server will set DAX property
using some bit in protocol. 

So these seem little different. If we use "dax=inode" for server
specifying DAX property of inode, then in future if client can
query/set FS_XFLAG_DAX on inode, it will be a problem. There will
be a conflict.

Use case I was imagining was, say on host, user might set FS_XFLAG_DAX
attr on relevant files and then mount virtiofs in guest with 
"-o dax=inode". Guest will query state of FS_XFLAG_DAX on inode
and enable DAX accordingly. (And server is not necessarily playing
an active role in determining which files should use DAX).

In summary, there seem to be two use cases.

A. virtiofsd/fuse-server wants do be able to come up with its own policy
   to decide which inodes should use guest.

B. guest client decides which inode use DAX based on FS_XFLAG_DAX attr
   on inode (and server does not play a role).

To be able to different between these two cases, I was suggesting using
"-o dax=inode" for B and "-o dax=server" for A.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-24  1:02       ` Vivek Goyal
@ 2021-09-24  3:06         ` JeffleXu
  2021-09-24 12:43           ` Vivek Goyal
  2021-09-27  0:21         ` Dave Chinner
  1 sibling, 1 reply; 26+ messages in thread
From: JeffleXu @ 2021-09-24  3:06 UTC (permalink / raw)
  To: Vivek Goyal, Dave Chinner
  Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi



On 9/24/21 9:02 AM, Vivek Goyal wrote:
> On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
>> On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
>>> On Thu, Sep 23, 2021 at 05:25:23PM +0800, Jeffle Xu wrote:
>>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
>>>> operate the same which is equivalent to 'always'. To be consistemt with
>>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
>>>> option is specified, the default behaviour is equal to 'inode'.
>>>
>>> So will "-o dax=inode" be used for per file DAX where dax mode comes
>>> from server?
>>>
>>> I think we discussed this. It will be better to leave "-o dax=inode"
>>> alone. It should be used when we are reading dax status from file
>>> attr (like ext4 and xfs). 
>>>
>>> And probably create separate option say "-o dax=server" where server
>>> specifies which inode should use dax.
>>
>> That seems like a poor idea to me.
>>
>> The server side already controls what the client side does by
>> controlling the inode attributes that the client side sees.  That
>> is, if the server is going to specify whether the client side data
>> access is going to use dax, then the server presents the client with
>> an inode that has the DAX attribute flag set on it.
> 
> Hi Dave,
> 
> Currently in fuse/virtiofs, DAX is compltely controlled by client. Server
> has no say in it. If client is mounted with "-o dax", dax is enabled on
> all files otherwise dax is disabled on all files. One could think of
> implementing an option on server so that server could deny mmap()
> requests that come from client, but so far nobody asked for such
> an option on server side.
> 
> When you say "inode that has DAX attribute flag set on it", are you
> referring to "S_DAX (in inode->i_flags)" or persistent attr
> "FS_XFLAG_DAX"?
> 
> As of now S_DAX on client side inode is set by fuse client whenever
> client mounted filesystem with "-o dax". And I think you are assuming
> that DAX attribute of inode is already coming from server. That's not
> the case. In fact that seems to be the proposal. Provide capability
> so that server can specify which inode should be using DAX and which
> inode should not be.
> 
>>
>> In that case, turning off dax on the guest side should be
>> communicated to the fuse server so the server turns off the DAX flag
>> on the server side iff server side policy allows it.
> 
> Not sure what do you mean by server turns of DAX flag based on client
> turning off DAX. Server does not have to do anything. If client decides
> to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
> to server and that's it.
> 
>> When the guest
>> then purges it's local inode and reloads it from the server then it
>> will get an inode with the DAX flag set according to server side
>> policy.
> 
> So right now we don't have a mechanism for server to specify DAX flag.
> And that's what this patch series is trying to do.
> 
>>
>> Hence if the server side is configured with "dax=always" or
>> dax="never" semantics it means the client side inode flag state
>> cannot control DAX mode. That means, regardless of the client side
>> mount options, DAX is operating under always or never policy,
> 
> Hmm..., When you say "server side is configured with "dax=always", 
> do you mean shared directory on host is mounted with "-o dax=always",
> or you mean some virtiofs server specific option which enables
> dax on all inodes from server side.
> 
> In general, DAX on host and DAX inside guest are completely independent.
> Host filesystem could very well be mounted with dax or without dax and
> that has no affect on guests's capability to be able to enable DAX or
> not. 

Hi Dave, I think you are referring to "shared directory on host is
mounted with "-o dax=always"" when you are saying "server side is
configured with "dax=always". And just as Vivek said, there's no
necessary relationship between the DAX mode in host and that in guest,
technically.


> 
>> enforced by the server side by direct control of the client inode
>> DAX attribute flag. If dax=inode is in use on both sides, the the
>> server honours the requests of the client to set/clear the inode
>> flags and presents the inode flag according to the state the client
>> side has requested.
>>
>> This policy state probably should be communicated to
>> the fuse client from the server at mount time so policy conflicts
>> can be be resolved at mount time (e.g. reject mount if policy
>> conflict occurs, default to guest overrides server or vice versa,
>> etc). This then means that that the client side mount policies will
>> default to server side policy when they set "dax=inode" but also
>> provide a local override for always or never local behaviour.
>>
>> Hence, AFAICT, there is no need for a 'dax=server' option - this
>> seems to be exactly what 'dax=inode' behaviour means on the client
>> side - it behaves according to how the server side propagates the
>> DAX attribute to the client for each inode.
> 
> Ok. So "-o dax=inode" in fuse will have a different meaning as opposed
> to ext4/xfs. This will mean that server will pass DAX state of inode
> when inode is instantiated and client should honor that. 
> 
> But what about FS_XFLAG_DAX flag then. Say host file system
> does support this att and fuse/virtiofs allows querying and
> setting this attribute (I don't think it is allowed now). So
> will we not create a future conflict where in case of fuse/virtiofs
> "-o dax=inode" means something different and it does look at
> FS_XFLAG_DAX file attr.
> 
>>
>>> Otherwise it will be very confusing. People familiar with "-o dax=inode"
>>> on ext4/xfs will expect file attr to work and that's not what we
>>> are implementing, IIUC.
>>
>> The dax mount option behaviour is already confusing enough without
>> adding yet another weird, poorly documented, easily misunderstood
>> mode that behaves subtly different to existing modes.
>>
>> Please try to make the virtiofs behaviour compatible with existing
>> modes - it's not that hard to make the client dax=inode behaviour be
>> controlled by the server side without any special client side mount
>> modes.
> 
> Given I want to keep the option of similar behavior for "dax=inode"
> across ext4/xfs and virtiofs, I suggested "dax=server". Because I 
> assumed that "dax=inode" means that dax is per inode property AND
> this per inode property is specified by persistent file attr 
> FS_XFLAG_DAX.
> 
> But fuse/virtiofs will not be specifying dax property of inode using
> FS_XFLAG_DAX (atleast as of now). And server will set DAX property
> using some bit in protocol. 
> 
> So these seem little different. If we use "dax=inode" for server
> specifying DAX property of inode, then in future if client can
> query/set FS_XFLAG_DAX on inode, it will be a problem. There will
> be a conflict.
> 
> Use case I was imagining was, say on host, user might set FS_XFLAG_DAX
> attr on relevant files and then mount virtiofs in guest with 
> "-o dax=inode". ...

Yes this will be a real using case, where users could specify which
files should be DAX enabled by setting FS_XFLAG_DAX attr **on host**. In
this case, the FS_XFLAG_DAX attr could still be conveyed by
FUSE_ATTR_DAX (introduced in this patch set) in FUSE_LOOKUP reply. Then
extra option may be added to fuse daemon, e.g., '-o policy=server' for
getting DAX attr according to fuse daemon's own policy, and '-o
policy=flag' for querying persistent FS_XFLAG_DAX attr of the host inode.

And thus 'dax=server' mount option inside guest could be omitted and
replaced by 'policy=server' option on the fuse daemon side. In this
case, the fuse kernel module could be as simple as possible, while all
other strategies could be implemented on the fuse daemon side.


> ... Guest will query state of FS_XFLAG_DAX on inode
> and enable DAX accordingly. (And server is not necessarily playing
> an active role in determining which files should use DAX).

It will be less efficient for guest to initiate another FUSE_IOCTL
request to query if FS_XFLAG_DAX attr is on the inode. I think
FUSE_ATTR_DAX flag in FUSE_LOOKUP reply shall be adequate for conveying
DAX related attr.


However it is indeed an issue when it comes to the consistency of
FS_XFLAG_DAX flag with ext4/xfs. There are following two semantics when
using per-file DAX for ext4/xfs:

1. user sets FS_XFLAG_DAX attribute on inode, for which per-file DAX
shall be enabled, and the attribute will be stored persistently on the
inode;
2. user can query the FS_XFLAG_DAX attribute to see if DAX is enabled
for specific file, when it's in 'dax=inode' mode.


For semantic 1), I'm quite doubted if there's necessity and possibility
of this using case for fuse so far, given admin could already specify
which files should be DAX enabled on the host side. If this semantic
indeed shall be implemented later, then I'm afraid 'policy=server'
option shall always be specified with fuse daemon, that is, fuse daemon
shall always query the FS_XFLAG_DAX attr of the inode on the host.

For semantic 2), could we return a fake FS_XFLAG_DAX once the server
shows that this file should be DAX enabled?

> 
> In summary, there seem to be two use cases.
> 
> A. virtiofsd/fuse-server wants do be able to come up with its own policy
>    to decide which inodes should use guest.
> 
> B. guest client decides which inode use DAX based on FS_XFLAG_DAX attr
>    on inode (and server does not play a role).
> 
> To be able to different between these two cases, I was suggesting using
> "-o dax=inode" for B and "-o dax=server" for A.


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-24  3:06         ` JeffleXu
@ 2021-09-24 12:43           ` Vivek Goyal
  2021-09-26  2:06             ` JeffleXu
  0 siblings, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2021-09-24 12:43 UTC (permalink / raw)
  To: JeffleXu
  Cc: Dave Chinner, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu,
	joseph.qi

On Fri, Sep 24, 2021 at 11:06:07AM +0800, JeffleXu wrote:
> 
> 
> On 9/24/21 9:02 AM, Vivek Goyal wrote:
> > On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
> >> On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
> >>> On Thu, Sep 23, 2021 at 05:25:23PM +0800, Jeffle Xu wrote:
> >>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> >>>> operate the same which is equivalent to 'always'. To be consistemt with
> >>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> >>>> option is specified, the default behaviour is equal to 'inode'.
> >>>
> >>> So will "-o dax=inode" be used for per file DAX where dax mode comes
> >>> from server?
> >>>
> >>> I think we discussed this. It will be better to leave "-o dax=inode"
> >>> alone. It should be used when we are reading dax status from file
> >>> attr (like ext4 and xfs). 
> >>>
> >>> And probably create separate option say "-o dax=server" where server
> >>> specifies which inode should use dax.
> >>
> >> That seems like a poor idea to me.
> >>
> >> The server side already controls what the client side does by
> >> controlling the inode attributes that the client side sees.  That
> >> is, if the server is going to specify whether the client side data
> >> access is going to use dax, then the server presents the client with
> >> an inode that has the DAX attribute flag set on it.
> > 
> > Hi Dave,
> > 
> > Currently in fuse/virtiofs, DAX is compltely controlled by client. Server
> > has no say in it. If client is mounted with "-o dax", dax is enabled on
> > all files otherwise dax is disabled on all files. One could think of
> > implementing an option on server so that server could deny mmap()
> > requests that come from client, but so far nobody asked for such
> > an option on server side.
> > 
> > When you say "inode that has DAX attribute flag set on it", are you
> > referring to "S_DAX (in inode->i_flags)" or persistent attr
> > "FS_XFLAG_DAX"?
> > 
> > As of now S_DAX on client side inode is set by fuse client whenever
> > client mounted filesystem with "-o dax". And I think you are assuming
> > that DAX attribute of inode is already coming from server. That's not
> > the case. In fact that seems to be the proposal. Provide capability
> > so that server can specify which inode should be using DAX and which
> > inode should not be.
> > 
> >>
> >> In that case, turning off dax on the guest side should be
> >> communicated to the fuse server so the server turns off the DAX flag
> >> on the server side iff server side policy allows it.
> > 
> > Not sure what do you mean by server turns of DAX flag based on client
> > turning off DAX. Server does not have to do anything. If client decides
> > to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
> > to server and that's it.
> > 
> >> When the guest
> >> then purges it's local inode and reloads it from the server then it
> >> will get an inode with the DAX flag set according to server side
> >> policy.
> > 
> > So right now we don't have a mechanism for server to specify DAX flag.
> > And that's what this patch series is trying to do.
> > 
> >>
> >> Hence if the server side is configured with "dax=always" or
> >> dax="never" semantics it means the client side inode flag state
> >> cannot control DAX mode. That means, regardless of the client side
> >> mount options, DAX is operating under always or never policy,
> > 
> > Hmm..., When you say "server side is configured with "dax=always", 
> > do you mean shared directory on host is mounted with "-o dax=always",
> > or you mean some virtiofs server specific option which enables
> > dax on all inodes from server side.
> > 
> > In general, DAX on host and DAX inside guest are completely independent.
> > Host filesystem could very well be mounted with dax or without dax and
> > that has no affect on guests's capability to be able to enable DAX or
> > not. 
> 
> Hi Dave, I think you are referring to "shared directory on host is
> mounted with "-o dax=always"" when you are saying "server side is
> configured with "dax=always". And just as Vivek said, there's no
> necessary relationship between the DAX mode in host and that in guest,
> technically.
> 
> 
> > 
> >> enforced by the server side by direct control of the client inode
> >> DAX attribute flag. If dax=inode is in use on both sides, the the
> >> server honours the requests of the client to set/clear the inode
> >> flags and presents the inode flag according to the state the client
> >> side has requested.
> >>
> >> This policy state probably should be communicated to
> >> the fuse client from the server at mount time so policy conflicts
> >> can be be resolved at mount time (e.g. reject mount if policy
> >> conflict occurs, default to guest overrides server or vice versa,
> >> etc). This then means that that the client side mount policies will
> >> default to server side policy when they set "dax=inode" but also
> >> provide a local override for always or never local behaviour.
> >>
> >> Hence, AFAICT, there is no need for a 'dax=server' option - this
> >> seems to be exactly what 'dax=inode' behaviour means on the client
> >> side - it behaves according to how the server side propagates the
> >> DAX attribute to the client for each inode.
> > 
> > Ok. So "-o dax=inode" in fuse will have a different meaning as opposed
> > to ext4/xfs. This will mean that server will pass DAX state of inode
> > when inode is instantiated and client should honor that. 
> > 
> > But what about FS_XFLAG_DAX flag then. Say host file system
> > does support this att and fuse/virtiofs allows querying and
> > setting this attribute (I don't think it is allowed now). So
> > will we not create a future conflict where in case of fuse/virtiofs
> > "-o dax=inode" means something different and it does look at
> > FS_XFLAG_DAX file attr.
> > 
> >>
> >>> Otherwise it will be very confusing. People familiar with "-o dax=inode"
> >>> on ext4/xfs will expect file attr to work and that's not what we
> >>> are implementing, IIUC.
> >>
> >> The dax mount option behaviour is already confusing enough without
> >> adding yet another weird, poorly documented, easily misunderstood
> >> mode that behaves subtly different to existing modes.
> >>
> >> Please try to make the virtiofs behaviour compatible with existing
> >> modes - it's not that hard to make the client dax=inode behaviour be
> >> controlled by the server side without any special client side mount
> >> modes.
> > 
> > Given I want to keep the option of similar behavior for "dax=inode"
> > across ext4/xfs and virtiofs, I suggested "dax=server". Because I 
> > assumed that "dax=inode" means that dax is per inode property AND
> > this per inode property is specified by persistent file attr 
> > FS_XFLAG_DAX.
> > 
> > But fuse/virtiofs will not be specifying dax property of inode using
> > FS_XFLAG_DAX (atleast as of now). And server will set DAX property
> > using some bit in protocol. 
> > 
> > So these seem little different. If we use "dax=inode" for server
> > specifying DAX property of inode, then in future if client can
> > query/set FS_XFLAG_DAX on inode, it will be a problem. There will
> > be a conflict.
> > 
> > Use case I was imagining was, say on host, user might set FS_XFLAG_DAX
> > attr on relevant files and then mount virtiofs in guest with 
> > "-o dax=inode". ...
> 
> Yes this will be a real using case, where users could specify which
> files should be DAX enabled by setting FS_XFLAG_DAX attr **on host**. In
> this case, the FS_XFLAG_DAX attr could still be conveyed by
> FUSE_ATTR_DAX (introduced in this patch set) in FUSE_LOOKUP reply. Then
> extra option may be added to fuse daemon, e.g., '-o policy=server' for
> getting DAX attr according to fuse daemon's own policy, and '-o
> policy=flag' for querying persistent FS_XFLAG_DAX attr of the host inode.
> 
> And thus 'dax=server' mount option inside guest could be omitted and
> replaced by 'policy=server' option on the fuse daemon side. In this
> case, the fuse kernel module could be as simple as possible, while all
> other strategies could be implemented on the fuse daemon side.
> 
> 
> > ... Guest will query state of FS_XFLAG_DAX on inode
> > and enable DAX accordingly. (And server is not necessarily playing
> > an active role in determining which files should use DAX).
> 
> It will be less efficient for guest to initiate another FUSE_IOCTL
> request to query if FS_XFLAG_DAX attr is on the inode. I think
> FUSE_ATTR_DAX flag in FUSE_LOOKUP reply shall be adequate for conveying
> DAX related attr.

This is a good point. In case of fuse, for querying FS_XFLAG_DAX, there
will be extra round trip for FUSE_IOCTL message. It will be better if
server could have an option which enables checking FS_XFLAG_DAX and
respond with FUSE_ATTR_DAX flag in FUSE_LOOKUP message.

Ok, so for the case of fuse, "-o dax=inode" can just mean that enablement
of DAX is per inode. But which files should use DAX is determined by
server. And client will not have a say in this.

And server could have dax options (as you mentioned) to determine whether
it should query FS_XFLAG_DAX to determine dax status of file or it
could implement its own policy (say size based) to determine which inodes
should use DAX. We probably will have document this little deviation
in behavior in Documentation/filesystem/dax.rst

I guess this sounds reasonable. There is little deviation from ext4/xfs
but they are local filesystems to exact match might not make lot of
sense in fuse context.
> 
> 
> However it is indeed an issue when it comes to the consistency of
> FS_XFLAG_DAX flag with ext4/xfs. There are following two semantics when
> using per-file DAX for ext4/xfs:
> 
> 1. user sets FS_XFLAG_DAX attribute on inode, for which per-file DAX
> shall be enabled, and the attribute will be stored persistently on the
> inode;

So this one we can still support as long as fuse supports ioctl and
query and setting FS_XFLAG_DAX, right?

> 2. user can query the FS_XFLAG_DAX attribute to see if DAX is enabled
> for specific file, when it's in 'dax=inode' mode.

Documentation/filesystems/dax.rst says.

 1. There exists an in-kernel file access mode flag `S_DAX` that corresponds to
    the statx flag `STATX_ATTR_DAX`.  See the manpage for statx(2) for details
    about this access mode.

So my understanding is that if user wants to know if DAX is currently
being used on a file, they should call statx() and check for 
STATX_ATTR_DAX flag instead? And this will be set based on S_DAX status
of inode.

If this is correct, then it is not an issue. We should be able to
return STATX_ATTR_DAX based on S_DAX and user should be able to
figure out which files are using DAX. Just that it will not necessarily
match FS_XFLAG_DAX because server might be configured to use its own
custom policy to determine DAX status of a file/inode.

> 
> 
> For semantic 1), I'm quite doubted if there's necessity and possibility
> of this using case for fuse so far, given admin could already specify
> which files should be DAX enabled on the host side. If this semantic
> indeed shall be implemented later, then I'm afraid 'policy=server'
> option shall always be specified with fuse daemon, that is, fuse daemon
> shall always query the FS_XFLAG_DAX attr of the inode on the host.
> 
> For semantic 2), could we return a fake FS_XFLAG_DAX once the server
> shows that this file should be DAX enabled?

I guess we don't have to return fake FS_XFLAG_DAX at all. If user space
is calling statx(), we should just set STATX_ATTR_DAX if file/inode is
using DAX.

Thanks
Vivek

> 
> > 
> > In summary, there seem to be two use cases.
> > 
> > A. virtiofsd/fuse-server wants do be able to come up with its own policy
> >    to decide which inodes should use guest.
> > 
> > B. guest client decides which inode use DAX based on FS_XFLAG_DAX attr
> >    on inode (and server does not play a role).
> > 
> > To be able to different between these two cases, I was suggesting using
> > "-o dax=inode" for B and "-o dax=server" for A.
> 
> 
> -- 
> Thanks,
> Jeffle
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-24 12:43           ` Vivek Goyal
@ 2021-09-26  2:06             ` JeffleXu
  0 siblings, 0 replies; 26+ messages in thread
From: JeffleXu @ 2021-09-26  2:06 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Dave Chinner, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu,
	joseph.qi



On 9/24/21 8:43 PM, Vivek Goyal wrote:
> On Fri, Sep 24, 2021 at 11:06:07AM +0800, JeffleXu wrote:
>>
>>
>> On 9/24/21 9:02 AM, Vivek Goyal wrote:
>>> On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
>>>> On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
>>>>> On Thu, Sep 23, 2021 at 05:25:23PM +0800, Jeffle Xu wrote:
>>>>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
>>>>>> operate the same which is equivalent to 'always'. To be consistemt with
>>>>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
>>>>>> option is specified, the default behaviour is equal to 'inode'.
>>>>>
>>>>> So will "-o dax=inode" be used for per file DAX where dax mode comes
>>>>> from server?
>>>>>
>>>>> I think we discussed this. It will be better to leave "-o dax=inode"
>>>>> alone. It should be used when we are reading dax status from file
>>>>> attr (like ext4 and xfs). 
>>>>>
>>>>> And probably create separate option say "-o dax=server" where server
>>>>> specifies which inode should use dax.
>>>>
>>>> That seems like a poor idea to me.
>>>>
>>>> The server side already controls what the client side does by
>>>> controlling the inode attributes that the client side sees.  That
>>>> is, if the server is going to specify whether the client side data
>>>> access is going to use dax, then the server presents the client with
>>>> an inode that has the DAX attribute flag set on it.
>>>
>>> Hi Dave,
>>>
>>> Currently in fuse/virtiofs, DAX is compltely controlled by client. Server
>>> has no say in it. If client is mounted with "-o dax", dax is enabled on
>>> all files otherwise dax is disabled on all files. One could think of
>>> implementing an option on server so that server could deny mmap()
>>> requests that come from client, but so far nobody asked for such
>>> an option on server side.
>>>
>>> When you say "inode that has DAX attribute flag set on it", are you
>>> referring to "S_DAX (in inode->i_flags)" or persistent attr
>>> "FS_XFLAG_DAX"?
>>>
>>> As of now S_DAX on client side inode is set by fuse client whenever
>>> client mounted filesystem with "-o dax". And I think you are assuming
>>> that DAX attribute of inode is already coming from server. That's not
>>> the case. In fact that seems to be the proposal. Provide capability
>>> so that server can specify which inode should be using DAX and which
>>> inode should not be.
>>>
>>>>
>>>> In that case, turning off dax on the guest side should be
>>>> communicated to the fuse server so the server turns off the DAX flag
>>>> on the server side iff server side policy allows it.
>>>
>>> Not sure what do you mean by server turns of DAX flag based on client
>>> turning off DAX. Server does not have to do anything. If client decides
>>> to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
>>> to server and that's it.
>>>
>>>> When the guest
>>>> then purges it's local inode and reloads it from the server then it
>>>> will get an inode with the DAX flag set according to server side
>>>> policy.
>>>
>>> So right now we don't have a mechanism for server to specify DAX flag.
>>> And that's what this patch series is trying to do.
>>>
>>>>
>>>> Hence if the server side is configured with "dax=always" or
>>>> dax="never" semantics it means the client side inode flag state
>>>> cannot control DAX mode. That means, regardless of the client side
>>>> mount options, DAX is operating under always or never policy,
>>>
>>> Hmm..., When you say "server side is configured with "dax=always", 
>>> do you mean shared directory on host is mounted with "-o dax=always",
>>> or you mean some virtiofs server specific option which enables
>>> dax on all inodes from server side.
>>>
>>> In general, DAX on host and DAX inside guest are completely independent.
>>> Host filesystem could very well be mounted with dax or without dax and
>>> that has no affect on guests's capability to be able to enable DAX or
>>> not. 
>>
>> Hi Dave, I think you are referring to "shared directory on host is
>> mounted with "-o dax=always"" when you are saying "server side is
>> configured with "dax=always". And just as Vivek said, there's no
>> necessary relationship between the DAX mode in host and that in guest,
>> technically.
>>
>>
>>>
>>>> enforced by the server side by direct control of the client inode
>>>> DAX attribute flag. If dax=inode is in use on both sides, the the
>>>> server honours the requests of the client to set/clear the inode
>>>> flags and presents the inode flag according to the state the client
>>>> side has requested.
>>>>
>>>> This policy state probably should be communicated to
>>>> the fuse client from the server at mount time so policy conflicts
>>>> can be be resolved at mount time (e.g. reject mount if policy
>>>> conflict occurs, default to guest overrides server or vice versa,
>>>> etc). This then means that that the client side mount policies will
>>>> default to server side policy when they set "dax=inode" but also
>>>> provide a local override for always or never local behaviour.
>>>>
>>>> Hence, AFAICT, there is no need for a 'dax=server' option - this
>>>> seems to be exactly what 'dax=inode' behaviour means on the client
>>>> side - it behaves according to how the server side propagates the
>>>> DAX attribute to the client for each inode.
>>>
>>> Ok. So "-o dax=inode" in fuse will have a different meaning as opposed
>>> to ext4/xfs. This will mean that server will pass DAX state of inode
>>> when inode is instantiated and client should honor that. 
>>>
>>> But what about FS_XFLAG_DAX flag then. Say host file system
>>> does support this att and fuse/virtiofs allows querying and
>>> setting this attribute (I don't think it is allowed now). So
>>> will we not create a future conflict where in case of fuse/virtiofs
>>> "-o dax=inode" means something different and it does look at
>>> FS_XFLAG_DAX file attr.
>>>
>>>>
>>>>> Otherwise it will be very confusing. People familiar with "-o dax=inode"
>>>>> on ext4/xfs will expect file attr to work and that's not what we
>>>>> are implementing, IIUC.
>>>>
>>>> The dax mount option behaviour is already confusing enough without
>>>> adding yet another weird, poorly documented, easily misunderstood
>>>> mode that behaves subtly different to existing modes.
>>>>
>>>> Please try to make the virtiofs behaviour compatible with existing
>>>> modes - it's not that hard to make the client dax=inode behaviour be
>>>> controlled by the server side without any special client side mount
>>>> modes.
>>>
>>> Given I want to keep the option of similar behavior for "dax=inode"
>>> across ext4/xfs and virtiofs, I suggested "dax=server". Because I 
>>> assumed that "dax=inode" means that dax is per inode property AND
>>> this per inode property is specified by persistent file attr 
>>> FS_XFLAG_DAX.
>>>
>>> But fuse/virtiofs will not be specifying dax property of inode using
>>> FS_XFLAG_DAX (atleast as of now). And server will set DAX property
>>> using some bit in protocol. 
>>>
>>> So these seem little different. If we use "dax=inode" for server
>>> specifying DAX property of inode, then in future if client can
>>> query/set FS_XFLAG_DAX on inode, it will be a problem. There will
>>> be a conflict.
>>>
>>> Use case I was imagining was, say on host, user might set FS_XFLAG_DAX
>>> attr on relevant files and then mount virtiofs in guest with 
>>> "-o dax=inode". ...
>>
>> Yes this will be a real using case, where users could specify which
>> files should be DAX enabled by setting FS_XFLAG_DAX attr **on host**. In
>> this case, the FS_XFLAG_DAX attr could still be conveyed by
>> FUSE_ATTR_DAX (introduced in this patch set) in FUSE_LOOKUP reply. Then
>> extra option may be added to fuse daemon, e.g., '-o policy=server' for
>> getting DAX attr according to fuse daemon's own policy, and '-o
>> policy=flag' for querying persistent FS_XFLAG_DAX attr of the host inode.
>>
>> And thus 'dax=server' mount option inside guest could be omitted and
>> replaced by 'policy=server' option on the fuse daemon side. In this
>> case, the fuse kernel module could be as simple as possible, while all
>> other strategies could be implemented on the fuse daemon side.
>>
>>
>>> ... Guest will query state of FS_XFLAG_DAX on inode
>>> and enable DAX accordingly. (And server is not necessarily playing
>>> an active role in determining which files should use DAX).
>>
>> It will be less efficient for guest to initiate another FUSE_IOCTL
>> request to query if FS_XFLAG_DAX attr is on the inode. I think
>> FUSE_ATTR_DAX flag in FUSE_LOOKUP reply shall be adequate for conveying
>> DAX related attr.
> 
> This is a good point. In case of fuse, for querying FS_XFLAG_DAX, there
> will be extra round trip for FUSE_IOCTL message. It will be better if
> server could have an option which enables checking FS_XFLAG_DAX and
> respond with FUSE_ATTR_DAX flag in FUSE_LOOKUP message.
> 
> Ok, so for the case of fuse, "-o dax=inode" can just mean that enablement
> of DAX is per inode. But which files should use DAX is determined by
> server. And client will not have a say in this.
> 
> And server could have dax options (as you mentioned) to determine whether
> it should query FS_XFLAG_DAX to determine dax status of file or it
> could implement its own policy (say size based) to determine which inodes
> should use DAX. We probably will have document this little deviation
> in behavior in Documentation/filesystem/dax.rst

Agreed.

> 
> I guess this sounds reasonable. There is little deviation from ext4/xfs
> but they are local filesystems to exact match might not make lot of
> sense in fuse context.
>>
>>
>> However it is indeed an issue when it comes to the consistency of
>> FS_XFLAG_DAX flag with ext4/xfs. There are following two semantics when
>> using per-file DAX for ext4/xfs:
>>
>> 1. user sets FS_XFLAG_DAX attribute on inode, for which per-file DAX
>> shall be enabled, and the attribute will be stored persistently on the
>> inode;
> 
> So this one we can still support as long as fuse supports ioctl and
> query and setting FS_XFLAG_DAX, right?

Yes, if it is really needed later.

> 
>> 2. user can query the FS_XFLAG_DAX attribute to see if DAX is enabled
>> for specific file, when it's in 'dax=inode' mode.
> 
> Documentation/filesystems/dax.rst says.
> 
>  1. There exists an in-kernel file access mode flag `S_DAX` that corresponds to
>     the statx flag `STATX_ATTR_DAX`.  See the manpage for statx(2) for details
>     about this access mode.
> 
> So my understanding is that if user wants to know if DAX is currently
> being used on a file, they should call statx() and check for 
> STATX_ATTR_DAX flag instead? And this will be set based on S_DAX status
> of inode.
> 
> If this is correct, then it is not an issue. We should be able to
> return STATX_ATTR_DAX based on S_DAX and user should be able to
> figure out which files are using DAX. Just that it will not necessarily
> match FS_XFLAG_DAX because server might be configured to use its own
> custom policy to determine DAX status of a file/inode.

Good catch! You are right. Then semantic 2) can be matched without any
modification.

> 
>>
>>
>> For semantic 1), I'm quite doubted if there's necessity and possibility
>> of this using case for fuse so far, given admin could already specify
>> which files should be DAX enabled on the host side. If this semantic
>> indeed shall be implemented later, then I'm afraid 'policy=server'
>> option shall always be specified with fuse daemon, that is, fuse daemon
>> shall always query the FS_XFLAG_DAX attr of the inode on the host.
>>
>> For semantic 2), could we return a fake FS_XFLAG_DAX once the server
>> shows that this file should be DAX enabled?
> 
> I guess we don't have to return fake FS_XFLAG_DAX at all. If user space
> is calling statx(), we should just set STATX_ATTR_DAX if file/inode is
> using DAX.
> 

Yes.

Thanks! I will send a new version later based on these discussions.



-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-24  1:02       ` Vivek Goyal
  2021-09-24  3:06         ` JeffleXu
@ 2021-09-27  0:21         ` Dave Chinner
  2021-09-27  2:28           ` JeffleXu
  2021-09-27 15:32           ` Vivek Goyal
  1 sibling, 2 replies; 26+ messages in thread
From: Dave Chinner @ 2021-09-27  0:21 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Jeffle Xu, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Thu, Sep 23, 2021 at 09:02:26PM -0400, Vivek Goyal wrote:
> On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
> > On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
> > > On Thu, Sep 23, 2021 at 05:25:23PM +0800, Jeffle Xu wrote:
> > > > We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> > > > operate the same which is equivalent to 'always'. To be consistemt with
> > > > ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> > > > option is specified, the default behaviour is equal to 'inode'.
> > > 
> > > So will "-o dax=inode" be used for per file DAX where dax mode comes
> > > from server?
> > > 
> > > I think we discussed this. It will be better to leave "-o dax=inode"
> > > alone. It should be used when we are reading dax status from file
> > > attr (like ext4 and xfs). 
> > > 
> > > And probably create separate option say "-o dax=server" where server
> > > specifies which inode should use dax.
> > 
> > That seems like a poor idea to me.
> > 
> > The server side already controls what the client side does by
> > controlling the inode attributes that the client side sees.  That
> > is, if the server is going to specify whether the client side data
> > access is going to use dax, then the server presents the client with
> > an inode that has the DAX attribute flag set on it.
> 
> Hi Dave,
> 
> Currently in fuse/virtiofs, DAX is compltely controlled by client. Server
> has no say in it. If client is mounted with "-o dax", dax is enabled on
> all files otherwise dax is disabled on all files. One could think of
> implementing an option on server so that server could deny mmap()
> requests that come from client, but so far nobody asked for such
> an option on server side.

Can you please refer to the new options "always|never|inode" in
disucssions, because the "-o dax" option is deprecated and we really
need to center discussions around the new options, not the
deprecated one.

> When you say "inode that has DAX attribute flag set on it", are you
> referring to "S_DAX (in inode->i_flags)" or persistent attr
> "FS_XFLAG_DAX"?

Neither. The S_DAX flags is the VFS struct inode state that tells
the kernel what to do with that inode. The FS_XFLAG_DAX is the
userspace API status/control flag that represents the current state
of the persistent per-inode DAX control attribute.

What I'm talking about the persistent attribute that is present
on stable storage. e.g. in the XFS on-disk inode the flag is
XFS_DIFLAG2_DAX which is never seen outside low level XFS code. In
the case of FUSE, this is the persistent server-side DAX attribute
state. The client gets this information from the server via an
inode attribute defined in the FUSE protocol. The client translates
that network protocol attribute to S_DAX status on FS inode
instantiation, and to/from FS_XFLAG_DAX for user control under
dax=inode.

In the case that the user changes FS_XFLAG_DAX, the FUSE client
needs to communicate that attribute change to the server, where the
server then changes the persistent state of the on-disk inode so
that the next time the client requests that inode, it gets the state
it previously set. Unless, of course, there are server side policy
overrides (never/always).

IOWs, dax=inode on the client side is essentially "follow server
side policy" because the server maintains the persistent DAX flag
state in the server side inode...

> As of now S_DAX on client side inode is set by fuse client whenever
> client mounted filesystem with "-o dax". And I think you are assuming

Of course. Similarly, if the client uses dax=never, the S_DAX is
never set on the VFS inode. But if dax=inode is set, where does the
DAX attribute state that determines S_DAX comes from .....?

> that DAX attribute of inode is already coming from server. That's not
> the case. In fact that seems to be the proposal. Provide capability
> so that server can specify which inode should be using DAX and which
> inode should not be.

Yup, that's exactly what I'm talking about - this is "dax=inode"
behaviour, because the client follows what the server tells it
in the inode attributes that arrive over the wire.

> > In that case, turning off dax on the guest side should be
> > communicated to the fuse server so the server turns off the DAX flag
> > on the server side iff server side policy allows it.
> 
> Not sure what do you mean by server turns of DAX flag based on client
> turning off DAX. Server does not have to do anything. If client decides
> to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
> to server and that's it.

Where does the client get it's per-inode DAX policy from if
dax=inode is, like other DAX filesystems, the default behaviour?

Where is the persistent storage of that per-inode attribute kept?

> > When the guest
> > then purges it's local inode and reloads it from the server then it
> > will get an inode with the DAX flag set according to server side
> > policy.
> 
> So right now we don't have a mechanism for server to specify DAX flag.
> And that's what this patch series is trying to do.
> 
> > 
> > Hence if the server side is configured with "dax=always" or
> > dax="never" semantics it means the client side inode flag state
> > cannot control DAX mode. That means, regardless of the client side
> > mount options, DAX is operating under always or never policy,
> 
> Hmm..., When you say "server side is configured with "dax=always", 
> do you mean shared directory on host is mounted with "-o dax=always",
> or you mean some virtiofs server specific option which enables
> dax on all inodes from server side.

I don't care. That's a server side implementation detail. You can
keep it in a private xattr for all I care.

> In general, DAX on host and DAX inside guest are completely independent.
> Host filesystem could very well be mounted with dax or without dax and
> that has no affect on guests's capability to be able to enable DAX or
> not. 

If you have both the host and guest accessing the same files with
different modes, then you have a data coherency problem that is
guaranteed to corrupt data.

> > seems to be exactly what 'dax=inode' behaviour means on the client
> > side - it behaves according to how the server side propagates the
> > DAX attribute to the client for each inode.
> 
> Ok. So "-o dax=inode" in fuse will have a different meaning as opposed
> to ext4/xfs. This will mean that server will pass DAX state of inode
> when inode is instantiated and client should honor that. 

No, that's exactly what dax=inode means: DAX behaviour is
per-inode state that users must probe via statx() to determine if
dax is active or not.

> But what about FS_XFLAG_DAX flag then. Say host file system
> does support this att and fuse/virtiofs allows querying and
> setting this attribute (I don't think it is allowed now).

FS_XFLAG_DAX is the ioctl-based control API for the client side.
It's not a persistent flag.

> So will we not create a future conflict where in case of
> fuse/virtiofs "-o dax=inode" means something different and it does
> look at FS_XFLAG_DAX file attr.

Like many, I suspect you might be mis-understanding the DAX API.

There are 4 parts to it:

- persistent filesystem inode state flag
- ioctls to manipulate persistent inode state flag (FS_XFLAG_DAX)
- mount options to override persistent inode state flag
- VFS inode state indicating DAX is active (S_DAX, STATX_ATTR_DAX)

You seem to be conflating the user API FS_XFLAG_DAX flag with
whatever the filesystem uses to physically storage that state. They
are not the same thing. FS_XFLAG_DAX also has no correlation with
the mount options - we can change the on-disk flag state regardless
of the mount option in use. We can even change the on-disk state
flag on *kernels that don't support DAX*.  Changing the on-disk
attribute *does not change S_DAX*.

That's because the on-disk persistent state is a property of the
filesystem's on-disk format, not a property of the kernel that is
running on that machine. The persistent state flag is managed as a
completely independent filesystem inode attribute, but it only
affects behaviour when the dax=inode mode has been selected.

Control of the behaviour by this persistent inode flag is what
dax=inode means. It does not define how that attribute flag is
managed, it just defines the policy that an the inode's behaviour
will be determined by it's dax attribute state.

OTOH, S_DAX is used by the kernel to enable DAX access. It is the
_mechanism_, not the policy. We control S_DAX by mount option and/or
the filesystems persistent inode state flag, and behaviour is
determined by these policies at VFS inode instantiation time. Change
the policy, turf the inode out of cache and re-instantiate it, and
S_DAX for that inode is recalculated from the new policy.

For FUSE, the server provides the persistent storage and nothing
else. The ioctls to manipulate dax state are client side ioctls, as
are the mount options and the S_DAX vfs inode state. Hence for
server side management of the per-inode S_DAX state, the FUSE
protocol needs to be able to pass the per-inode persistent DAX
attribute state to the client and the client needs to honor
that attribute from the server.

How the FUSE server stores this persistent attribute is up to the
server implementation. If the server wants FUSE access to be
independent of host access, it can't use the persistent inode flags
in the host filesystem - it will need to use it's own xattrs. If the
server wants host access to be coherent with the guest, then it will
need to implement that in a different way.

But the FUSE client doesn't care about any of this server side
policy mumbo-jumbo - it just needs to do what the server sends
to it in the DAX inode attribute. And that's exactly what dax=inode
means...

> In summary, there seem to be two use cases.
> 
> A. virtiofsd/fuse-server wants do be able to come up with its own
> policy to decide which inodes should use guest.
> 
> B. guest client decides which inode use DAX based on FS_XFLAG_DAX
> attr on inode (and server does not play a role).
> 
> To be able to different between these two cases, I was suggesting
> using "-o dax=inode" for B and "-o dax=server" for A.

dax=inode covrees both these cases.  All the server side needs to do
is set the inode attribute according to policy, and all the client
needs to do is obey the server side per-inode attribute. IOWs,
client side using "dax=inode" means the server side controls the DAX
behaviour via the FUSE DAX attribute. 

If the server wants the client to always use DAX, then it always
sets the FUSE inode attribute. If the server says "never use DAX",
then it never sets the FUSE inode attribute.  If the server doesn't
want the client to control policy, then it just rejects attempts to
change the per-inode persistent DAX flag via client side
ioctl(FS_XFLAG_DAX) calls. Hence we have use case A completely
covered.

For case B, which is true dax=inode behaviour at the client, then
the server side honours the client side requests to change the
persistent FUSE DAX inode attribute via client side FS_XFLAG_DAX
ioctls.

See? At the client side, there is absolutely no difference between
server side policy control and true dax=inode support. The client is
completely unaware of server side policies and that's how the client
side should be implemented. The applications have to be prepared for
policy to change dynamically with either dax=server or dax=inode, so
there's no difference to applications running in the guest either.

Hence I just don't see the justification for a special DAX mode from
an architectural POV. It's no more work to implement per-inode DAX
properly form the start and we don't end up with a special, subtly
different mode.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-27  0:21         ` Dave Chinner
@ 2021-09-27  2:28           ` JeffleXu
  2021-09-28  3:44             ` Dave Chinner
  2021-09-27 15:32           ` Vivek Goyal
  1 sibling, 1 reply; 26+ messages in thread
From: JeffleXu @ 2021-09-27  2:28 UTC (permalink / raw)
  To: Dave Chinner, Vivek Goyal
  Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi



On 9/27/21 8:21 AM, Dave Chinner wrote:
> On Thu, Sep 23, 2021 at 09:02:26PM -0400, Vivek Goyal wrote:
>> On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
>>> On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
>>>> On Thu, Sep 23, 2021 at 05:25:23PM +0800, Jeffle Xu wrote:
>>>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
>>>>> operate the same which is equivalent to 'always'. To be consistemt with
>>>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
>>>>> option is specified, the default behaviour is equal to 'inode'.
>>>>
>>>> So will "-o dax=inode" be used for per file DAX where dax mode comes
>>>> from server?
>>>>
>>>> I think we discussed this. It will be better to leave "-o dax=inode"
>>>> alone. It should be used when we are reading dax status from file
>>>> attr (like ext4 and xfs). 
>>>>
>>>> And probably create separate option say "-o dax=server" where server
>>>> specifies which inode should use dax.
>>>
>>> That seems like a poor idea to me.
>>>
>>> The server side already controls what the client side does by
>>> controlling the inode attributes that the client side sees.  That
>>> is, if the server is going to specify whether the client side data
>>> access is going to use dax, then the server presents the client with
>>> an inode that has the DAX attribute flag set on it.
>>
>> Hi Dave,
>>
>> Currently in fuse/virtiofs, DAX is compltely controlled by client. Server
>> has no say in it. If client is mounted with "-o dax", dax is enabled on
>> all files otherwise dax is disabled on all files. One could think of
>> implementing an option on server so that server could deny mmap()
>> requests that come from client, but so far nobody asked for such
>> an option on server side.
> 
> Can you please refer to the new options "always|never|inode" in
> disucssions, because the "-o dax" option is deprecated and we really
> need to center discussions around the new options, not the
> deprecated one.

Consistent with ext4/xfs, "-o dax" is equal to "-o dax=always", and DAX
is always enabled/disabled with "-o dax=always|never" regardless the per
inode dax flag.

With "-o dax=inode", the client gets the per inode dax flag from FUSE
protocol (during aops->lookup()) fed by server, deciding if DAX shall be
enabled or not for this specific file.

> 
>> When you say "inode that has DAX attribute flag set on it", are you
>> referring to "S_DAX (in inode->i_flags)" or persistent attr
>> "FS_XFLAG_DAX"?
> 
> Neither. The S_DAX flags is the VFS struct inode state that tells
> the kernel what to do with that inode. The FS_XFLAG_DAX is the
> userspace API status/control flag that represents the current state
> of the persistent per-inode DAX control attribute.
> 
> What I'm talking about the persistent attribute that is present
> on stable storage. e.g. in the XFS on-disk inode the flag is
> XFS_DIFLAG2_DAX which is never seen outside low level XFS code. In
> the case of FUSE, this is the persistent server-side DAX attribute
> state. The client gets this information from the server via an
> inode attribute defined in the FUSE protocol. The client translates
> that network protocol attribute to S_DAX status on FS inode
> instantiation, and to/from FS_XFLAG_DAX for user control under
> dax=inode.

Got it.

> 
> In the case that the user changes FS_XFLAG_DAX, the FUSE client
> needs to communicate that attribute change to the server, where the
> server then changes the persistent state of the on-disk inode so
> that the next time the client requests that inode, it gets the state
> it previously set. Unless, of course, there are server side policy
> overrides (never/always).

One thing I'm concerned with is that, is the following behavior
consistent with the semantics of per-file DAX in ext4/xfs?

Client changes FS_XFLAG_DAX, this change is communicated to server and
then server also returns **success**. Then client finds that this file
is not DAX enabled, since server doesn't honor the previously set state.

IOWs, shall server always honor the persistent per-inode attribute of
host file (if the change of FS_XFLAG_DAX inside guest is stored as
persistent per-inode attribute on host file)?

> 
> IOWs, dax=inode on the client side is essentially "follow server
> side policy" because the server maintains the persistent DAX flag
> state in the server side inode...
> 
>> As of now S_DAX on client side inode is set by fuse client whenever
>> client mounted filesystem with "-o dax". And I think you are assuming
> 
> Of course. Similarly, if the client uses dax=never, the S_DAX is
> never set on the VFS inode. But if dax=inode is set, where does the
> DAX attribute state that determines S_DAX comes from .....?
> 
>> that DAX attribute of inode is already coming from server. That's not
>> the case. In fact that seems to be the proposal. Provide capability
>> so that server can specify which inode should be using DAX and which
>> inode should not be.
> 
> Yup, that's exactly what I'm talking about - this is "dax=inode"
> behaviour, because the client follows what the server tells it
> in the inode attributes that arrive over the wire.
> 
>>> In that case, turning off dax on the guest side should be
>>> communicated to the fuse server so the server turns off the DAX flag
>>> on the server side iff server side policy allows it.
>>
>> Not sure what do you mean by server turns of DAX flag based on client
>> turning off DAX. Server does not have to do anything. If client decides
>> to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
>> to server and that's it.
> 
> Where does the client get it's per-inode DAX policy from if
> dax=inode is, like other DAX filesystems, the default behaviour?
> 
> Where is the persistent storage of that per-inode attribute kept?

In the latest patch set, it is not supported yet to change FS_XFLAG_DAX
(and thus setting/clearing persistent per-inode attribute) inside guest,
since this scenario is not urgently needed as the real using case.
Currently the per-inode dax attribute is completely fed from server
thourgh FUSE protocol, e.g. server could set/clear the per-inode dax
attribute depending on the file size.

The previous path set indeed had ever supported changing FS_XFLAG_DAX
and accordingly setting/clearing persistent per-inode attribute inside
guest. For "passthrough" type virtiofsd, the persistent per-inode
attribute is stored as XFS_DIFLAG2_DAX/EXT4_DAX_FL on host file
directly, since this is what "passthrough" means.

> 
>>> When the guest
>>> then purges it's local inode and reloads it from the server then it
>>> will get an inode with the DAX flag set according to server side
>>> policy.
>>
>> So right now we don't have a mechanism for server to specify DAX flag.
>> And that's what this patch series is trying to do.
>>
>>>
>>> Hence if the server side is configured with "dax=always" or
>>> dax="never" semantics it means the client side inode flag state
>>> cannot control DAX mode. That means, regardless of the client side
>>> mount options, DAX is operating under always or never policy,
>>
>> Hmm..., When you say "server side is configured with "dax=always", 
>> do you mean shared directory on host is mounted with "-o dax=always",
>> or you mean some virtiofs server specific option which enables
>> dax on all inodes from server side.
> 
> I don't care. That's a server side implementation detail. You can
> keep it in a private xattr for all I care.
> 
>> In general, DAX on host and DAX inside guest are completely independent.
>> Host filesystem could very well be mounted with dax or without dax and
>> that has no affect on guests's capability to be able to enable DAX or
>> not. 
> 
> If you have both the host and guest accessing the same files with
> different modes, then you have a data coherency problem that is
> guaranteed to corrupt data.
> 
>>> seems to be exactly what 'dax=inode' behaviour means on the client
>>> side - it behaves according to how the server side propagates the
>>> DAX attribute to the client for each inode.
>>
>> Ok. So "-o dax=inode" in fuse will have a different meaning as opposed
>> to ext4/xfs. This will mean that server will pass DAX state of inode
>> when inode is instantiated and client should honor that. 
> 
> No, that's exactly what dax=inode means: DAX behaviour is
> per-inode state that users must probe via statx() to determine if
> dax is active or not.
> 
>> But what about FS_XFLAG_DAX flag then. Say host file system
>> does support this att and fuse/virtiofs allows querying and
>> setting this attribute (I don't think it is allowed now).
> 
> FS_XFLAG_DAX is the ioctl-based control API for the client side.
> It's not a persistent flag.
> 
>> So will we not create a future conflict where in case of
>> fuse/virtiofs "-o dax=inode" means something different and it does
>> look at FS_XFLAG_DAX file attr.
> 
> Like many, I suspect you might be mis-understanding the DAX API.
> 
> There are 4 parts to it:
> 
> - persistent filesystem inode state flag
> - ioctls to manipulate persistent inode state flag (FS_XFLAG_DAX)
> - mount options to override persistent inode state flag
> - VFS inode state indicating DAX is active (S_DAX, STATX_ATTR_DAX)
> 
> You seem to be conflating the user API FS_XFLAG_DAX flag with
> whatever the filesystem uses to physically storage that state. They
> are not the same thing. FS_XFLAG_DAX also has no correlation with
> the mount options - we can change the on-disk flag state regardless
> of the mount option in use. We can even change the on-disk state
> flag on *kernels that don't support DAX*.  Changing the on-disk
> attribute *does not change S_DAX*.
> 
> That's because the on-disk persistent state is a property of the
> filesystem's on-disk format, not a property of the kernel that is
> running on that machine. The persistent state flag is managed as a
> completely independent filesystem inode attribute, but it only
> affects behaviour when the dax=inode mode has been selected.
> 
> Control of the behaviour by this persistent inode flag is what
> dax=inode means. It does not define how that attribute flag is
> managed, it just defines the policy that an the inode's behaviour
> will be determined by it's dax attribute state.
> 
> OTOH, S_DAX is used by the kernel to enable DAX access. It is the
> _mechanism_, not the policy. We control S_DAX by mount option and/or
> the filesystems persistent inode state flag, and behaviour is
> determined by these policies at VFS inode instantiation time. Change
> the policy, turf the inode out of cache and re-instantiate it, and
> S_DAX for that inode is recalculated from the new policy.
> 
> For FUSE, the server provides the persistent storage and nothing
> else. The ioctls to manipulate dax state are client side ioctls, as
> are the mount options and the S_DAX vfs inode state. Hence for
> server side management of the per-inode S_DAX state, the FUSE
> protocol needs to be able to pass the per-inode persistent DAX
> attribute state to the client and the client needs to honor
> that attribute from the server.
> 
> How the FUSE server stores this persistent attribute is up to the
> server implementation. If the server wants FUSE access to be
> independent of host access, it can't use the persistent inode flags
> in the host filesystem - it will need to use it's own xattrs. If the
> server wants host access to be coherent with the guest, then it will
> need to implement that in a different way.
> 
> But the FUSE client doesn't care about any of this server side
> policy mumbo-jumbo - it just needs to do what the server sends
> to it in the DAX inode attribute. And that's exactly what dax=inode
> means...
> 
>> In summary, there seem to be two use cases.
>>
>> A. virtiofsd/fuse-server wants do be able to come up with its own
>> policy to decide which inodes should use guest.
>>
>> B. guest client decides which inode use DAX based on FS_XFLAG_DAX
>> attr on inode (and server does not play a role).
>>
>> To be able to different between these two cases, I was suggesting
>> using "-o dax=inode" for B and "-o dax=server" for A.
> 
> dax=inode covrees both these cases.  All the server side needs to do
> is set the inode attribute according to policy, and all the client
> needs to do is obey the server side per-inode attribute. IOWs,
> client side using "dax=inode" means the server side controls the DAX
> behaviour via the FUSE DAX attribute. 
> 
> If the server wants the client to always use DAX, then it always
> sets the FUSE inode attribute. If the server says "never use DAX",
> then it never sets the FUSE inode attribute.  If the server doesn't
> want the client to control policy, then it just rejects attempts to
> change the per-inode persistent DAX flag via client side
> ioctl(FS_XFLAG_DAX) calls. Hence we have use case A completely
> covered.
> 
> For case B, which is true dax=inode behaviour at the client, then
> the server side honours the client side requests to change the
> persistent FUSE DAX inode attribute via client side FS_XFLAG_DAX
> ioctls.
> 
> See? At the client side, there is absolutely no difference between
> server side policy control and true dax=inode support. The client is
> completely unaware of server side policies and that's how the client
> side should be implemented. The applications have to be prepared for
> policy to change dynamically with either dax=server or dax=inode, so
> there's no difference to applications running in the guest either.
> 
> Hence I just don't see the justification for a special DAX mode from
> an architectural POV. It's no more work to implement per-inode DAX
> properly form the start and we don't end up with a special, subtly
> different mode.
> 
> Cheers,
> 
> Dave.
> 

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-27  0:21         ` Dave Chinner
  2021-09-27  2:28           ` JeffleXu
@ 2021-09-27 15:32           ` Vivek Goyal
  2021-09-28  3:23             ` Dave Chinner
  1 sibling, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2021-09-27 15:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jeffle Xu, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Mon, Sep 27, 2021 at 10:21:48AM +1000, Dave Chinner wrote:
> On Thu, Sep 23, 2021 at 09:02:26PM -0400, Vivek Goyal wrote:
> > On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
> > > On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
> > > > On Thu, Sep 23, 2021 at 05:25:23PM +0800, Jeffle Xu wrote:
> > > > > We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> > > > > operate the same which is equivalent to 'always'. To be consistemt with
> > > > > ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> > > > > option is specified, the default behaviour is equal to 'inode'.
> > > > 
> > > > So will "-o dax=inode" be used for per file DAX where dax mode comes
> > > > from server?
> > > > 
> > > > I think we discussed this. It will be better to leave "-o dax=inode"
> > > > alone. It should be used when we are reading dax status from file
> > > > attr (like ext4 and xfs). 
> > > > 
> > > > And probably create separate option say "-o dax=server" where server
> > > > specifies which inode should use dax.
> > > 
> > > That seems like a poor idea to me.
> > > 
> > > The server side already controls what the client side does by
> > > controlling the inode attributes that the client side sees.  That
> > > is, if the server is going to specify whether the client side data
> > > access is going to use dax, then the server presents the client with
> > > an inode that has the DAX attribute flag set on it.
> > 
> > Hi Dave,
> > 
> > Currently in fuse/virtiofs, DAX is compltely controlled by client. Server
> > has no say in it. If client is mounted with "-o dax", dax is enabled on
> > all files otherwise dax is disabled on all files. One could think of
> > implementing an option on server so that server could deny mmap()
> > requests that come from client, but so far nobody asked for such
> > an option on server side.
> 
> Can you please refer to the new options "always|never|inode" in
> disucssions, because the "-o dax" option is deprecated and we really
> need to center discussions around the new options, not the
> deprecated one.
> 
> > When you say "inode that has DAX attribute flag set on it", are you
> > referring to "S_DAX (in inode->i_flags)" or persistent attr
> > "FS_XFLAG_DAX"?
> 
> Neither. The S_DAX flags is the VFS struct inode state that tells
> the kernel what to do with that inode. The FS_XFLAG_DAX is the
> userspace API status/control flag that represents the current state
> of the persistent per-inode DAX control attribute.
> 
> What I'm talking about the persistent attribute that is present
> on stable storage. e.g. in the XFS on-disk inode the flag is
> XFS_DIFLAG2_DAX which is never seen outside low level XFS code. In
> the case of FUSE, this is the persistent server-side DAX attribute
> state. The client gets this information from the server via an
> inode attribute defined in the FUSE protocol. The client translates
> that network protocol attribute to S_DAX status on FS inode
> instantiation, and to/from FS_XFLAG_DAX for user control under
> dax=inode.
> 
> In the case that the user changes FS_XFLAG_DAX, the FUSE client
> needs to communicate that attribute change to the server, where the
> server then changes the persistent state of the on-disk inode so
> that the next time the client requests that inode, it gets the state
> it previously set. Unless, of course, there are server side policy
> overrides (never/always).
> 
> IOWs, dax=inode on the client side is essentially "follow server
> side policy" because the server maintains the persistent DAX flag
> state in the server side inode...
> 
> > As of now S_DAX on client side inode is set by fuse client whenever
> > client mounted filesystem with "-o dax". And I think you are assuming
> 
> Of course. Similarly, if the client uses dax=never, the S_DAX is
> never set on the VFS inode. But if dax=inode is set, where does the
> DAX attribute state that determines S_DAX comes from .....?
> 
> > that DAX attribute of inode is already coming from server. That's not
> > the case. In fact that seems to be the proposal. Provide capability
> > so that server can specify which inode should be using DAX and which
> > inode should not be.
> 
> Yup, that's exactly what I'm talking about - this is "dax=inode"
> behaviour, because the client follows what the server tells it
> in the inode attributes that arrive over the wire.
> 
> > > In that case, turning off dax on the guest side should be
> > > communicated to the fuse server so the server turns off the DAX flag
> > > on the server side iff server side policy allows it.
> > 
> > Not sure what do you mean by server turns of DAX flag based on client
> > turning off DAX. Server does not have to do anything. If client decides
> > to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
> > to server and that's it.
> 
> Where does the client get it's per-inode DAX policy from if
> dax=inode is, like other DAX filesystems, the default behaviour?
> 
> Where is the persistent storage of that per-inode attribute kept?
> 
> > > When the guest
> > > then purges it's local inode and reloads it from the server then it
> > > will get an inode with the DAX flag set according to server side
> > > policy.
> > 
> > So right now we don't have a mechanism for server to specify DAX flag.
> > And that's what this patch series is trying to do.
> > 
> > > 
> > > Hence if the server side is configured with "dax=always" or
> > > dax="never" semantics it means the client side inode flag state
> > > cannot control DAX mode. That means, regardless of the client side
> > > mount options, DAX is operating under always or never policy,
> > 
> > Hmm..., When you say "server side is configured with "dax=always", 
> > do you mean shared directory on host is mounted with "-o dax=always",
> > or you mean some virtiofs server specific option which enables
> > dax on all inodes from server side.
> 
> I don't care. That's a server side implementation detail. You can
> keep it in a private xattr for all I care.
> 
> > In general, DAX on host and DAX inside guest are completely independent.
> > Host filesystem could very well be mounted with dax or without dax and
> > that has no affect on guests's capability to be able to enable DAX or
> > not. 
> 
> If you have both the host and guest accessing the same files with
> different modes, then you have a data coherency problem that is
> guaranteed to corrupt data.
> 
> > > seems to be exactly what 'dax=inode' behaviour means on the client
> > > side - it behaves according to how the server side propagates the
> > > DAX attribute to the client for each inode.
> > 
> > Ok. So "-o dax=inode" in fuse will have a different meaning as opposed
> > to ext4/xfs. This will mean that server will pass DAX state of inode
> > when inode is instantiated and client should honor that. 
> 
> No, that's exactly what dax=inode means: DAX behaviour is
> per-inode state that users must probe via statx() to determine if
> dax is active or not.
> 
> > But what about FS_XFLAG_DAX flag then. Say host file system
> > does support this att and fuse/virtiofs allows querying and
> > setting this attribute (I don't think it is allowed now).
> 
> FS_XFLAG_DAX is the ioctl-based control API for the client side.
> It's not a persistent flag.
> 
> > So will we not create a future conflict where in case of
> > fuse/virtiofs "-o dax=inode" means something different and it does
> > look at FS_XFLAG_DAX file attr.
> 
> Like many, I suspect you might be mis-understanding the DAX API.
> 
> There are 4 parts to it:
> 
> - persistent filesystem inode state flag
> - ioctls to manipulate persistent inode state flag (FS_XFLAG_DAX)
> - mount options to override persistent inode state flag
> - VFS inode state indicating DAX is active (S_DAX, STATX_ATTR_DAX)
> 
> You seem to be conflating the user API FS_XFLAG_DAX flag with
> whatever the filesystem uses to physically storage that state. They
> are not the same thing. FS_XFLAG_DAX also has no correlation with
> the mount options - we can change the on-disk flag state regardless
> of the mount option in use. We can even change the on-disk state
> flag on *kernels that don't support DAX*.  Changing the on-disk
> attribute *does not change S_DAX*.
> 
> That's because the on-disk persistent state is a property of the
> filesystem's on-disk format, not a property of the kernel that is
> running on that machine. The persistent state flag is managed as a
> completely independent filesystem inode attribute, but it only
> affects behaviour when the dax=inode mode has been selected.
> 
> Control of the behaviour by this persistent inode flag is what
> dax=inode means. It does not define how that attribute flag is
> managed, it just defines the policy that an the inode's behaviour
> will be determined by it's dax attribute state.

So here is my confusion. Say client mounts filesystem with "dax=inode"
and file "foo.txt" has FS_XFLAG_DAX set, do I have 

> 
> OTOH, S_DAX is used by the kernel to enable DAX access. It is the
> _mechanism_, not the policy. We control S_DAX by mount option and/or
> the filesystems persistent inode state flag, and behaviour is
> determined by these policies at VFS inode instantiation time. Change
> the policy, turf the inode out of cache and re-instantiate it, and
> S_DAX for that inode is recalculated from the new policy.
> 
> For FUSE, the server provides the persistent storage and nothing
> else. The ioctls to manipulate dax state are client side ioctls, as
> are the mount options and the S_DAX vfs inode state. Hence for
> server side management of the per-inode S_DAX state, the FUSE
> protocol needs to be able to pass the per-inode persistent DAX
> attribute state to the client and the client needs to honor
> that attribute from the server.
> 
> How the FUSE server stores this persistent attribute is up to the
> server implementation. If the server wants FUSE access to be
> independent of host access, it can't use the persistent inode flags
> in the host filesystem - it will need to use it's own xattrs. If the
> server wants host access to be coherent with the guest, then it will
> need to implement that in a different way.
> 
> But the FUSE client doesn't care about any of this server side
> policy mumbo-jumbo - it just needs to do what the server sends
> to it in the DAX inode attribute. And that's exactly what dax=inode
> means...
> 
> > In summary, there seem to be two use cases.
> > 
> > A. virtiofsd/fuse-server wants do be able to come up with its own
> > policy to decide which inodes should use guest.
> > 
> > B. guest client decides which inode use DAX based on FS_XFLAG_DAX
> > attr on inode (and server does not play a role).
> > 
> > To be able to different between these two cases, I was suggesting
> > using "-o dax=inode" for B and "-o dax=server" for A.
> 
> dax=inode covrees both these cases.  All the server side needs to do
> is set the inode attribute according to policy, and all the client
> needs to do is obey the server side per-inode attribute. IOWs,
> client side using "dax=inode" means the server side controls the DAX
> behaviour via the FUSE DAX attribute. 

Hi Dave,

Can a filesystem mounted with "-o dax=inode" enable DAX on a file even
if FS_XFLAG_DAX attr is not set on the file. I think that's something new
which will happen in case of fuse  and currently does not happen with
ext4/xfs.

That's the use case A which wants to enable DAX attribute of a file
based on its own policy (irrespective of state of FS_XFLAG_DAX on
file).

IIUC, you are ok with this. Just want to be sure because this
is a subtle change from ext4/xfs behavior which will only enable
DAX on a file if FS_XFLAG_DAX is set (with dax=inode). IOW, on
ext4/xfs if FS_XFLAG_DAX is not set on file then DAX will not be
enabled (dax=inode).

I don't want applications to be making assumption that if FS_XFFLAG_DAX
is not set on a file, then DAX will not be enabled on that file because
that's what exactly fuse/virtiofs can do.

> 
> If the server wants the client to always use DAX, then it always
> sets the FUSE inode attribute. If the server says "never use DAX",
> then it never sets the FUSE inode attribute.  If the server doesn't
> want the client to control policy, then it just rejects attempts to
> change the per-inode persistent DAX flag via client side
> ioctl(FS_XFLAG_DAX) calls. Hence we have use case A completely
> covered.

Ok, I think the behavior of ioctl(FS_XFLAG_DAX) calls is confusing
me most in this context. IIUC, you are suggesting that server will
define the policy whether it is supporting ioctl(FS_XFLAG_DAX)
or not. This is will part of feature negotiation of fuse protocol.

If server decides to support ioctl(FS_XFLAG_DAX), then it should
work similar to ext4/xfs and also follow inheritance rules.

If server decides to not support (or not enable) ioctl(FS_XFLAG_DAX),
then server need to reject any attempt from client to set
FS_XFLAG_DAX. And if client queries the current state of 
FS_XFLAG_DAX, then we need to return it is not set (even if it
set on underlying filesystem). That way client will think FS_XFLAG_DAX
is not set and will not expect DAX to be necessarily enabled.

> 
> For case B, which is true dax=inode behaviour at the client, then
> the server side honours the client side requests to change the
> persistent FUSE DAX inode attribute via client side FS_XFLAG_DAX
> ioctls.
> 
> See? At the client side, there is absolutely no difference between
> server side policy control and true dax=inode support. The client is
> completely unaware of server side policies and that's how the client
> side should be implemented. The applications have to be prepared for
> policy to change dynamically with either dax=server or dax=inode, so
> there's no difference to applications running in the guest either.
> 
> Hence I just don't see the justification for a special DAX mode from
> an architectural POV. It's no more work to implement per-inode DAX
> properly form the start and we don't end up with a special, subtly
> different mode.

Ok, So I guess initially we could just implement "-o dax=inode" and
*not support FS_XFLAG_DAX" API at all. Fuse server will decide which
inodes should enable DAX and communicate that to client in a message
response as part of protocol. Any attempt by client to set attr
FS_XFLAG_DAX should fail (Say -ENOTSUP). And any query of FS_XFLAG_DAX
attr should return attr is not set.

One could support "FS_XFLAG_DAX" API down the line. And in that case
fuser server should allow setting and querying FS_XFLAG_DAX and follow
inheritance rules. Also dax flag of inode should be set according to
state of FS_XFLAG_DAX attr (and not some internal decision of server).
IOW, server will have to follow state of FS_XFLAG_DAX attr in deciding
DAX state of inode if it decided to support FS_XFLAG_DAX attr API.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-27 15:32           ` Vivek Goyal
@ 2021-09-28  3:23             ` Dave Chinner
  0 siblings, 0 replies; 26+ messages in thread
From: Dave Chinner @ 2021-09-28  3:23 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Jeffle Xu, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Mon, Sep 27, 2021 at 11:32:01AM -0400, Vivek Goyal wrote:
> On Mon, Sep 27, 2021 at 10:21:48AM +1000, Dave Chinner wrote:
> > On Thu, Sep 23, 2021 at 09:02:26PM -0400, Vivek Goyal wrote:
> > > In summary, there seem to be two use cases.
> > > 
> > > A. virtiofsd/fuse-server wants do be able to come up with its own
> > > policy to decide which inodes should use guest.
> > > 
> > > B. guest client decides which inode use DAX based on FS_XFLAG_DAX
> > > attr on inode (and server does not play a role).
> > > 
> > > To be able to different between these two cases, I was suggesting
> > > using "-o dax=inode" for B and "-o dax=server" for A.
> > 
> > dax=inode covrees both these cases.  All the server side needs to do
> > is set the inode attribute according to policy, and all the client
> > needs to do is obey the server side per-inode attribute. IOWs,
> > client side using "dax=inode" means the server side controls the DAX
> > behaviour via the FUSE DAX attribute. 
> 
> Hi Dave,
> 
> Can a filesystem mounted with "-o dax=inode" enable DAX on a file even
> if FS_XFLAG_DAX attr is not set on the file. I think that's something new
> which will happen in case of fuse  and currently does not happen with
> ext4/xfs.

It can happen - the on-disk flag is defined as advisory in
Documentation/filesystems/dax.rst:

2. There exists a persistent flag `FS_XFLAG_DAX` that can be applied
   to regular files and directories. This advisory flag can be set or
   cleared at any time, but doing so does not immediately affect
   the `S_DAX` state.

So, we can have things like the inode get initialised, then the 
on disk flag is cleared. Now you have an inode that is accessed by
DAX in memory, and it will continue to be accessed using DAX until
to it removed from memory. IOWs, you can then check the on disk flag
and see that it is clear, yet check statx(STATX_ATTR_DAX) and see
that DAX is enabled.

Another counter example is that an inode has been reflink copied
and then the on-disk flag is set on the inode. Some time later
it is instantiated in memory, the filesystem sees that it has DAX
and shared extents and turns off DAX because it's not supported with
shared extents right now. So you can check the on-disk flag and
see that DAX -should- be enabled, but when you check
statx(STATX_ATTR_DAX) you will always set that DAX is disabled.

Another example: on disk bit is set, file contains inline data, or
compressed data, or encrypted data and so cannot provide direct
access to the storage. In which case, the filesysetm will ignore the
on disk bit and never use DAX on that inode.

IOWs, dax=inode means that the inode contains the *desired policy
for that inode*. It is a hint, not an enforced policy. The
filesystem and or kernel infrastructure can decide to ignore the
hint, and that is one of the reasons why checking
statx(STATX_ATTR_DAX) is required if the application needs DAX to
operate correctly.

> That's the use case A which wants to enable DAX attribute of a file
> based on its own policy (irrespective of state of FS_XFLAG_DAX on
> file).

Perfectly fine - dax=inode simply says "DAX policy is defined on
per-inode granularity". It does not guarantee any specific behaviour
will be followed because the per-inode policy flag is advisory.

> IIUC, you are ok with this. Just want to be sure because this
> is a subtle change from ext4/xfs behavior which will only enable
> DAX on a file if FS_XFLAG_DAX is set (with dax=inode). IOW, on
> ext4/xfs if FS_XFLAG_DAX is not set on file then DAX will not be
> enabled (dax=inode).

Again, the flag is -advisory- so there is nothing stopping us from
having other policies that combine with or override the on-disk
flag.

Fundamentally, the options are:

- "always" meaning *everything* *always* uses DAX.
- "never" meaning *nothing* *ever* uses DAX.
- "inode" meaning "derive behaviour from the inode advisory flag".

"dax=inode" does not rule out behaviour being derived from other
policy mechanisms - it just means that in the absence of any other
policy, decide behaviour dynamically on a per-inode granularity.  We
defined an API to set/clear that advisory flag on a per-inode basis,
and how it gets used is only limited by your imagination.

For example, the "inherit DAX from parent directory" policy is based
on setting the DAX flag set on the directory. We cannot use DAX to
access directory data directly, so this flag on a directory becomes
a *policy propagation mechanism* rather than an enabling
hint.

Hence the user might not even be aware that the admin of the system
(or even a package installer) set up DAX propagation policies. Hence
all their data directories are defaulting to using DAX even though
they never set a DAX flag or dax=always mount option themselves.

IOWs, having the FUSE server dynamically and/or silently propagate,
control or influence client side DAX behaviour under client side
dax=inode behaviour falls well within the "admin controlled policy
propoagation" guise that dax=inode is supposed to allow admins to
implement.

> I don't want applications to be making assumption that if FS_XFFLAG_DAX
> is not set on a file, then DAX will not be enabled on that file because
> that's what exactly fuse/virtiofs can do.

Applications *must* check statx(STATX_ATTR_DAX) after they have
opened the file to determine if DAX is in use. FS_XFLAG_DAX is an
advisory hint, not an indication of current behaviour, and it's
irrelevant if dax=never or dax=always is in use. In those cases,
statx() is pretty much the only way to sanely check the DAX state of
an open file.


> > If the server wants the client to always use DAX, then it always
> > sets the FUSE inode attribute. If the server says "never use DAX",
> > then it never sets the FUSE inode attribute.  If the server doesn't
> > want the client to control policy, then it just rejects attempts to
> > change the per-inode persistent DAX flag via client side
> > ioctl(FS_XFLAG_DAX) calls. Hence we have use case A completely
> > covered.
> 
> Ok, I think the behavior of ioctl(FS_XFLAG_DAX) calls is confusing
> me most in this context. IIUC, you are suggesting that server will
> define the policy whether it is supporting ioctl(FS_XFLAG_DAX)
> or not. This is will part of feature negotiation of fuse protocol.
>
> If server decides to support ioctl(FS_XFLAG_DAX), then it should
> work similar to ext4/xfs and also follow inheritance rules.
> 
> If server decides to not support (or not enable) ioctl(FS_XFLAG_DAX),
> then server need to reject any attempt from client to set
> FS_XFLAG_DAX. And if client queries the current state of 
> FS_XFLAG_DAX, then we need to return it is not set (even if it
> set on underlying filesystem). That way client will think FS_XFLAG_DAX
> is not set and will not expect DAX to be necessarily enabled.

No, I'm not suggesting that's what you do. How the server controls
the DAX policy is largely irrelevant to the discussion.

Start by looking at how the client implements FS_XFLAG_DAX.  It's
already supported, because the fuse client already has the
FSGETXATTR/FSSETXATTR protocol mechanisms for doing this
(fuse_fileattr_get(), fuse_fileattr_set()). Hence the client is
already passing FS_XFLAG_DAX to the server for persistent storage,
and it should already be getting it back from the server.

The basis of XFS/ext4 dax=inode behaviour is already implemented
in the fuse client. All you need to do is hook up
fuse_dax_inode_init() to look at that flag to determine behaviour
instead of always applying the connection DAX state as the
determination. i.e.:

	if (!fc->dax)
		return;
	if ("dax=never")
		return;
	if ("dax=inode" && !fuse_inode_has_dax(inode))
		return;

	inode->i_flags |= S_DAX;
	inode->i_data.a_ops = &fuse_dax_file_aops;

And there's not much more to implement on the client side for
dax=inode behaviour.

So, if you want server side policy control, the FUSE protocol needs
to ensure that fuse_inode_has_dax(inode) follows whatever the server
side requires the policy to be. This may be combined with the
user FS_XFLAG_DAX attribute, or it may be a separate inode flag
that overrides FS_XFLAG_DAX. But that's FUSE implementation detail,
and not something I'm concerned about.

The fact is that server side override doesn't change the client side
logic or behaviour. The FS_XFLAG_DAX behaviour follows what the user
sets, dax=inode follows that in the absence of other policy
overrides that the user may or may not know about but are applied on
a per-inode granularity...

> > For case B, which is true dax=inode behaviour at the client, then
> > the server side honours the client side requests to change the
> > persistent FUSE DAX inode attribute via client side FS_XFLAG_DAX
> > ioctls.
> > 
> > See? At the client side, there is absolutely no difference between
> > server side policy control and true dax=inode support. The client is
> > completely unaware of server side policies and that's how the client
> > side should be implemented. The applications have to be prepared for
> > policy to change dynamically with either dax=server or dax=inode, so
> > there's no difference to applications running in the guest either.
> > 
> > Hence I just don't see the justification for a special DAX mode from
> > an architectural POV. It's no more work to implement per-inode DAX
> > properly form the start and we don't end up with a special, subtly
> > different mode.
> 
> Ok, So I guess initially we could just implement "-o dax=inode" and
> *not support FS_XFLAG_DAX" API at all.

No, please don't do that, and please stop wasting my time by making
me have to repeatedly explain to you why doing stuff like this is
harmful to our user base. Implement compatible, common dax=inode
behaviour *first*, then layer your server side policy variants on
top of that.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-27  2:28           ` JeffleXu
@ 2021-09-28  3:44             ` Dave Chinner
  2021-09-28  5:17               ` JeffleXu
  0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2021-09-28  3:44 UTC (permalink / raw)
  To: JeffleXu
  Cc: Vivek Goyal, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu,
	joseph.qi

On Mon, Sep 27, 2021 at 10:28:34AM +0800, JeffleXu wrote:
> On 9/27/21 8:21 AM, Dave Chinner wrote:
> > On Thu, Sep 23, 2021 at 09:02:26PM -0400, Vivek Goyal wrote:
> >> On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
> >>> On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
> > In the case that the user changes FS_XFLAG_DAX, the FUSE client
> > needs to communicate that attribute change to the server, where the
> > server then changes the persistent state of the on-disk inode so
> > that the next time the client requests that inode, it gets the state
> > it previously set. Unless, of course, there are server side policy
> > overrides (never/always).
> 
> One thing I'm concerned with is that, is the following behavior
> consistent with the semantics of per-file DAX in ext4/xfs?
> 
> Client changes FS_XFLAG_DAX, this change is communicated to server and
> then server also returns **success**. Then client finds that this file
> is not DAX enabled, since server doesn't honor the previously set state.

FS_XFLAG_DAX is advisory in nature - it does not have to be honored
at inode instantiation.

> IOWs, shall server always honor the persistent per-inode attribute of
> host file (if the change of FS_XFLAG_DAX inside guest is stored as
> persistent per-inode attribute on host file)?

If the user set the flag, then queries it, the server should be
returning the state that the user set, regardless of whether it is
being honored at inode instantiation time.

Remember, FS_XFLAG_DAX does not imply S_DAX and vice versa.

> >> Not sure what do you mean by server turns of DAX flag based on client
> >> turning off DAX. Server does not have to do anything. If client decides
> >> to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
> >> to server and that's it.
> > 
> > Where does the client get it's per-inode DAX policy from if
> > dax=inode is, like other DAX filesystems, the default behaviour?
> > 
> > Where is the persistent storage of that per-inode attribute kept?
> 
> In the latest patch set, it is not supported yet to change FS_XFLAG_DAX
> (and thus setting/clearing persistent per-inode attribute) inside guest,
> since this scenario is not urgently needed as the real using case.

AFAICT the FS_IOC_FS{GS}ETXATTR ioctl is already supported by the
fuse client and it sends the ioctl to the server. Hence the client
should already support persistent FS_XFLAG_DAX manipulations
regardless of where/how the attribute is stored by the server. Did
you actually add code to the client in this patchset to stop
FS_XFLAG_DAX from being propagated to the server?

> Currently the per-inode dax attribute is completely fed from server
> thourgh FUSE protocol, e.g. server could set/clear the per-inode dax
> attribute depending on the file size.

Yup, that's a policy dax=inode on the client side would allow.
Indeed, this same policy could also be implemented as a client side
policy, allowing user control instead of admin control of such
conditional DAX behaviour... :)

> The previous path set indeed had ever supported changing FS_XFLAG_DAX
> and accordingly setting/clearing persistent per-inode attribute inside
> guest. For "passthrough" type virtiofsd, the persistent per-inode
> attribute is stored as XFS_DIFLAG2_DAX/EXT4_DAX_FL on host file
> directly, since this is what "passthrough" means.

Right, but that's server side storage implementation details, not a
protocol or client side detail. What I can't find in the current
client is where this per-inode flag is actually used in the FUSE dax
inode init path - it just checks whether the connection has DAX
state set up. Hence I don't see how FS_XFLAG_DAX control from the
client currently has any influence on the client side DAX usage.

Seems somewhat crazy to me explicitly want to remove that client
side control of per-inode behaviour whilst adding the missing client
side bits that allow for the per-inode policy control from server
side.  Can we please just start with the common, compatible
dax=inode behaviour on the client side, then layer the server side
policy control options over the top of that?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-28  3:44             ` Dave Chinner
@ 2021-09-28  5:17               ` JeffleXu
  2021-09-28 14:34                 ` Vivek Goyal
  0 siblings, 1 reply; 26+ messages in thread
From: JeffleXu @ 2021-09-28  5:17 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Vivek Goyal, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu,
	joseph.qi



On 9/28/21 11:44 AM, Dave Chinner wrote:
> On Mon, Sep 27, 2021 at 10:28:34AM +0800, JeffleXu wrote:
>> On 9/27/21 8:21 AM, Dave Chinner wrote:
>>> On Thu, Sep 23, 2021 at 09:02:26PM -0400, Vivek Goyal wrote:
>>>> On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
>>>>> On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
>>> In the case that the user changes FS_XFLAG_DAX, the FUSE client
>>> needs to communicate that attribute change to the server, where the
>>> server then changes the persistent state of the on-disk inode so
>>> that the next time the client requests that inode, it gets the state
>>> it previously set. Unless, of course, there are server side policy
>>> overrides (never/always).
>>
>> One thing I'm concerned with is that, is the following behavior
>> consistent with the semantics of per-file DAX in ext4/xfs?
>>
>> Client changes FS_XFLAG_DAX, this change is communicated to server and
>> then server also returns **success**. Then client finds that this file
>> is not DAX enabled, since server doesn't honor the previously set state.
> 
> FS_XFLAG_DAX is advisory in nature - it does not have to be honored
> at inode instantiation.

Fine.

> 
>> IOWs, shall server always honor the persistent per-inode attribute of
>> host file (if the change of FS_XFLAG_DAX inside guest is stored as
>> persistent per-inode attribute on host file)?
> 
> If the user set the flag, then queries it, the server should be
> returning the state that the user set, regardless of whether it is
> being honored at inode instantiation time.
> 
> Remember, FS_XFLAG_DAX does not imply S_DAX and vice versa
Got it.

> 
>>>> Not sure what do you mean by server turns of DAX flag based on client
>>>> turning off DAX. Server does not have to do anything. If client decides
>>>> to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
>>>> to server and that's it.
>>>
>>> Where does the client get it's per-inode DAX policy from if
>>> dax=inode is, like other DAX filesystems, the default behaviour?
>>>
>>> Where is the persistent storage of that per-inode attribute kept?
>>
>> In the latest patch set, it is not supported yet to change FS_XFLAG_DAX
>> (and thus setting/clearing persistent per-inode attribute) inside guest,
>> since this scenario is not urgently needed as the real using case.
> 
> AFAICT the FS_IOC_FS{GS}ETXATTR ioctl is already supported by the
> fuse client and it sends the ioctl to the server. Hence the client
> should already support persistent FS_XFLAG_DAX manipulations
> regardless of where/how the attribute is stored by the server. Did
> you actually add code to the client in this patchset to stop
> FS_XFLAG_DAX from being propagated to the server?

Yes fuse client supports FS_IOC_FS{GS}ETXATTR ioctl already, but AFAIK
"passthrough" type virtiofsd doesn't support FUSE_IOCTL yet. My previous
patch had ever added support for FUSE_IOCTL to virtiofsd.

> 
>> Currently the per-inode dax attribute is completely fed from server
>> thourgh FUSE protocol, e.g. server could set/clear the per-inode dax
>> attribute depending on the file size.
> 
> Yup, that's a policy dax=inode on the client side would allow.
> Indeed, this same policy could also be implemented as a client side
> policy, allowing user control instead of admin control of such
> conditional DAX behaviour... :)
> 
>> The previous path set indeed had ever supported changing FS_XFLAG_DAX
>> and accordingly setting/clearing persistent per-inode attribute inside
>> guest. For "passthrough" type virtiofsd, the persistent per-inode
>> attribute is stored as XFS_DIFLAG2_DAX/EXT4_DAX_FL on host file
>> directly, since this is what "passthrough" means.
> 
> Right, but that's server side storage implementation details, not a
> protocol or client side detail. What I can't find in the current
> client is where this per-inode flag is actually used in the FUSE dax
> inode init path - it just checks whether the connection has DAX
> state set up. Hence I don't see how FS_XFLAG_DAX control from the
> client currently has any influence on the client side DAX usage.

Fuse client fetches the per-inode DAX attribute from
fuse_entry_out.attr.flags of FUSE_LOOKUP reply. It's implemented in
patch 4 of this patch set.

The background info is that, fuse client will send a FUSE_LOOKUP request
to server during inode instantiation, while FS_XFLAG_DAX flag is not
included in the FUSE_LOOKUP reply, and thus fuse client need to send
another FUSE_IOCTL if it wants to query the persistent inode flags. To
remove this extra fuse request during inode instantiation, this flag is
merged into FUSE_LOOKUP reply (fuse_entry_out.attr.flags) as
FUSE_ATTR_DAX. Then if FUSE_ATTR_DAX flag is set in
fuse_entry_out.attr.flags, then fuse client knows that this file shall
be DAX enabled.

IOWs, under this mechanism it relies on fuse server to check persistent
inode flags on host, and then set FUSE_ATTR_DAX flag accordingly.

> 
> Seems somewhat crazy to me explicitly want to remove that client
> side control of per-inode behaviour whilst adding the missing client
> side bits that allow for the per-inode policy control from server
> side.  Can we please just start with the common, compatible
> dax=inode behaviour on the client side, then layer the server side
> policy control options over the top of that?


Hi Vivek,

It seems that we shall also support setting/clearing FS_XFLAG_DAX inside
guest? If that's the case, then how to design virtiofsd behavior? I
mean, is it mandatory or optional for virtiofsd to query FS_XFLAG_DAX of
host files when guest is mounted with "dax=inode"? If it's optional,
then the performance may be better since it doesn't need to do one extra
FS_IOC_FSGETXATTR ioctl when handling FUSE_LOOKUP, but admin needs to
specify "-o policy=flag" to virtiofsd explicitly if it's really needed.

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state
  2021-09-28  5:17               ` JeffleXu
@ 2021-09-28 14:34                 ` Vivek Goyal
  0 siblings, 0 replies; 26+ messages in thread
From: Vivek Goyal @ 2021-09-28 14:34 UTC (permalink / raw)
  To: JeffleXu
  Cc: Dave Chinner, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu,
	joseph.qi

On Tue, Sep 28, 2021 at 01:17:27PM +0800, JeffleXu wrote:
> 
> 
> On 9/28/21 11:44 AM, Dave Chinner wrote:
> > On Mon, Sep 27, 2021 at 10:28:34AM +0800, JeffleXu wrote:
> >> On 9/27/21 8:21 AM, Dave Chinner wrote:
> >>> On Thu, Sep 23, 2021 at 09:02:26PM -0400, Vivek Goyal wrote:
> >>>> On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
> >>>>> On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
> >>> In the case that the user changes FS_XFLAG_DAX, the FUSE client
> >>> needs to communicate that attribute change to the server, where the
> >>> server then changes the persistent state of the on-disk inode so
> >>> that the next time the client requests that inode, it gets the state
> >>> it previously set. Unless, of course, there are server side policy
> >>> overrides (never/always).
> >>
> >> One thing I'm concerned with is that, is the following behavior
> >> consistent with the semantics of per-file DAX in ext4/xfs?
> >>
> >> Client changes FS_XFLAG_DAX, this change is communicated to server and
> >> then server also returns **success**. Then client finds that this file
> >> is not DAX enabled, since server doesn't honor the previously set state.
> > 
> > FS_XFLAG_DAX is advisory in nature - it does not have to be honored
> > at inode instantiation.
> 
> Fine.
> 
> > 
> >> IOWs, shall server always honor the persistent per-inode attribute of
> >> host file (if the change of FS_XFLAG_DAX inside guest is stored as
> >> persistent per-inode attribute on host file)?
> > 
> > If the user set the flag, then queries it, the server should be
> > returning the state that the user set, regardless of whether it is
> > being honored at inode instantiation time.
> > 
> > Remember, FS_XFLAG_DAX does not imply S_DAX and vice versa
> Got it.
> 
> > 
> >>>> Not sure what do you mean by server turns of DAX flag based on client
> >>>> turning off DAX. Server does not have to do anything. If client decides
> >>>> to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
> >>>> to server and that's it.
> >>>
> >>> Where does the client get it's per-inode DAX policy from if
> >>> dax=inode is, like other DAX filesystems, the default behaviour?
> >>>
> >>> Where is the persistent storage of that per-inode attribute kept?
> >>
> >> In the latest patch set, it is not supported yet to change FS_XFLAG_DAX
> >> (and thus setting/clearing persistent per-inode attribute) inside guest,
> >> since this scenario is not urgently needed as the real using case.
> > 
> > AFAICT the FS_IOC_FS{GS}ETXATTR ioctl is already supported by the
> > fuse client and it sends the ioctl to the server. Hence the client
> > should already support persistent FS_XFLAG_DAX manipulations
> > regardless of where/how the attribute is stored by the server. Did
> > you actually add code to the client in this patchset to stop
> > FS_XFLAG_DAX from being propagated to the server?
> 
> Yes fuse client supports FS_IOC_FS{GS}ETXATTR ioctl already, but AFAIK
> "passthrough" type virtiofsd doesn't support FUSE_IOCTL yet. My previous
> patch had ever added support for FUSE_IOCTL to virtiofsd.
> 
> > 
> >> Currently the per-inode dax attribute is completely fed from server
> >> thourgh FUSE protocol, e.g. server could set/clear the per-inode dax
> >> attribute depending on the file size.
> > 
> > Yup, that's a policy dax=inode on the client side would allow.
> > Indeed, this same policy could also be implemented as a client side
> > policy, allowing user control instead of admin control of such
> > conditional DAX behaviour... :)
> > 
> >> The previous path set indeed had ever supported changing FS_XFLAG_DAX
> >> and accordingly setting/clearing persistent per-inode attribute inside
> >> guest. For "passthrough" type virtiofsd, the persistent per-inode
> >> attribute is stored as XFS_DIFLAG2_DAX/EXT4_DAX_FL on host file
> >> directly, since this is what "passthrough" means.
> > 
> > Right, but that's server side storage implementation details, not a
> > protocol or client side detail. What I can't find in the current
> > client is where this per-inode flag is actually used in the FUSE dax
> > inode init path - it just checks whether the connection has DAX
> > state set up. Hence I don't see how FS_XFLAG_DAX control from the
> > client currently has any influence on the client side DAX usage.
> 
> Fuse client fetches the per-inode DAX attribute from
> fuse_entry_out.attr.flags of FUSE_LOOKUP reply. It's implemented in
> patch 4 of this patch set.
> 
> The background info is that, fuse client will send a FUSE_LOOKUP request
> to server during inode instantiation, while FS_XFLAG_DAX flag is not
> included in the FUSE_LOOKUP reply, and thus fuse client need to send
> another FUSE_IOCTL if it wants to query the persistent inode flags. To
> remove this extra fuse request during inode instantiation, this flag is
> merged into FUSE_LOOKUP reply (fuse_entry_out.attr.flags) as
> FUSE_ATTR_DAX. Then if FUSE_ATTR_DAX flag is set in
> fuse_entry_out.attr.flags, then fuse client knows that this file shall
> be DAX enabled.
> 
> IOWs, under this mechanism it relies on fuse server to check persistent
> inode flags on host, and then set FUSE_ATTR_DAX flag accordingly.
> 
> > 
> > Seems somewhat crazy to me explicitly want to remove that client
> > side control of per-inode behaviour whilst adding the missing client
> > side bits that allow for the per-inode policy control from server
> > side.  Can we please just start with the common, compatible
> > dax=inode behaviour on the client side, then layer the server side
> > policy control options over the top of that?
> 
> 
> Hi Vivek,
> 
> It seems that we shall also support setting/clearing FS_XFLAG_DAX inside
> guest? If that's the case, then how to design virtiofsd behavior? I
> mean, is it mandatory or optional for virtiofsd to query FS_XFLAG_DAX of
> host files when guest is mounted with "dax=inode"? If it's optional,
> then the performance may be better since it doesn't need to do one extra
> FS_IOC_FSGETXATTR ioctl when handling FUSE_LOOKUP, but admin needs to
> specify "-o policy=flag" to virtiofsd explicitly if it's really needed.

Hi Jeffle,

How about first doing a patch series to just enable ioctl in virtiofsd.
I know David Gilbert and others had security concenrs w.r.t. These
are coming from untrusted guest and they had concerns that we should
only allow selective operations as needed (opted-in by admin). So may
be a daemon option which specifies which operations to allow.

Once that's done, we probably will have to do a patch series, to
make sure FS_XFLAG_DAX inherit behavior is working properly. Especially
that behavior about inheriting FS_XFLAG_DAX flag when a new file
is created and parent dir has FS_XFLAG_DAX set. May be we can just
rely on host filesystem doing it? Limitation will be that it will
only work if host fs is ext4/xfs.

w.r.t virtiofsd, I think we can provide an option say "-o
dax_policy=<option>", which controls what policy is in effect. So if
a daemon specific policy is in effect, we can skip checking FS_XFLAG_DAX
state. In fact, checking FS_XFLAG_DAX state also should probably be
a policy option and not enabled by default.

Say, "-o dax_policy=FS_XFLAG_DAX" will enable cheking FS_XFLAG_DAX on
inode during FUSE_LOOKUP time. Otherwise daemon can fallback to
its own policy and set DAX flag in lookup reply accordingly.

Vivek


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/5] fuse,virtiofs: support per-file DAX
  2021-09-23 18:57 ` [PATCH v5 0/5] fuse,virtiofs: " Vivek Goyal
@ 2021-10-27  3:42   ` JeffleXu
  2021-10-27 14:45     ` Vivek Goyal
  0 siblings, 1 reply; 26+ messages in thread
From: JeffleXu @ 2021-10-27  3:42 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi


Sorry for the late reply, as your previous reply was moved to junk box
by the algorithm...

On 9/24/21 2:57 AM, Vivek Goyal wrote:
> On Thu, Sep 23, 2021 at 05:25:21PM +0800, Jeffle Xu wrote:
>> This patchset adds support of per-file DAX for virtiofs, which is
>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>
>> Any comment is welcome.
>>
>> [1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
>> [2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")
>>
>>
>> [Purpose]
>> DAX may be limited in some specific situation. When the number of usable
>> DAX windows is under watermark, the recalim routine will be triggered to
>> reclaim some DAX windows. It may have a negative impact on the
>> performance, since some processes may need to wait for DAX windows to be
>> recalimed and reused then. To mitigate the performance degradation, the
>> overall DAX window need to be expanded larger.
>>
>> However, simply expanding the DAX window may not be a good deal in some
>> scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
>> (512 * 64 bytes) memory footprint will be consumed for page descriptors
>> inside guest, which is greater than the memory footprint if it uses
>> guest page cache when DAX disabled. Thus it'd better disable DAX for
>> those files smaller than 32KB, to reduce the demand for DAX window and
>> thus avoid the unworthy memory overhead.
>>
>> Per-file DAX feature is introduced to address this issue, by offering a
>> finer grained control for dax to users, trying to achieve a balance
>> between performance and memory overhead.
>>
>>
>> [Note]
>> When the per-file DAX hint changes while the file is still *opened*, it
>> is quite complicated and maybe fragile to dynamically change the DAX
>> state, since dynamic switching needs to switch a_ops atomiclly. Ira
>> Weiny had ever implemented a so called i_aops_sem lock [3] but
>> eventually gave up since the complexity of the implementation [4][5][6][7].
>>
>> Hence mark the inode and corresponding dentries as DONE_CACHE once the
>> per-file DAX hint changes, so that the inode instance will be evicted
>> and freed as soon as possible once the file is closed and the last
>> reference to the inode is put. And then when the file gets reopened next
>> time, the new instantiated inode will reflect the new DAX state.
> 
> If we don't cache inode (if no fd is open), will it not have negative
> performance impact. When we cache inodes, we also have all the dax
> mappings cached as well. So if a process opens the same file again,
> it gets all the mappings already in place and it does not have
> to call FUSE_SETUPMAPPING again.
> 

What does 'all the dax mappings cached' mean when 'we cache inodes'?

If the per-file DAX hint indeed changes for a large sized file, with
quite many page caches or DAX mapping already in the address space, then
marking it DONT_CACHE means evicting the inode as soon as possible,
which means flushing the page caches or removing all DAX mappings. When
the inode is reopened next time, page cache is re-instantiated or
FUSE_SETUPMAPPING is called again. Then the negative performance impact
indeed exist in this case.

But this performance impact only exist when the per-file DAX hint
changes halfway, that is, the hint suddenly changes after the virtiofs
has already mounted in the guest.



-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 0/5] fuse,virtiofs: support per-file DAX
  2021-10-27  3:42   ` JeffleXu
@ 2021-10-27 14:45     ` Vivek Goyal
  0 siblings, 0 replies; 26+ messages in thread
From: Vivek Goyal @ 2021-10-27 14:45 UTC (permalink / raw)
  To: JeffleXu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Wed, Oct 27, 2021 at 11:42:52AM +0800, JeffleXu wrote:
> 
> Sorry for the late reply, as your previous reply was moved to junk box
> by the algorithm...
> 
> On 9/24/21 2:57 AM, Vivek Goyal wrote:
> > On Thu, Sep 23, 2021 at 05:25:21PM +0800, Jeffle Xu wrote:
> >> This patchset adds support of per-file DAX for virtiofs, which is
> >> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >>
> >> Any comment is welcome.
> >>
> >> [1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
> >> [2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")
> >>
> >>
> >> [Purpose]
> >> DAX may be limited in some specific situation. When the number of usable
> >> DAX windows is under watermark, the recalim routine will be triggered to
> >> reclaim some DAX windows. It may have a negative impact on the
> >> performance, since some processes may need to wait for DAX windows to be
> >> recalimed and reused then. To mitigate the performance degradation, the
> >> overall DAX window need to be expanded larger.
> >>
> >> However, simply expanding the DAX window may not be a good deal in some
> >> scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
> >> (512 * 64 bytes) memory footprint will be consumed for page descriptors
> >> inside guest, which is greater than the memory footprint if it uses
> >> guest page cache when DAX disabled. Thus it'd better disable DAX for
> >> those files smaller than 32KB, to reduce the demand for DAX window and
> >> thus avoid the unworthy memory overhead.
> >>
> >> Per-file DAX feature is introduced to address this issue, by offering a
> >> finer grained control for dax to users, trying to achieve a balance
> >> between performance and memory overhead.
> >>
> >>
> >> [Note]
> >> When the per-file DAX hint changes while the file is still *opened*, it
> >> is quite complicated and maybe fragile to dynamically change the DAX
> >> state, since dynamic switching needs to switch a_ops atomiclly. Ira
> >> Weiny had ever implemented a so called i_aops_sem lock [3] but
> >> eventually gave up since the complexity of the implementation [4][5][6][7].
> >>
> >> Hence mark the inode and corresponding dentries as DONE_CACHE once the
> >> per-file DAX hint changes, so that the inode instance will be evicted
> >> and freed as soon as possible once the file is closed and the last
> >> reference to the inode is put. And then when the file gets reopened next
> >> time, the new instantiated inode will reflect the new DAX state.
> > 
> > If we don't cache inode (if no fd is open), will it not have negative
> > performance impact. When we cache inodes, we also have all the dax
> > mappings cached as well. So if a process opens the same file again,
> > it gets all the mappings already in place and it does not have
> > to call FUSE_SETUPMAPPING again.
> > 
> 
> What does 'all the dax mappings cached' mean when 'we cache inodes'?
> 
> If the per-file DAX hint indeed changes for a large sized file, with
> quite many page caches or DAX mapping already in the address space, then
> marking it DONT_CACHE means evicting the inode as soon as possible,
> which means flushing the page caches or removing all DAX mappings. When
> the inode is reopened next time, page cache is re-instantiated or
> FUSE_SETUPMAPPING is called again. Then the negative performance impact
> indeed exist in this case.
> 
> But this performance impact only exist when the per-file DAX hint
> changes halfway, that is, the hint suddenly changes after the virtiofs
> has already mounted in the guest.

Ok, got it. I think I saw that in the code. I had assumed that an inode
will always be marked don't cache. That's not the case. It will be
marked don't cache only if inode property changes (from dax to non-dax or
vice-a-versa). That seems fine.

Vivek


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-10-27 14:45 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-23  9:25 [PATCH v5 0/5] fuse,virtiofs: support per-file DAX Jeffle Xu
2021-09-23  9:25 ` [PATCH v5 1/5] fuse: add fuse_should_enable_dax() helper Jeffle Xu
2021-09-23 18:58   ` Vivek Goyal
2021-09-23  9:25 ` [PATCH v5 2/5] fuse: make DAX mount option a tri-state Jeffle Xu
2021-09-23 19:02   ` Vivek Goyal
2021-09-23 22:26     ` Dave Chinner
2021-09-24  1:02       ` Vivek Goyal
2021-09-24  3:06         ` JeffleXu
2021-09-24 12:43           ` Vivek Goyal
2021-09-26  2:06             ` JeffleXu
2021-09-27  0:21         ` Dave Chinner
2021-09-27  2:28           ` JeffleXu
2021-09-28  3:44             ` Dave Chinner
2021-09-28  5:17               ` JeffleXu
2021-09-28 14:34                 ` Vivek Goyal
2021-09-27 15:32           ` Vivek Goyal
2021-09-28  3:23             ` Dave Chinner
2021-09-23  9:25 ` [PATCH v5 3/5] fuse: support per-file DAX Jeffle Xu
2021-09-23  9:25 ` [PATCH v5 4/5] fuse: enable " Jeffle Xu
2021-09-23  9:25 ` [PATCH v5 5/5] fuse: mark inode DONT_CACHE when per-file DAX hint changes Jeffle Xu
2021-09-23  9:35 ` [virtiofsd PATCH v5 0/2] virtiofsd: support per-file DAX Jeffle Xu
2021-09-23  9:35   ` [virtiofsd PATCH v5 1/2] virtiofsd: add FUSE_ATTR_DAX to fuse protocol Jeffle Xu
2021-09-23  9:35   ` [virtiofsd PATCH v5 2/2] virtiofsd: support per-file DAX Jeffle Xu
2021-09-23 18:57 ` [PATCH v5 0/5] fuse,virtiofs: " Vivek Goyal
2021-10-27  3:42   ` JeffleXu
2021-10-27 14:45     ` Vivek Goyal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.