* [PATCH v6 0/7] fuse,virtiofs: support per-file DAX
@ 2021-10-11  3:00 Jeffle Xu
  2021-10-11  3:00 ` [PATCH v6 1/7] fuse: add fuse_should_enable_dax() helper Jeffle Xu
                   ` (8 more replies)
  0 siblings, 9 replies; 37+ messages in thread
From: Jeffle Xu @ 2021-10-11  3:00 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

changes since v5:
Overall Design Changes:
1. virtiofsd now supports ioctl (only FS_IOC_SETFLAGS and
  FS_IOC_FSSETXATTR), so that users inside the guest can now set/clear
  persistent inode flags. (The FUSE kernel module already supports
  .ioctl(); virtiofsd needs to support it as well.)
2. When the FUSE client is mounted with '-o dax=inode', whether DAX is
  enabled for a specific file is determined entirely by the FUSE server;
  the FUSE client has no say in it. The decision is communicated through
  the FUSE_ATTR_DAX flag of the FUSE protocol. The algorithm virtiofsd
  uses to make this decision is implementation specific, so the
  following scenario may arise: a user inside the guest has set the
  persistent inode flag (i.e. FS_XFLAG_DAX) on a file, but the FUSE
  server finally decides not to enable DAX for it. This slight semantic
  difference is documented in patch 7. For the same reason,
  d_mark_dontcache() is not called when the FS_IOC_SETFLAGS or
  FS_IOC_FSSETXATTR ioctl is issued inside the guest. It is deferred
  until the FUSE_ATTR_DAX flag **actually** changes (as shown in
  patch 6).
3. patch 1: slightly modify logic of fuse_should_enable_dax()
4. patch 4: add back negotiation during FUSE_INIT. The FUSE client
  advertises to the FUSE server that it is in per-file DAX mode, so that
  the FUSE server can skip querying persistent inode flags on the host
  when the client is not mounted in per-file DAX mode, given that
  querying persistent inode flags can be quite expensive.
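
The server-side decision described in items 2 and 4 could be sketched
roughly as below. This is only an illustration and not virtiofsd code:
demo_attr_flags() and DEMO_DAX_MIN_SIZE are hypothetical names, and the
size-based policy is just one possible algorithm. Only the two flag
values are taken from patch 3 of this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined by patch 3 of this series. */
#define FUSE_PERFILE_DAX	(1u << 30)	/* FUSE_INIT flag */
#define FUSE_ATTR_DAX		(1u << 1)	/* fuse_attr flag */

/* Hypothetical policy threshold, not part of the series. */
#define DEMO_DAX_MIN_SIZE	(32u * 1024u)

/*
 * Hypothetical server-side helper: compute the fuse_attr flags for one file.
 * init_flags:     flags the client sent in its FUSE_INIT request
 * host_xflag_dax: whether FS_XFLAG_DAX is set on the file on the host
 * size:           file size in bytes
 */
uint32_t demo_attr_flags(uint32_t init_flags, bool host_xflag_dax,
			 uint64_t size)
{
	/* Client is not in per-file DAX mode: never set the hint. */
	if (!(init_flags & FUSE_PERFILE_DAX))
		return 0;

	/* Example policy: honour the host flag, but only for larger files. */
	if (host_xflag_dax && size >= DEMO_DAX_MIN_SIZE)
		return FUSE_ATTR_DAX;

	return 0;
}
```

With this example policy a file can carry FS_XFLAG_DAX and still not
get FUSE_ATTR_DAX, which is exactly the semantic difference item 2
describes.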


changes since v4:
- drop support for setting/clearing FS_DAX inside guest
- and thus drop the negotiation phase during FUSE_INIT

This patchset adds support for per-file DAX for virtiofs, which is
inspired by Ira Weiny's work on ext4[1] and xfs[2].

Any comment is welcome.

[1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
[2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")

[Purpose]
DAX window capacity may be limited in some situations. When the number
of usable DAX windows drops below the watermark, the reclaim routine is
triggered to reclaim some DAX windows. This may hurt performance, since
some processes may have to wait for DAX windows to be reclaimed and
reused. To mitigate the degradation, the overall DAX window needs to be
enlarged.

However, simply enlarging the DAX window may not be a good trade-off in
some scenarios. Maintaining one DAX window chunk (i.e., 2MB in size)
consumes 32KB (512 * 64 bytes) of memory for page descriptors inside the
guest, which is more than a file smaller than 32KB would occupy in the
guest page cache with DAX disabled. Thus it is better to disable DAX for
files smaller than 32KB, to reduce the demand for DAX windows and avoid
the unjustified memory overhead.

The per-file DAX feature is introduced to address this issue by offering
users finer-grained control over DAX, trying to strike a balance
between performance and memory overhead.
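
The arithmetic above can be checked with a small sketch. It assumes
4KB pages and a 64-byte page descriptor, which is what the "512 * 64
bytes" figure implies. All names below are hypothetical and not part of
the series.

```c
#include <stdint.h>

/* Assumed sizes behind the "512 * 64 bytes" figure in the text. */
#define DEMO_CHUNK_SIZE		(2u * 1024u * 1024u)	/* one DAX window chunk */
#define DEMO_PAGE_SIZE		4096u			/* guest page size */
#define DEMO_PAGE_DESC_SIZE	64u			/* one page descriptor */

/* Guest memory spent on page descriptors for one DAX window chunk. */
uint64_t demo_chunk_overhead(void)
{
	return (uint64_t)(DEMO_CHUNK_SIZE / DEMO_PAGE_SIZE) * DEMO_PAGE_DESC_SIZE;
}

/*
 * Rough rule from the text: DAX only pays off, memory-wise, once the file
 * is at least as large as the descriptor overhead of its chunk.
 */
int demo_dax_worthwhile(uint64_t file_size)
{
	return file_size >= demo_chunk_overhead();
}
```

Here demo_chunk_overhead() evaluates to 32768 bytes, so a 4KB file
would cost more in descriptors under DAX than it would occupy in the
page cache.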


[Note]
When the per-file DAX hint changes while the file is still *open*, it
is quite complicated and potentially fragile to change the DAX state
dynamically, since dynamic switching needs to switch a_ops atomically.
Ira Weiny once implemented a so-called i_aops_sem lock [3] but
eventually gave it up because of the complexity of the implementation
[4][5][6][7].

Hence mark the inode and corresponding dentries as DONT_CACHE once the
per-file DAX hint changes, so that the inode instance will be evicted
and freed as soon as possible once the file is closed and the last
reference to the inode is put. When the file is next reopened, the
newly instantiated inode will reflect the new DAX state.

In summary, when the per-file DAX hint changes for an *open* file, the
DAX state of the file won't be updated until the file is closed and
reopened later. This is also how ext4/xfs per-file DAX works.
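
The close-and-reopen behaviour can be modelled in userspace as
follows. This is a simplified sketch and not kernel code: struct
demo_inode and the demo_*() helpers are hypothetical stand-ins for the
inode, fuse_dax_inode_init(), fuse_change_attributes() and inode
eviction. It assumes the mount is in per-file ('dax=inode') mode.

```c
#include <stdbool.h>

#define FUSE_ATTR_DAX	(1u << 1)	/* value from patch 3 */

/* Hypothetical stand-in for the relevant inode state. */
struct demo_inode {
	bool dax;	/* models S_DAX; set only at instantiation */
	bool dontcache;	/* models the effect of d_mark_dontcache() */
};

/* Models inode instantiation: the DAX state is decided here, once. */
void demo_inode_init(struct demo_inode *ino, unsigned int attr_flags)
{
	ino->dax = (attr_flags & FUSE_ATTR_DAX) != 0;
	ino->dontcache = false;
}

/* Models an attribute refresh while the inode is still in use. */
void demo_change_attributes(struct demo_inode *ino, unsigned int attr_flags)
{
	bool hint = (attr_flags & FUSE_ATTR_DAX) != 0;

	/* The DAX state is left alone; only eviction is requested. */
	if (ino->dax != hint)
		ino->dontcache = true;
}

/*
 * Models "last reference put, then reopened". Returns true if a fresh
 * inode was instantiated, i.e. the new hint took effect.
 */
bool demo_reopen(struct demo_inode *ino, unsigned int attr_flags)
{
	if (!ino->dontcache)
		return false;
	demo_inode_init(ino, attr_flags);
	return true;
}
```

In the real kernel an inode that is not marked may still be evicted
under memory pressure; the model ignores that and only shows why the
mark makes the switch happen promptly.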

[3] https://lore.kernel.org/lkml/20200227052442.22524-7-ira.weiny@intel.com/
[4] https://patchwork.kernel.org/project/xfs/cover/20200407182958.568475-1-ira.weiny@intel.com/
[5] https://lore.kernel.org/lkml/20200305155144.GA5598@lst.de/
[6] https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
[7] https://lore.kernel.org/lkml/20200403182904.GP80283@magnolia/

changes since v3:
- bug fix (patch 6): s/"IS_DAX(inode) != newdax"/"!!IS_DAX(inode) !=
  newdax"
- during FUSE_INIT, advertise capability for per-file DAX only when
  mounted as "-o dax=inode" (patch 4)
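
The v3 bug fix above is worth a short illustration. IS_DAX() yields
the masked flag bit rather than 0 or 1, so comparing it directly with a
boolean reports a change even when both sides agree. The sketch below
assumes newdax was a bool and uses an illustrative bit value for the
flag; the DEMO_* names are hypothetical.

```c
#include <stdbool.h>

/* Illustrative flag bit; the only point is that its value is not 1. */
#define DEMO_S_DAX		(1u << 13)
#define DEMO_IS_DAX(i_flags)	((i_flags) & DEMO_S_DAX)

/* Pre-fix comparison: the masked bit is compared with 0/1 directly. */
bool demo_changed_buggy(unsigned int i_flags, bool newdax)
{
	return DEMO_IS_DAX(i_flags) != newdax;
}

/* Fixed comparison: "!!" normalizes the masked bit to 0 or 1 first. */
bool demo_changed_fixed(unsigned int i_flags, bool newdax)
{
	return !!DEMO_IS_DAX(i_flags) != newdax;
}
```

With DAX already enabled and newdax true, the buggy form reports a
change while the fixed form does not.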

changes since v2:
- modify fuse_show_options() accordingly to make it compatible with
  new tri-state mount option (patch 2)
- extract FUSE protocol changes into one separate patch (patch 3)
- FUSE server/client need to negotiate if they support per-file DAX
  (patch 4)
- extract DONT_CACHE logic into patch 6/7

v5: https://lore.kernel.org/all/20210923092526.72341-1-jefflexu@linux.alibaba.com/
v4: https://lore.kernel.org/linux-fsdevel/20210817022220.17574-1-jefflexu@linux.alibaba.com/
v3: https://www.spinics.net/lists/linux-fsdevel/msg200852.html
v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html


Jeffle Xu (7):
  fuse: add fuse_should_enable_dax() helper
  fuse: make DAX mount option a tri-state
  fuse: support per-file DAX in fuse protocol
  fuse: negotiate per-file DAX in FUSE_INIT
  fuse: enable per-file DAX
  fuse: mark inode DONT_CACHE when per-file DAX hint changes
  Documentation/filesystem/dax: record DAX on virtiofs

 Documentation/filesystems/dax.rst | 20 +++++++++++++++--
 fs/fuse/dax.c                     | 36 ++++++++++++++++++++++++++++---
 fs/fuse/file.c                    |  4 ++--
 fs/fuse/fuse_i.h                  | 19 ++++++++++++----
 fs/fuse/inode.c                   | 17 +++++++++++----
 fs/fuse/virtio_fs.c               | 16 ++++++++++++--
 include/uapi/linux/fuse.h         |  9 +++++++-
 7 files changed, 103 insertions(+), 18 deletions(-)

-- 
2.27.0



* [PATCH v6 1/7] fuse: add fuse_should_enable_dax() helper
  2021-10-11  3:00 [PATCH v6 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
@ 2021-10-11  3:00 ` Jeffle Xu
  2021-10-11  3:00 ` [PATCH v6 2/7] fuse: make DAX mount option a tri-state Jeffle Xu
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Jeffle Xu @ 2021-10-11  3:00 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

This is in preparation for the following per-file DAX checks.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 79df61ed7481..1eb6538bf1b2 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1332,11 +1332,19 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 	.invalidatepage	= noop_invalidatepage,
 };
 
-void fuse_dax_inode_init(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 
-	if (!fc->dax)
+	if (fc->dax)
+		return true;
+
+	return false;
+}
+
+void fuse_dax_inode_init(struct inode *inode)
+{
+	if (!fuse_should_enable_dax(inode))
 		return;
 
 	inode->i_flags |= S_DAX;
-- 
2.27.0



* [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-11  3:00 [PATCH v6 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
  2021-10-11  3:00 ` [PATCH v6 1/7] fuse: add fuse_should_enable_dax() helper Jeffle Xu
@ 2021-10-11  3:00 ` Jeffle Xu
  2021-10-18 14:10   ` Vivek Goyal
  2021-10-11  3:00 ` [PATCH v6 3/7] fuse: support per-file DAX in fuse protocol Jeffle Xu
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Jeffle Xu @ 2021-10-11  3:00 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

We add 'always', 'never', and 'inode' (default). '-o dax' continues to
operate the same way and is equivalent to 'always'. To be consistent
with the tri-state mount option of ext4/xfs, when neither '-o dax' nor
'-o dax=' is specified, the default behaviour is 'inode'.

As of this patch, 'inode' mode actually behaves the same as 'always'
mode, until the per-file DAX flag is introduced in the following
patches.
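
The option-to-mode mapping described above can be sketched in
userspace as below. This is not the fs_parameter_spec based parser the
patch uses; demo_parse_dax() and the DEMO_* names are hypothetical and
only model the resulting behaviour.

```c
#include <string.h>

/* Mirrors the three modes of enum fuse_dax_mode in this patch. */
enum demo_dax_mode { DEMO_DAX_INODE, DEMO_DAX_ALWAYS, DEMO_DAX_NEVER };

/*
 * Map one mount option string to a mode. A NULL opt models "no dax option
 * given". Returns 0 on success, -1 for an unrecognized option.
 */
int demo_parse_dax(const char *opt, enum demo_dax_mode *mode)
{
	if (opt == NULL)
		*mode = DEMO_DAX_INODE;		/* default */
	else if (strcmp(opt, "dax") == 0 || strcmp(opt, "dax=always") == 0)
		*mode = DEMO_DAX_ALWAYS;	/* bare 'dax' means 'always' */
	else if (strcmp(opt, "dax=never") == 0)
		*mode = DEMO_DAX_NEVER;
	else if (strcmp(opt, "dax=inode") == 0)
		*mode = DEMO_DAX_INODE;
	else
		return -1;
	return 0;
}

/* As of this patch: only 'never' disables DAX; 'inode' acts like 'always'. */
int demo_should_enable_dax(enum demo_dax_mode mode)
{
	return mode != DEMO_DAX_NEVER;
}
```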

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c       | 19 ++++++++++++++++---
 fs/fuse/fuse_i.h    | 14 ++++++++++++--
 fs/fuse/inode.c     | 10 +++++++---
 fs/fuse/virtio_fs.c | 16 ++++++++++++++--
 4 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 1eb6538bf1b2..4c6c64efc950 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1284,11 +1284,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
 	return ret;
 }
 
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
+			struct dax_device *dax_dev)
 {
 	struct fuse_conn_dax *fcd;
 	int err;
 
+	fc->dax_mode = dax_mode;
+
 	if (!dax_dev)
 		return 0;
 
@@ -1335,11 +1338,21 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 static bool fuse_should_enable_dax(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	unsigned int dax_mode = fc->dax_mode;
+
+	if (dax_mode == FUSE_DAX_NEVER)
+		return false;
 
-	if (fc->dax)
+	/*
+	 * If 'dax=always/inode', fc->dax couldn't be NULL even when fuse
+	 * daemon doesn't support DAX, since the mount routine will fail
+	 * early in this case.
+	 */
+	if (dax_mode == FUSE_DAX_ALWAYS)
 		return true;
 
-	return false;
+	/* dax_mode == FUSE_DAX_INODE */
+	return true;
 }
 
 void fuse_dax_inode_init(struct inode *inode)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 319596df5dc6..5abf9749923f 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -480,6 +480,12 @@ struct fuse_dev {
 	struct list_head entry;
 };
 
+enum fuse_dax_mode {
+	FUSE_DAX_INODE,
+	FUSE_DAX_ALWAYS,
+	FUSE_DAX_NEVER,
+};
+
 struct fuse_fs_context {
 	int fd;
 	struct file *file;
@@ -497,7 +503,7 @@ struct fuse_fs_context {
 	bool no_control:1;
 	bool no_force_umount:1;
 	bool legacy_opts_show:1;
-	bool dax:1;
+	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
 	const char *subtype;
@@ -802,6 +808,9 @@ struct fuse_conn {
 	struct list_head devices;
 
 #ifdef CONFIG_FUSE_DAX
+	/* dax mode: FUSE_DAX_* (always, never or per-file) */
+	enum fuse_dax_mode dax_mode;
+
 	/* Dax specific conn data, non-NULL if DAX is enabled */
 	struct fuse_conn_dax *dax;
 #endif
@@ -1255,7 +1264,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
 ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
 int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
 int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
+			struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 36cd03114b6d..b4b41683e97e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -742,8 +742,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
 	}
 #ifdef CONFIG_FUSE_DAX
-	if (fc->dax)
-		seq_puts(m, ",dax");
+	if (fc->dax_mode == FUSE_DAX_ALWAYS)
+		seq_puts(m, ",dax=always");
+	else if (fc->dax_mode == FUSE_DAX_NEVER)
+		seq_puts(m, ",dax=never");
+	else if (fc->dax_mode == FUSE_DAX_INODE)
+		seq_puts(m, ",dax=inode");
 #endif
 
 	return 0;
@@ -1493,7 +1497,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	sb->s_subtype = ctx->subtype;
 	ctx->subtype = NULL;
 	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
-		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
+		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
 		if (err)
 			goto err;
 	}
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 0ad89c6629d7..58cfbaeb4a7d 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -88,12 +88,21 @@ struct virtio_fs_req_work {
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 				 struct fuse_req *req, bool in_flight);
 
+static const struct constant_table dax_param_enums[] = {
+	{"inode",	FUSE_DAX_INODE },
+	{"always",	FUSE_DAX_ALWAYS },
+	{"never",	FUSE_DAX_NEVER },
+	{}
+};
+
 enum {
 	OPT_DAX,
+	OPT_DAX_ENUM,
 };
 
 static const struct fs_parameter_spec virtio_fs_parameters[] = {
 	fsparam_flag("dax", OPT_DAX),
+	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
 	{}
 };
 
@@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
 
 	switch (opt) {
 	case OPT_DAX:
-		ctx->dax = 1;
+		ctx->dax_mode = FUSE_DAX_ALWAYS;
+		break;
+	case OPT_DAX_ENUM:
+		ctx->dax_mode = result.uint_32;
 		break;
 	default:
 		return -EINVAL;
@@ -1326,7 +1338,7 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
 
 	/* virtiofs allocates and installs its own fuse devices */
 	ctx->fudptr = NULL;
-	if (ctx->dax) {
+	if (ctx->dax_mode != FUSE_DAX_NEVER) {
 		if (!fs->dax_dev) {
 			err = -EINVAL;
 			pr_err("virtio-fs: dax can't be enabled as filesystem"
-- 
2.27.0



* [PATCH v6 3/7] fuse: support per-file DAX in fuse protocol
  2021-10-11  3:00 [PATCH v6 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
  2021-10-11  3:00 ` [PATCH v6 1/7] fuse: add fuse_should_enable_dax() helper Jeffle Xu
  2021-10-11  3:00 ` [PATCH v6 2/7] fuse: make DAX mount option a tri-state Jeffle Xu
@ 2021-10-11  3:00 ` Jeffle Xu
  2021-10-18 14:14   ` Vivek Goyal
  2021-10-11  3:00 ` [PATCH v6 4/7] fuse: negotiate per-file DAX in FUSE_INIT Jeffle Xu
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Jeffle Xu @ 2021-10-11  3:00 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

Expand the fuse protocol to support per-file DAX.

The FUSE_PERFILE_DAX flag is added to indicate whether the fuse
server/client supports per-file DAX. It can be conveyed in both the
FUSE_INIT request and reply.

The FUSE_ATTR_DAX flag is added to indicate whether DAX shall be enabled
for the corresponding file. It is conveyed in the FUSE_LOOKUP reply.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 include/uapi/linux/fuse.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 36ed092227fa..15a1f5fc0797 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -184,6 +184,9 @@
  *
  *  7.34
  *  - add FUSE_SYNCFS
+ *
+ *  7.35
+ *  - add FUSE_PERFILE_DAX, FUSE_ATTR_DAX
  */
 
 #ifndef _LINUX_FUSE_H
@@ -219,7 +222,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 34
+#define FUSE_KERNEL_MINOR_VERSION 35
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -336,6 +339,7 @@ struct fuse_file_lock {
  *			write/truncate sgid is killed only if file has group
  *			execute permission. (Same as Linux VFS behavior).
  * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
+ * FUSE_PERFILE_DAX:	kernel supports per-file DAX
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -367,6 +371,7 @@ struct fuse_file_lock {
 #define FUSE_SUBMOUNTS		(1 << 27)
 #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
 #define FUSE_SETXATTR_EXT	(1 << 29)
+#define FUSE_PERFILE_DAX	(1 << 30)
 
 /**
  * CUSE INIT request/reply flags
@@ -449,8 +454,10 @@ struct fuse_file_lock {
  * fuse_attr flags
  *
  * FUSE_ATTR_SUBMOUNT: Object is a submount root
+ * FUSE_ATTR_DAX: Enable DAX for this file in per-file DAX mode
  */
 #define FUSE_ATTR_SUBMOUNT      (1 << 0)
+#define FUSE_ATTR_DAX		(1 << 1)
 
 /**
  * Open flags
-- 
2.27.0



* [PATCH v6 4/7] fuse: negotiate per-file DAX in FUSE_INIT
  2021-10-11  3:00 [PATCH v6 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
                   ` (2 preceding siblings ...)
  2021-10-11  3:00 ` [PATCH v6 3/7] fuse: support per-file DAX in fuse protocol Jeffle Xu
@ 2021-10-11  3:00 ` Jeffle Xu
  2021-10-18 14:30   ` Vivek Goyal
  2021-10-11  3:00 ` [PATCH v6 5/7] fuse: enable per-file DAX Jeffle Xu
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Jeffle Xu @ 2021-10-11  3:00 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

During the FUSE_INIT phase, the client shall advertise per-file DAX if
it is mounted with "-o dax=inode". The server is then aware that the
client is in per-file DAX mode, and will construct the per-inode DAX
attribute accordingly.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/inode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b4b41683e97e..f4ad99e2415b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1203,6 +1203,8 @@ void fuse_send_init(struct fuse_mount *fm)
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		ia->in.flags |= FUSE_MAP_ALIGNMENT;
+	if (fm->fc->dax_mode == FUSE_DAX_INODE)
+		ia->in.flags |= FUSE_PERFILE_DAX;
 #endif
 	if (fm->fc->auto_submounts)
 		ia->in.flags |= FUSE_SUBMOUNTS;
-- 
2.27.0



* [PATCH v6 5/7] fuse: enable per-file DAX
  2021-10-11  3:00 [PATCH v6 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
                   ` (3 preceding siblings ...)
  2021-10-11  3:00 ` [PATCH v6 4/7] fuse: negotiate per-file DAX in FUSE_INIT Jeffle Xu
@ 2021-10-11  3:00 ` Jeffle Xu
  2021-10-18 15:11   ` Vivek Goyal
  2021-10-11  3:00 ` [PATCH v6 6/7] fuse: mark inode DONT_CACHE when per-file DAX hint changes Jeffle Xu
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Jeffle Xu @ 2021-10-11  3:00 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

DAX window capacity may be limited in some situations. When the number
of usable DAX windows drops below the watermark, the reclaim routine is
triggered to reclaim some DAX windows. This may hurt performance, since
some processes may have to wait for DAX windows to be reclaimed and
reused. To mitigate the degradation, the overall DAX window needs to be
enlarged.

However, simply enlarging the DAX window may not be a good trade-off in
some scenarios. Maintaining one DAX window chunk (i.e., 2MB in size)
consumes 32KB (512 * 64 bytes) of memory for page descriptors inside the
guest, which is more than a file smaller than 32KB would occupy in the
guest page cache with DAX disabled. Thus it is better to disable DAX for
files smaller than 32KB, to reduce the demand for DAX windows and avoid
the unjustified memory overhead.

The per-file DAX feature is introduced to address this issue by offering
users finer-grained control over DAX, trying to strike a balance
between performance and memory overhead.

The FUSE_ATTR_DAX flag in the FUSE_LOOKUP reply indicates whether DAX
should be enabled for the corresponding file. Currently the DAX state of
a file is initialized only when the inode is instantiated.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c    | 8 ++++----
 fs/fuse/file.c   | 4 ++--
 fs/fuse/fuse_i.h | 4 ++--
 fs/fuse/inode.c  | 2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 4c6c64efc950..15bde36829b8 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1335,7 +1335,7 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 	.invalidatepage	= noop_invalidatepage,
 };
 
-static bool fuse_should_enable_dax(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	unsigned int dax_mode = fc->dax_mode;
@@ -1352,12 +1352,12 @@ static bool fuse_should_enable_dax(struct inode *inode)
 		return true;
 
 	/* dax_mode == FUSE_DAX_INODE */
-	return true;
+	return flags & FUSE_ATTR_DAX;
 }
 
-void fuse_dax_inode_init(struct inode *inode)
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 {
-	if (!fuse_should_enable_dax(inode))
+	if (!fuse_should_enable_dax(inode, flags))
 		return;
 
 	inode->i_flags |= S_DAX;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 11404f8c21c7..40c667a48cf6 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3163,7 +3163,7 @@ static const struct address_space_operations fuse_file_aops  = {
 	.write_end	= fuse_write_end,
 };
 
-void fuse_init_file_inode(struct inode *inode)
+void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
 
@@ -3177,5 +3177,5 @@ void fuse_init_file_inode(struct inode *inode)
 	fi->writepages = RB_ROOT;
 
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
-		fuse_dax_inode_init(inode);
+		fuse_dax_inode_init(inode, flags);
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 5abf9749923f..0270a41c31d7 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1016,7 +1016,7 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 /**
  * Initialize file operations on a regular file
  */
-void fuse_init_file_inode(struct inode *inode);
+void fuse_init_file_inode(struct inode *inode, unsigned int flags);
 
 /**
  * Initialize inode operations on regular files and special files
@@ -1268,7 +1268,7 @@ int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
 			struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
-void fuse_dax_inode_init(struct inode *inode);
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index f4ad99e2415b..73f19cd6e702 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -280,7 +280,7 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
 	inode->i_ctime.tv_nsec = attr->ctimensec;
 	if (S_ISREG(inode->i_mode)) {
 		fuse_init_common(inode);
-		fuse_init_file_inode(inode);
+		fuse_init_file_inode(inode, attr->flags);
 	} else if (S_ISDIR(inode->i_mode))
 		fuse_init_dir(inode);
 	else if (S_ISLNK(inode->i_mode))
-- 
2.27.0



* [PATCH v6 6/7] fuse: mark inode DONT_CACHE when per-file DAX hint changes
  2021-10-11  3:00 [PATCH v6 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
                   ` (4 preceding siblings ...)
  2021-10-11  3:00 ` [PATCH v6 5/7] fuse: enable per-file DAX Jeffle Xu
@ 2021-10-11  3:00 ` Jeffle Xu
  2021-10-18 15:19   ` Vivek Goyal
  2021-10-11  3:00 ` [PATCH v6 7/7] Documentation/filesystem/dax: record DAX on virtiofs Jeffle Xu
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Jeffle Xu @ 2021-10-11  3:00 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

When the per-file DAX hint changes while the file is still *open*, it
is quite complicated and potentially fragile to change the DAX state
dynamically.

Hence mark the inode and corresponding dentries as DONT_CACHE once the
per-file DAX hint changes, so that the inode instance will be evicted
and freed as soon as possible once the file is closed and the last
reference to the inode is put. When the file is next reopened, the
newly instantiated inode will reflect the new DAX state.

In summary, when the per-file DAX hint changes for an *open* file, the
DAX state of the file won't be updated until the file is closed and
reopened later.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c    | 9 +++++++++
 fs/fuse/fuse_i.h | 1 +
 fs/fuse/inode.c  | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 15bde36829b8..ca083c13f5e8 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1364,6 +1364,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 	inode->i_data.a_ops = &fuse_dax_file_aops;
 }
 
+void fuse_dax_dontcache(struct inode *inode, unsigned int flags)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+
+	if (fc->dax_mode == FUSE_DAX_INODE &&
+	    (!!IS_DAX(inode) != !!(flags & FUSE_ATTR_DAX)))
+		d_mark_dontcache(inode);
+}
+
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
 {
 	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 0270a41c31d7..bb2c11e0311a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1270,6 +1270,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
+void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 73f19cd6e702..cf934c2ba761 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -268,6 +268,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 		if (inval)
 			invalidate_inode_pages2(inode->i_mapping);
 	}
+
+	if (IS_ENABLED(CONFIG_FUSE_DAX))
+		fuse_dax_dontcache(inode, attr->flags);
 }
 
 static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
-- 
2.27.0



* [PATCH v6 7/7] Documentation/filesystem/dax: record DAX on virtiofs
  2021-10-11  3:00 [PATCH v6 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
                   ` (5 preceding siblings ...)
  2021-10-11  3:00 ` [PATCH v6 6/7] fuse: mark inode DONT_CACHE when per-file DAX hint changes Jeffle Xu
@ 2021-10-11  3:00 ` Jeffle Xu
  2021-10-15  3:33 ` [PATCH v6 0/7] fuse,virtiofs: support per-file DAX JeffleXu
  2021-10-18 15:21 ` Vivek Goyal
  8 siblings, 0 replies; 37+ messages in thread
From: Jeffle Xu @ 2021-10-11  3:00 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

Document DAX on virtiofs and how its semantics differ from those on
ext4 and xfs.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 Documentation/filesystems/dax.rst | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/dax.rst b/Documentation/filesystems/dax.rst
index 9a1b8fd9e82b..e3b30429d703 100644
--- a/Documentation/filesystems/dax.rst
+++ b/Documentation/filesystems/dax.rst
@@ -23,8 +23,8 @@ on it as usual.  The `DAX` code currently only supports files with a block
 size equal to your kernel's `PAGE_SIZE`, so you may need to specify a block
 size when creating the filesystem.
 
-Currently 3 filesystems support `DAX`: ext2, ext4 and xfs.  Enabling `DAX` on them
-is different.
+Currently 4 filesystems support `DAX`: ext2, ext4, xfs and virtiofs.
+Enabling `DAX` on them is different.
 
 Enabling DAX on ext2
 --------------------
@@ -168,6 +168,22 @@ if the underlying media does not support dax and/or the filesystem is
 overridden with a mount option.
 
 
+Enabling DAX on virtiofs
+----------------------------
+The semantic of DAX on virtiofs is basically equal to that on ext4 and xfs,
+except that when '-o dax=inode' is specified, virtiofs client derives the hint
+whether DAX shall be enabled or not from virtiofs server through FUSE protocol,
+rather than the persistent `FS_XFLAG_DAX` flag. That is, whether DAX shall be
+enabled or not is completely determined by virtiofs server, while virtiofs
+server itself may deploy various algorithm making this decision, e.g. depending
+on the persistent `FS_XFLAG_DAX` flag on the host.
+
+It is still supported to set or clear persistent `FS_XFLAG_DAX` flag inside
+guest, but it is not guaranteed that DAX will be enabled or disabled for
+corresponding file then. Users inside guest still need to call statx(2) and
+check the statx flag `STATX_ATTR_DAX` to see if DAX is enabled for this file.
+
+
 Implementation Tips for Block Driver Writers
 --------------------------------------------
 
-- 
2.27.0



* Re: [PATCH v6 0/7] fuse,virtiofs: support per-file DAX
  2021-10-11  3:00 [PATCH v6 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
                   ` (6 preceding siblings ...)
  2021-10-11  3:00 ` [PATCH v6 7/7] Documentation/filesystem/dax: record DAX on virtiofs Jeffle Xu
@ 2021-10-15  3:33 ` JeffleXu
  2021-10-18 15:21 ` Vivek Goyal
  8 siblings, 0 replies; 37+ messages in thread
From: JeffleXu @ 2021-10-15  3:33 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: linux-fsdevel, virtio-fs, bo.liu, joseph.qi

Hi, any comments?

On 10/11/21 11:00 AM, Jeffle Xu wrote:
> changes since v5:
> Overall Design Changes:
> 1. virtiofsd now supports ioctl (only FS_IOC_SETFLAGS and
>   FS_IOC_FSSETXATTR), so that users inside the guest can now set/clear
>   persistent inode flags. (The FUSE kernel module already supports
>   .ioctl(); virtiofsd needs to support it as well.)
> 2. When the FUSE client is mounted with '-o dax=inode', whether DAX is
>   enabled for a specific file is determined entirely by the FUSE server;
>   the FUSE client has no say in it. The decision is communicated through
>   the FUSE_ATTR_DAX flag of the FUSE protocol. The algorithm virtiofsd
>   uses to make this decision is implementation specific, so the
>   following scenario may arise: a user inside the guest has set the
>   persistent inode flag (i.e. FS_XFLAG_DAX) on a file, but the FUSE
>   server finally decides not to enable DAX for it. This slight semantic
>   difference is documented in patch 7. For the same reason,
>   d_mark_dontcache() is not called when the FS_IOC_SETFLAGS or
>   FS_IOC_FSSETXATTR ioctl is issued inside the guest. It is deferred
>   until the FUSE_ATTR_DAX flag **actually** changes (as shown in
>   patch 6).
> 3. patch 1: slightly modify logic of fuse_should_enable_dax()
> 4. patch 4: add back negotiation during FUSE_INIT. The FUSE client
>   advertises to the FUSE server that it is in per-file DAX mode, so that
>   the FUSE server can skip querying persistent inode flags on the host
>   when the client is not mounted in per-file DAX mode, given that
>   querying persistent inode flags can be quite expensive.
> 
> 
> changes since v4:
> - drop support for setting/clearing FS_DAX inside guest
> - and thus drop the negotiation phase during FUSE_INIT
> 
> This patchset adds support of per-file DAX for virtiofs, which is
> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> 
> Any comment is welcome.
> 
> [1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
> [2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")
> 
> [Purpose]
> DAX may be limited in some specific situations. When the number of usable
> DAX windows falls below the watermark, the reclaim routine is triggered to
> reclaim some DAX windows. This may hurt performance, since some processes
> may need to wait for DAX windows to be reclaimed and reused. To mitigate
> the performance degradation, the overall DAX window needs to be expanded.
> 
> However, simply expanding the DAX window may not be a good deal in some
> scenarios. To maintain one DAX window chunk (i.e., 2MB in size), a 32KB
> (512 * 64 bytes) memory footprint is consumed for page descriptors
> inside the guest, which is greater than the memory footprint of using the
> guest page cache with DAX disabled. Thus it is better to disable DAX for
> files smaller than 32KB, to reduce the demand for DAX windows and
> avoid the unjustified memory overhead.
> 
> The per-file DAX feature is introduced to address this issue by offering
> users finer-grained control over DAX, trying to strike a balance
> between performance and memory overhead.
> 
> 
> [Note]
> When the per-file DAX hint changes while the file is still *opened*, it
> is quite complicated and possibly fragile to change the DAX state
> dynamically, since dynamic switching needs to switch a_ops atomically.
> Ira Weiny once implemented a so-called i_aops_sem lock [3] but
> eventually gave up because of the complexity of the implementation
> [4][5][6][7].
> 
> Hence mark the inode and corresponding dentries as DONT_CACHE once the
> per-file DAX hint changes, so that the inode instance will be evicted
> and freed as soon as possible once the file is closed and the last
> reference to the inode is put. When the file is next reopened, the
> newly instantiated inode will reflect the new DAX state.
> 
> In summary, when the per-file DAX hint changes for an *opened* file, the
> DAX state of the file won't be updated until the file is closed and
> reopened later. This is also how ext4/xfs per-file DAX works.
> 
> [3] https://lore.kernel.org/lkml/20200227052442.22524-7-ira.weiny@intel.com/
> [4] https://patchwork.kernel.org/project/xfs/cover/20200407182958.568475-1-ira.weiny@intel.com/
> [5] https://lore.kernel.org/lkml/20200305155144.GA5598@lst.de/
> [6] https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> [7] https://lore.kernel.org/lkml/20200403182904.GP80283@magnolia/
> 
> changes since v3:
> - bug fix (patch 6): s/"IS_DAX(inode) != newdax"/"!!IS_DAX(inode) !=
>   newdax"
> - during FUSE_INIT, advertise capability for per-file DAX only when
>   mounted as "-o dax=inode" (patch 4)
> 
> changes since v2:
> - modify fuse_show_options() accordingly to make it compatible with
>   new tri-state mount option (patch 2)
> - extract FUSE protocol changes into one separate patch (patch 3)
> - FUSE server/client need to negotiate if they support per-file DAX
>   (patch 4)
> - extract DONT_CACHE logic into patch 6/7
> 
> v5: https://lore.kernel.org/all/20210923092526.72341-1-jefflexu@linux.alibaba.com/
> v4: https://lore.kernel.org/linux-fsdevel/20210817022220.17574-1-jefflexu@linux.alibaba.com/
> v3: https://www.spinics.net/lists/linux-fsdevel/msg200852.html
> v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
> v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html
> 
> 
> Jeffle Xu (7):
>   fuse: add fuse_should_enable_dax() helper
>   fuse: make DAX mount option a tri-state
>   fuse: support per-file DAX in fuse protocol
>   fuse: negotiate per-file DAX in FUSE_INIT
>   fuse: enable per-file DAX
>   fuse: mark inode DONT_CACHE when per-file DAX hint changes
>   Documentation/filesystem/dax: record DAX on virtiofs
> 
>  Documentation/filesystems/dax.rst | 20 +++++++++++++++--
>  fs/fuse/dax.c                     | 36 ++++++++++++++++++++++++++++---
>  fs/fuse/file.c                    |  4 ++--
>  fs/fuse/fuse_i.h                  | 19 ++++++++++++----
>  fs/fuse/inode.c                   | 17 +++++++++++----
>  fs/fuse/virtio_fs.c               | 16 ++++++++++++--
>  include/uapi/linux/fuse.h         |  9 +++++++-
>  7 files changed, 103 insertions(+), 18 deletions(-)
> 

-- 
Thanks,
Jeffle

* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-11  3:00 ` [PATCH v6 2/7] fuse: make DAX mount option a tri-state Jeffle Xu
@ 2021-10-18 14:10   ` Vivek Goyal
  2021-10-20  2:52     ` JeffleXu
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-18 14:10 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> operate as before, which is equivalent to 'always'. To be consistent with
> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> option is specified, the default behaviour is equal to 'inode'.

Hi Jeffle,

I am not sure when "-o dax=inode" is used as a default. If the user
specifies "-o dax" then it is equal to "-o dax=always"; otherwise the
user will explicitly specify "-o dax=always/never/inode". So when
is dax=inode used as the default?

> 
> By the time this patch is applied, 'inode' mode is actually equal to
> 'always' mode, before the per-file DAX flag is introduced in the
> following patch.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c       | 19 ++++++++++++++++---
>  fs/fuse/fuse_i.h    | 14 ++++++++++++--
>  fs/fuse/inode.c     | 10 +++++++---
>  fs/fuse/virtio_fs.c | 16 ++++++++++++++--
>  4 files changed, 49 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 1eb6538bf1b2..4c6c64efc950 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1284,11 +1284,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
>  	return ret;
>  }
>  
> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
> +			struct dax_device *dax_dev)
>  {
>  	struct fuse_conn_dax *fcd;
>  	int err;
>  
> +	fc->dax_mode = dax_mode;
> +
>  	if (!dax_dev)
>  		return 0;
>  
> @@ -1335,11 +1338,21 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>  static bool fuse_should_enable_dax(struct inode *inode)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
> +	unsigned int dax_mode = fc->dax_mode;
> +
> +	if (dax_mode == FUSE_DAX_NEVER)
> +		return false;
>  
> -	if (fc->dax)
> +	/*
> +	 * If 'dax=always/inode', fc->dax couldn't be NULL even when fuse
> +	 * daemon doesn't support DAX, since the mount routine will fail
> +	 * early in this case.
> +	 */
> +	if (dax_mode == FUSE_DAX_ALWAYS)
>  		return true;
>  
> -	return false;
> +	/* dax_mode == FUSE_DAX_INODE */
> +	return true;

So as of this patch, every mode except FUSE_DAX_NEVER returns true, and
this will change in later patches for FUSE_DAX_INODE? If that's the case,
keep it simple in this patch and change it later in the patch series.

static bool fuse_should_enable_dax(struct inode *inode)
{
	struct fuse_conn *fc = get_fuse_conn(inode);

	if (fc->dax_mode == FUSE_DAX_NEVER)
		return false;
	return true;
}

>  }
>  
>  void fuse_dax_inode_init(struct inode *inode)
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 319596df5dc6..5abf9749923f 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -480,6 +480,12 @@ struct fuse_dev {
>  	struct list_head entry;
>  };
>  
> +enum fuse_dax_mode {
> +	FUSE_DAX_INODE,
> +	FUSE_DAX_ALWAYS,
> +	FUSE_DAX_NEVER,
> +};
> +
>  struct fuse_fs_context {
>  	int fd;
>  	struct file *file;
> @@ -497,7 +503,7 @@ struct fuse_fs_context {
>  	bool no_control:1;
>  	bool no_force_umount:1;
>  	bool legacy_opts_show:1;
> -	bool dax:1;
> +	enum fuse_dax_mode dax_mode;
>  	unsigned int max_read;
>  	unsigned int blksize;
>  	const char *subtype;
> @@ -802,6 +808,9 @@ struct fuse_conn {
>  	struct list_head devices;
>  
>  #ifdef CONFIG_FUSE_DAX
> +	/* dax mode: FUSE_DAX_* (always, never or per-file) */
> +	enum fuse_dax_mode dax_mode;
> +
>  	/* Dax specific conn data, non-NULL if DAX is enabled */
>  	struct fuse_conn_dax *dax;
>  #endif
> @@ -1255,7 +1264,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
>  ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
>  int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
>  int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
> +			struct dax_device *dax_dev);
>  void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 36cd03114b6d..b4b41683e97e 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -742,8 +742,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
>  			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
>  	}
>  #ifdef CONFIG_FUSE_DAX
> -	if (fc->dax)
> -		seq_puts(m, ",dax");
> +	if (fc->dax_mode == FUSE_DAX_ALWAYS)
> +		seq_puts(m, ",dax=always");

So if somebody mounts with "-o dax", a kernel without this change
will show "dax" and a kernel with this change will show "dax=always"?

How about not changing the behavior? Keep a mode, say FUSE_DAX_LEGACY,
which is set when the user specifies "-o dax". Internally FUSE_DAX_LEGACY
and FUSE_DAX_ALWAYS will behave the same.

	if (fc->dax_mode == FUSE_DAX_LEGACY)
		seq_puts(m, ",dax");
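
A minimal user-space sketch of this suggestion (not kernel code) may make
it concrete. FUSE_DAX_LEGACY is the hypothetical extra mode proposed
above and is not part of the posted patch; dax_mode_option() and
dax_forced_on() are names invented for illustration. In the kernel the
strings would be emitted via seq_puts() in fuse_show_options().

```c
#include <stdbool.h>

/* Mirrors enum fuse_dax_mode from the patch, plus a hypothetical
 * FUSE_DAX_LEGACY value recording that plain "-o dax" was given. */
enum fuse_dax_mode {
	FUSE_DAX_INODE,
	FUSE_DAX_ALWAYS,
	FUSE_DAX_NEVER,
	FUSE_DAX_LEGACY,	/* assumption: not in the posted patch */
};

/* Option string that the mount options listing would show per mode. */
static const char *dax_mode_option(enum fuse_dax_mode mode)
{
	switch (mode) {
	case FUSE_DAX_LEGACY:
		return ",dax";		/* same output as older kernels */
	case FUSE_DAX_ALWAYS:
		return ",dax=always";
	case FUSE_DAX_NEVER:
		return ",dax=never";
	case FUSE_DAX_INODE:
		return ",dax=inode";
	}
	return "";
}

/* Internally, legacy and always are treated identically. */
static bool dax_forced_on(enum fuse_dax_mode mode)
{
	return mode == FUSE_DAX_ALWAYS || mode == FUSE_DAX_LEGACY;
}
```

With this split, only the display path distinguishes the two modes, so
existing users of "-o dax" would keep seeing the same option string.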


Thanks
Vivek

> +	else if (fc->dax_mode == FUSE_DAX_NEVER)
> +		seq_puts(m, ",dax=never");
> +	else if (fc->dax_mode == FUSE_DAX_INODE)
> +		seq_puts(m, ",dax=inode");
>  #endif
>  
>  	return 0;
> @@ -1493,7 +1497,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  	sb->s_subtype = ctx->subtype;
>  	ctx->subtype = NULL;
>  	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
> -		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
> +		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
>  		if (err)
>  			goto err;
>  	}
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 0ad89c6629d7..58cfbaeb4a7d 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -88,12 +88,21 @@ struct virtio_fs_req_work {
>  static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
>  				 struct fuse_req *req, bool in_flight);
>  
> +static const struct constant_table dax_param_enums[] = {
> +	{"inode",	FUSE_DAX_INODE },
> +	{"always",	FUSE_DAX_ALWAYS },
> +	{"never",	FUSE_DAX_NEVER },
> +	{}
> +};
> +
>  enum {
>  	OPT_DAX,
> +	OPT_DAX_ENUM,
>  };
>  
>  static const struct fs_parameter_spec virtio_fs_parameters[] = {
>  	fsparam_flag("dax", OPT_DAX),
> +	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
>  	{}
>  };
>  
> @@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
>  
>  	switch (opt) {
>  	case OPT_DAX:
> -		ctx->dax = 1;
> +		ctx->dax_mode = FUSE_DAX_ALWAYS;
> +		break;
> +	case OPT_DAX_ENUM:
> +		ctx->dax_mode = result.uint_32;
>  		break;
>  	default:
>  		return -EINVAL;
> @@ -1326,7 +1338,7 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
>  
>  	/* virtiofs allocates and installs its own fuse devices */
>  	ctx->fudptr = NULL;
> -	if (ctx->dax) {
> +	if (ctx->dax_mode != FUSE_DAX_NEVER) {
>  		if (!fs->dax_dev) {
>  			err = -EINVAL;
>  			pr_err("virtio-fs: dax can't be enabled as filesystem"
> -- 
> 2.27.0
> 


* Re: [PATCH v6 3/7] fuse: support per-file DAX in fuse protocol
  2021-10-11  3:00 ` [PATCH v6 3/7] fuse: support per-file DAX in fuse protocol Jeffle Xu
@ 2021-10-18 14:14   ` Vivek Goyal
  2021-10-18 14:20     ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-18 14:14 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Mon, Oct 11, 2021 at 11:00:48AM +0800, Jeffle Xu wrote:
> Expand the fuse protocol to support per-file DAX.
> 
> FUSE_PERFILE_DAX flag is added indicating if fuse server/client

Should we call this flag FUSE_INODE_DAX instead? Isn't it a per-inode property?

Vivek

> supports per-file DAX. It can be conveyed in both the FUSE_INIT request
> and reply.
> 
> FUSE_ATTR_DAX flag is added indicating if DAX shall be enabled for
> corresponding file. It is conveyed in FUSE_LOOKUP reply.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  include/uapi/linux/fuse.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index 36ed092227fa..15a1f5fc0797 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -184,6 +184,9 @@
>   *
>   *  7.34
>   *  - add FUSE_SYNCFS
> + *
> + *  7.35
> + *  - add FUSE_PERFILE_DAX, FUSE_ATTR_DAX
>   */
>  
>  #ifndef _LINUX_FUSE_H
> @@ -219,7 +222,7 @@
>  #define FUSE_KERNEL_VERSION 7
>  
>  /** Minor version number of this interface */
> -#define FUSE_KERNEL_MINOR_VERSION 34
> +#define FUSE_KERNEL_MINOR_VERSION 35
>  
>  /** The node ID of the root inode */
>  #define FUSE_ROOT_ID 1
> @@ -336,6 +339,7 @@ struct fuse_file_lock {
>   *			write/truncate sgid is killed only if file has group
>   *			execute permission. (Same as Linux VFS behavior).
>   * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
> + * FUSE_PERFILE_DAX:	kernel supports per-file DAX
>   */
>  #define FUSE_ASYNC_READ		(1 << 0)
>  #define FUSE_POSIX_LOCKS	(1 << 1)
> @@ -367,6 +371,7 @@ struct fuse_file_lock {
>  #define FUSE_SUBMOUNTS		(1 << 27)
>  #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
>  #define FUSE_SETXATTR_EXT	(1 << 29)
> +#define FUSE_PERFILE_DAX	(1 << 30)
>  
>  /**
>   * CUSE INIT request/reply flags
> @@ -449,8 +454,10 @@ struct fuse_file_lock {
>   * fuse_attr flags
>   *
>   * FUSE_ATTR_SUBMOUNT: Object is a submount root
> + * FUSE_ATTR_DAX: Enable DAX for this file in per-file DAX mode
>   */
>  #define FUSE_ATTR_SUBMOUNT      (1 << 0)
> +#define FUSE_ATTR_DAX		(1 << 1)
>  
>  /**
>   * Open flags
> -- 
> 2.27.0
> 


* Re: [PATCH v6 3/7] fuse: support per-file DAX in fuse protocol
  2021-10-18 14:14   ` Vivek Goyal
@ 2021-10-18 14:20     ` Vivek Goyal
  2021-10-20  3:04       ` JeffleXu
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-18 14:20 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Mon, Oct 18, 2021 at 10:14:04AM -0400, Vivek Goyal wrote:
> On Mon, Oct 11, 2021 at 11:00:48AM +0800, Jeffle Xu wrote:
> > Expand the fuse protocol to support per-file DAX.
> > 
> > FUSE_PERFILE_DAX flag is added indicating if fuse server/client
> 
> Should we call this flag FUSE_INODE_DAX instead? Isn't it a per-inode property?
> 

I realized that you are using FUSE_DAX_INODE to represent the dax mode, so
it would be confusing to use FUSE_INODE_DAX as the protocol flag. How about
FUSE_INODE_DAX_STATE instead?

Vivek

> Vivek
> 
> > supports per-file DAX. It can be conveyed in both the FUSE_INIT request
> > and reply.
> > 
> > FUSE_ATTR_DAX flag is added indicating if DAX shall be enabled for
> > corresponding file. It is conveyed in FUSE_LOOKUP reply.
> > 
> > Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> > ---
> >  include/uapi/linux/fuse.h | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> > index 36ed092227fa..15a1f5fc0797 100644
> > --- a/include/uapi/linux/fuse.h
> > +++ b/include/uapi/linux/fuse.h
> > @@ -184,6 +184,9 @@
> >   *
> >   *  7.34
> >   *  - add FUSE_SYNCFS
> > + *
> > + *  7.35
> > + *  - add FUSE_PERFILE_DAX, FUSE_ATTR_DAX
> >   */
> >  
> >  #ifndef _LINUX_FUSE_H
> > @@ -219,7 +222,7 @@
> >  #define FUSE_KERNEL_VERSION 7
> >  
> >  /** Minor version number of this interface */
> > -#define FUSE_KERNEL_MINOR_VERSION 34
> > +#define FUSE_KERNEL_MINOR_VERSION 35
> >  
> >  /** The node ID of the root inode */
> >  #define FUSE_ROOT_ID 1
> > @@ -336,6 +339,7 @@ struct fuse_file_lock {
> >   *			write/truncate sgid is killed only if file has group
> >   *			execute permission. (Same as Linux VFS behavior).
> >   * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
> > + * FUSE_PERFILE_DAX:	kernel supports per-file DAX
> >   */
> >  #define FUSE_ASYNC_READ		(1 << 0)
> >  #define FUSE_POSIX_LOCKS	(1 << 1)
> > @@ -367,6 +371,7 @@ struct fuse_file_lock {
> >  #define FUSE_SUBMOUNTS		(1 << 27)
> >  #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
> >  #define FUSE_SETXATTR_EXT	(1 << 29)
> > +#define FUSE_PERFILE_DAX	(1 << 30)
> >  
> >  /**
> >   * CUSE INIT request/reply flags
> > @@ -449,8 +454,10 @@ struct fuse_file_lock {
> >   * fuse_attr flags
> >   *
> >   * FUSE_ATTR_SUBMOUNT: Object is a submount root
> > + * FUSE_ATTR_DAX: Enable DAX for this file in per-file DAX mode
> >   */
> >  #define FUSE_ATTR_SUBMOUNT      (1 << 0)
> > +#define FUSE_ATTR_DAX		(1 << 1)
> >  
> >  /**
> >   * Open flags
> > -- 
> > 2.27.0
> > 


* Re: [PATCH v6 4/7] fuse: negotiate per-file DAX in FUSE_INIT
  2021-10-11  3:00 ` [PATCH v6 4/7] fuse: negotiate per-file DAX in FUSE_INIT Jeffle Xu
@ 2021-10-18 14:30   ` Vivek Goyal
  2021-10-20  3:10     ` JeffleXu
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-18 14:30 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Mon, Oct 11, 2021 at 11:00:49AM +0800, Jeffle Xu wrote:
> During the FUSE_INIT phase, the client advertises per-file DAX if it is
> mounted with "-o dax=inode". The server is then aware that the client is
> in per-file DAX mode, and will construct the per-inode DAX attribute
> accordingly.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/inode.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b4b41683e97e..f4ad99e2415b 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1203,6 +1203,8 @@ void fuse_send_init(struct fuse_mount *fm)
>  #ifdef CONFIG_FUSE_DAX
>  	if (fm->fc->dax)
>  		ia->in.flags |= FUSE_MAP_ALIGNMENT;
> +	if (fm->fc->dax_mode == FUSE_DAX_INODE)
> +		ia->in.flags |= FUSE_PERFILE_DAX;

Are you not keeping track of the server's response, i.e. whether the
server supports per-inode dax or not? The client might be new and the
server might be old, so the server might not support per-inode dax. In
that case, we probably should error out if the user mounted with
"-o dax=inode".
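
A minimal user-space sketch of such a check, under stated assumptions:
the flag value is the one from patch 3, while init_request_flags() and
check_init_reply() are hypothetical helpers, not functions in the posted
series. Whether to fail the mount or quietly fall back is a policy
choice; the sketch takes the "error out" option suggested above.

```c
#include <errno.h>
#include <stdint.h>

#define FUSE_PERFILE_DAX (1u << 30)	/* value from patch 3 */

enum fuse_dax_mode { FUSE_DAX_INODE, FUSE_DAX_ALWAYS, FUSE_DAX_NEVER };

/* Flags the client would set in the FUSE_INIT request for this mode. */
static uint32_t init_request_flags(enum fuse_dax_mode mode)
{
	return mode == FUSE_DAX_INODE ? FUSE_PERFILE_DAX : 0;
}

/*
 * Hypothetical check on the FUSE_INIT reply: returns 0 if the mount may
 * proceed, or -EINVAL if "-o dax=inode" was requested but the server
 * did not echo FUSE_PERFILE_DAX back (e.g. an old server).
 */
static int check_init_reply(enum fuse_dax_mode mode, uint32_t reply_flags)
{
	if (mode == FUSE_DAX_INODE && !(reply_flags & FUSE_PERFILE_DAX))
		return -EINVAL;
	return 0;
}
```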

Vivek

>  #endif
>  	if (fm->fc->auto_submounts)
>  		ia->in.flags |= FUSE_SUBMOUNTS;
> -- 
> 2.27.0
> 


* Re: [PATCH v6 5/7] fuse: enable per-file DAX
  2021-10-11  3:00 ` [PATCH v6 5/7] fuse: enable per-file DAX Jeffle Xu
@ 2021-10-18 15:11   ` Vivek Goyal
  0 siblings, 0 replies; 37+ messages in thread
From: Vivek Goyal @ 2021-10-18 15:11 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Mon, Oct 11, 2021 at 11:00:50AM +0800, Jeffle Xu wrote:
> DAX may be limited in some specific situations. When the number of usable
> DAX windows falls below the watermark, the reclaim routine is triggered to
> reclaim some DAX windows. This may hurt performance, since some processes
> may need to wait for DAX windows to be reclaimed and reused. To mitigate
> the performance degradation, the overall DAX window needs to be expanded.
> 
> However, simply expanding the DAX window may not be a good deal in some
> scenarios. To maintain one DAX window chunk (i.e., 2MB in size), a 32KB
> (512 * 64 bytes) memory footprint is consumed for page descriptors
> inside the guest, which is greater than the memory footprint of using the
> guest page cache with DAX disabled. Thus it is better to disable DAX for
> files smaller than 32KB, to reduce the demand for DAX windows and
> avoid the unjustified memory overhead.
> 
> The per-file DAX feature is introduced to address this issue by offering
> users finer-grained control over DAX, trying to strike a balance
> between performance and memory overhead.
> 
> The FUSE_ATTR_DAX flag in the FUSE_LOOKUP reply indicates whether DAX
> should be enabled for the corresponding file. Currently the DAX state of
> a file is initialized only when its inode is instantiated.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c    | 8 ++++----
>  fs/fuse/file.c   | 4 ++--
>  fs/fuse/fuse_i.h | 4 ++--
>  fs/fuse/inode.c  | 2 +-
>  4 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 4c6c64efc950..15bde36829b8 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1335,7 +1335,7 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>  	.invalidatepage	= noop_invalidatepage,
>  };
>  
> -static bool fuse_should_enable_dax(struct inode *inode)
> +static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
>  	unsigned int dax_mode = fc->dax_mode;
> @@ -1352,12 +1352,12 @@ static bool fuse_should_enable_dax(struct inode *inode)
>  		return true;
>  
>  	/* dax_mode == FUSE_DAX_INODE */
> -	return true;
> +	return flags & FUSE_ATTR_DAX;
>  }
>  
> -void fuse_dax_inode_init(struct inode *inode)
> +void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
							      ^^^^
These are attr->flags. Maybe call this variable attr_flags so that
it is clearer.

Vivek

>  {
> -	if (!fuse_should_enable_dax(inode))
> +	if (!fuse_should_enable_dax(inode, flags))
>  		return;
>  
>  	inode->i_flags |= S_DAX;
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 11404f8c21c7..40c667a48cf6 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -3163,7 +3163,7 @@ static const struct address_space_operations fuse_file_aops  = {
>  	.write_end	= fuse_write_end,
>  };
>  
> -void fuse_init_file_inode(struct inode *inode)
> +void fuse_init_file_inode(struct inode *inode, unsigned int flags)
>  {
>  	struct fuse_inode *fi = get_fuse_inode(inode);
>  
> @@ -3177,5 +3177,5 @@ void fuse_init_file_inode(struct inode *inode)
>  	fi->writepages = RB_ROOT;
>  
>  	if (IS_ENABLED(CONFIG_FUSE_DAX))
> -		fuse_dax_inode_init(inode);
> +		fuse_dax_inode_init(inode, flags);
>  }
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 5abf9749923f..0270a41c31d7 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1016,7 +1016,7 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
>  /**
>   * Initialize file operations on a regular file
>   */
> -void fuse_init_file_inode(struct inode *inode);
> +void fuse_init_file_inode(struct inode *inode, unsigned int flags);
>  
>  /**
>   * Initialize inode operations on regular files and special files
> @@ -1268,7 +1268,7 @@ int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
>  			struct dax_device *dax_dev);
>  void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
> -void fuse_dax_inode_init(struct inode *inode);
> +void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index f4ad99e2415b..73f19cd6e702 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -280,7 +280,7 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
>  	inode->i_ctime.tv_nsec = attr->ctimensec;
>  	if (S_ISREG(inode->i_mode)) {
>  		fuse_init_common(inode);
> -		fuse_init_file_inode(inode);
> +		fuse_init_file_inode(inode, attr->flags);
>  	} else if (S_ISDIR(inode->i_mode))
>  		fuse_init_dir(inode);
>  	else if (S_ISLNK(inode->i_mode))
> -- 
> 2.27.0
> 


* Re: [PATCH v6 6/7] fuse: mark inode DONT_CACHE when per-file DAX hint changes
  2021-10-11  3:00 ` [PATCH v6 6/7] fuse: mark inode DONT_CACHE when per-file DAX hint changes Jeffle Xu
@ 2021-10-18 15:19   ` Vivek Goyal
  2021-10-27  5:05     ` JeffleXu
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-18 15:19 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Mon, Oct 11, 2021 at 11:00:51AM +0800, Jeffle Xu wrote:
> When the per-file DAX hint changes while the file is still *opened*, it
> is quite complicated and possibly fragile to change the DAX state
> dynamically.
> 
> Hence mark the inode and corresponding dentries as DONT_CACHE once the
> per-file DAX hint changes, so that the inode instance will be evicted
> and freed as soon as possible once the file is closed and the last
> reference to the inode is put. When the file is next reopened, the
> newly instantiated inode will reflect the new DAX state.
> 
> In summary, when the per-file DAX hint changes for an *opened* file, the
> DAX state of the file won't be updated until the file is closed and
> reopened later.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c    | 9 +++++++++
>  fs/fuse/fuse_i.h | 1 +
>  fs/fuse/inode.c  | 3 +++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 15bde36829b8..ca083c13f5e8 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1364,6 +1364,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>  	inode->i_data.a_ops = &fuse_dax_file_aops;
>  }
>  
> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags)
> +{
> +	struct fuse_conn *fc = get_fuse_conn(inode);
> +
> +	if (fc->dax_mode == FUSE_DAX_INODE &&
> +	    (!!IS_DAX(inode) != !!(flags & FUSE_ATTR_DAX)))
> +		d_mark_dontcache(inode);
> +}
> +
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
>  {
>  	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 0270a41c31d7..bb2c11e0311a 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1270,6 +1270,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 73f19cd6e702..cf934c2ba761 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -268,6 +268,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
>  		if (inval)
>  			invalidate_inode_pages2(inode->i_mapping);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_FUSE_DAX))
> +		fuse_dax_dontcache(inode, attr->flags);

Should we give this function a more generic name, say
fuse_dax_change_attributes()? Then let that function decide which
attributes have changed and whether it needs to take any action.
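
A minimal user-space model of what such a helper could look like. It is
only a sketch: struct model_inode and model_dax_change_attributes() are
invented stand-ins for the kernel inode, IS_DAX() and d_mark_dontcache(),
and FUSE_ATTR_DAX uses the value from patch 3. The comparison is done on
bool values, which is the same normalization that the "!!" in the patch
performs.

```c
#include <stdbool.h>
#include <stdint.h>

#define FUSE_ATTR_DAX (1u << 1)	/* value from patch 3 */

enum fuse_dax_mode { FUSE_DAX_INODE, FUSE_DAX_ALWAYS, FUSE_DAX_NEVER };

/* Toy stand-in for the pieces of inode/fuse_conn state used here. */
struct model_inode {
	enum fuse_dax_mode dax_mode;	/* mount-wide DAX mode */
	bool is_dax;			/* what IS_DAX(inode) would report */
	bool dontcache;			/* set where d_mark_dontcache() would run */
};

/* Returns true if the DAX hint changed and the inode was marked. */
static bool model_dax_change_attributes(struct model_inode *inode,
					uint32_t attr_flags)
{
	/* Conversion to bool collapses the flag bit to 0 or 1. */
	bool newdax = attr_flags & FUSE_ATTR_DAX;

	if (inode->dax_mode != FUSE_DAX_INODE)
		return false;	/* hint only matters in per-file mode */
	if (inode->is_dax == newdax)
		return false;	/* hint unchanged, nothing to do */

	inode->dontcache = true;
	return true;
}
```

Other attribute-driven DAX actions could later be added to the same
function without touching its caller in fuse_change_attributes().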

Vivek

>  }
>  
>  static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
> -- 
> 2.27.0
> 


* Re: [PATCH v6 0/7] fuse,virtiofs: support per-file DAX
  2021-10-11  3:00 [PATCH v6 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
                   ` (7 preceding siblings ...)
  2021-10-15  3:33 ` [PATCH v6 0/7] fuse,virtiofs: support per-file DAX JeffleXu
@ 2021-10-18 15:21 ` Vivek Goyal
  2021-10-20  5:22   ` JeffleXu
  8 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-18 15:21 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Mon, Oct 11, 2021 at 11:00:45AM +0800, Jeffle Xu wrote:
> changes since v5:
> Overall Design Changes:
> 1. virtiofsd now supports ioctl (only FS_IOC_SETFLAGS and
>   FS_IOC_FSSETXATTR), so that users inside the guest can now set/clear
>   persistent inode flags. (The FUSE kernel module already supports
>   .ioctl(); only virtiofsd needed to add support for it.)

So no changes are needed on the fuse (kernel) side to support
FS_IOC_FSSETXATTR? Only virtiofsd needs to be changed. That sounds good.
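
For reference, a minimal sketch of how a user inside the guest might set
or clear the persistent flag with these ioctls. It assumes a Linux
<linux/fs.h> that defines struct fsxattr and FS_XFLAG_DAX; set_dax_hint()
is a made-up helper name. As the cover letter notes, with "-o dax=inode"
the server still makes the final decision, so a successful call is only
a hint.

```c
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Set (enable != 0) or clear FS_XFLAG_DAX on the file at path.
 * Returns 0 on success and -1 on any failure (errno is left as set
 * by the failing open() or ioctl()).
 */
static int set_dax_hint(const char *path, int enable)
{
	struct fsxattr fsx;
	int ret = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	/* Read-modify-write so the other xflags are preserved. */
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) == 0) {
		if (enable)
			fsx.fsx_xflags |= FS_XFLAG_DAX;
		else
			fsx.fsx_xflags &= ~FS_XFLAG_DAX;
		if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) == 0)
			ret = 0;
	}

	close(fd);
	return ret;
}
```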

Vivek

> 2. When the FUSE client is mounted with '-o dax=inode', whether DAX is
>   enabled for a specific file is determined entirely by the FUSE
>   server; the FUSE client has no say in it. The decision is
>   communicated through the FUSE_ATTR_DAX flag of the FUSE protocol. The
>   algorithm virtiofsd uses to make this decision is implementation
>   specific, so the following scenario may occur: users inside the guest
>   have already set the persistent inode flag (i.e. FS_XFLAG_DAX) on a
>   file, but the FUSE server finally decides not to enable DAX for that
>   file. This slight semantic difference is documented in patch 7.
>   Because of this, d_mark_dontcache() is not called when the
>   FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl is issued inside the guest;
>   it is deferred until the FUSE_ATTR_DAX flag **indeed** changes (as
>   shown in patch 6).
> 3. patch 1: slightly modify logic of fuse_should_enable_dax()
> 4. patch 4: add back negotiation during FUSE_INIT. The FUSE client
>   advertises to the FUSE server that it is in per-file DAX mode, so that
>   the FUSE server may skip querying persistent inode flags on the host
>   when the client is not mounted in per-file DAX mode, given that
>   querying persistent inode flags can be quite expensive.
> 
> 
> changes since v4:
> - drop support for setting/clearing FS_DAX inside guest
> - and thus drop the negotiation phase during FUSE_INIT
> 
> This patchset adds support of per-file DAX for virtiofs, which is
> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> 
> Any comment is welcome.
> 
> [1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
> [2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")
> 
> [Purpose]
> DAX may be limited in some specific situations. When the number of usable
> DAX windows falls below the watermark, the reclaim routine is triggered to
> reclaim some DAX windows. This may hurt performance, since some processes
> may need to wait for DAX windows to be reclaimed and reused. To mitigate
> the performance degradation, the overall DAX window needs to be expanded.
> 
> However, simply expanding the DAX window may not be a good deal in some
> scenarios. To maintain one DAX window chunk (i.e., 2MB in size), a 32KB
> (512 * 64 bytes) memory footprint is consumed for page descriptors
> inside the guest, which is greater than the memory footprint of using the
> guest page cache with DAX disabled. Thus it is better to disable DAX for
> files smaller than 32KB, to reduce the demand for DAX windows and
> avoid the unjustified memory overhead.
> 
> The per-file DAX feature is introduced to address this issue by offering
> users finer-grained control over DAX, trying to strike a balance
> between performance and memory overhead.
> 
> 
> [Note]
> When the per-file DAX hint changes while the file is still *opened*, it
> is quite complicated and possibly fragile to dynamically change the DAX
> state, since dynamic switching needs to switch a_ops atomically. Ira
> Weiny once implemented a so-called i_aops_sem lock [3] but
> eventually gave up because of the complexity of the implementation
> [4][5][6][7].
> 
> Hence mark the inode and corresponding dentries as DONT_CACHE once the
> per-file DAX hint changes, so that the inode instance will be evicted
> and freed as soon as possible once the file is closed and the last
> reference to the inode is put. Then, when the file is reopened next
> time, the newly instantiated inode will reflect the new DAX state.
> 
> In summary, when the per-file DAX hint changes for an *opened* file, the
> DAX state of the file won't be updated until this file is closed and
> reopened later. This is also how ext4/xfs per-file DAX works.
> 
> [3] https://lore.kernel.org/lkml/20200227052442.22524-7-ira.weiny@intel.com/
> [4] https://patchwork.kernel.org/project/xfs/cover/20200407182958.568475-1-ira.weiny@intel.com/
> [5] https://lore.kernel.org/lkml/20200305155144.GA5598@lst.de/
> [6] https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> [7] https://lore.kernel.org/lkml/20200403182904.GP80283@magnolia/
> 
> changes since v3:
> - bug fix (patch 6): s/"IS_DAX(inode) != newdax"/"!!IS_DAX(inode) !=
>   newdax"
> - during FUSE_INIT, advertise capability for per-file DAX only when
>   mounted as "-o dax=inode" (patch 4)
> 
> changes since v2:
> - modify fuse_show_options() accordingly to make it compatible with
>   new tri-state mount option (patch 2)
> - extract FUSE protocol changes into one separate patch (patch 3)
> - FUSE server/client need to negotiate if they support per-file DAX
>   (patch 4)
> - extract DONT_CACHE logic into patch 6/7
> 
> v5: https://lore.kernel.org/all/20210923092526.72341-1-jefflexu@linux.alibaba.com/
> v4: https://lore.kernel.org/linux-fsdevel/20210817022220.17574-1-jefflexu@linux.alibaba.com/
> v3: https://www.spinics.net/lists/linux-fsdevel/msg200852.html
> v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
> v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html
> 
> 
> Jeffle Xu (7):
>   fuse: add fuse_should_enable_dax() helper
>   fuse: make DAX mount option a tri-state
>   fuse: support per-file DAX in fuse protocol
>   fuse: negotiate per-file DAX in FUSE_INIT
>   fuse: enable per-file DAX
>   fuse: mark inode DONT_CACHE when per-file DAX hint changes
>   Documentation/filesystem/dax: record DAX on virtiofs
> 
>  Documentation/filesystems/dax.rst | 20 +++++++++++++++--
>  fs/fuse/dax.c                     | 36 ++++++++++++++++++++++++++++---
>  fs/fuse/file.c                    |  4 ++--
>  fs/fuse/fuse_i.h                  | 19 ++++++++++++----
>  fs/fuse/inode.c                   | 17 +++++++++++----
>  fs/fuse/virtio_fs.c               | 16 ++++++++++++--
>  include/uapi/linux/fuse.h         |  9 +++++++-
>  7 files changed, 103 insertions(+), 18 deletions(-)
> 
> -- 
> 2.27.0
> 



* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-18 14:10   ` Vivek Goyal
@ 2021-10-20  2:52     ` JeffleXu
  2021-10-20 14:48       ` Vivek Goyal
  2021-10-20 15:17       ` Vivek Goyal
  0 siblings, 2 replies; 37+ messages in thread
From: JeffleXu @ 2021-10-20  2:52 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi



On 10/18/21 10:10 PM, Vivek Goyal wrote:
> On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
>> operate the same which is equivalent to 'always'. To be consistent with
>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
>> option is specified, the default behaviour is equal to 'inode'.
> 
> Hi Jeffle,
> 
> I am not sure when  -o "dax=inode"  is used as a default? If user
> specifies, "-o dax" then it is equal to "-o dax=always", otherwise
> user will explicitly specify "-o dax=always/never/inode". So when
> is dax=inode used as default?

That means when neither '-o dax' nor '-o dax=always/never/inode' is
specified, the mount actually behaves as '-o dax=inode', which is also how
per-file DAX on ext4/xfs works.

For local filesystems, e.g. ext4/xfs, this default may be
straightforward, since the disk inode is read into memory during inode
instantiation and checking the persistent inode attribute is relatively
cheap; the only difference is that the default behaviour has changed from
'dax=never' to 'dax=inode'.

Coming back to virtiofs: when neither '-o dax' nor '-o
dax=always/never/inode' is specified, the mount behaves as '-o
dax=inode'. As long as the '-o dax=server/attr' option is not specified
for virtiofsd, virtiofsd will always clear FUSE_ATTR_DAX and thus the
guest will always disable DAX. IOW, the guest virtiofs actually behaves as
'-o dax=never' when neither '-o dax' nor '-o dax=always/never/inode' is
specified and the '-o dax=server/attr' option is not specified for
virtiofsd.

But I'm okay if we need to change the default behaviour for virtiofs.
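
The combinations described above can be written down as a small userspace
sketch. This is not the kernel code: effective_dax() and its
server_sets_attr_dax parameter are made-up names for illustration, and the
remark about a zero-initialized context assumes the mount context starts
out zeroed. Only the enum is copied from this patch.

```c
#include <stdbool.h>

/* Same order as the enum added by this patch: FUSE_DAX_INODE is 0. */
enum fuse_dax_mode {
	FUSE_DAX_INODE,
	FUSE_DAX_ALWAYS,
	FUSE_DAX_NEVER,
};

/*
 * Hypothetical helper: does DAX end up enabled for one inode, given the
 * mount mode and whether the server set FUSE_ATTR_DAX for that inode?
 */
static bool effective_dax(enum fuse_dax_mode mode, bool server_sets_attr_dax)
{
	switch (mode) {
	case FUSE_DAX_NEVER:
		return false;
	case FUSE_DAX_ALWAYS:
		return true;
	case FUSE_DAX_INODE:
		/* per-inode mode: the server's hint decides */
		return server_sets_attr_dax;
	}
	return false;
}

/* A zeroed mode field equals FUSE_DAX_INODE, i.e. the default here. */
static enum fuse_dax_mode default_mode(void)
{
	enum fuse_dax_mode mode = 0;

	return mode;
}
```

With a virtiofsd that never sets the attribute, effective_dax() returns
false for the default mode, which is the "behaves as dax=never" case.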


> 
>>
>> By the time this patch is applied, 'inode' mode is actually equal to
>> 'always' mode, before the per-file DAX flag is introduced in the
>> following patch.
>>
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>> ---
>>  fs/fuse/dax.c       | 19 ++++++++++++++++---
>>  fs/fuse/fuse_i.h    | 14 ++++++++++++--
>>  fs/fuse/inode.c     | 10 +++++++---
>>  fs/fuse/virtio_fs.c | 16 ++++++++++++++--
>>  4 files changed, 49 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
>> index 1eb6538bf1b2..4c6c64efc950 100644
>> --- a/fs/fuse/dax.c
>> +++ b/fs/fuse/dax.c
>> @@ -1284,11 +1284,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
>>  	return ret;
>>  }
>>  
>> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
>> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
>> +			struct dax_device *dax_dev)
>>  {
>>  	struct fuse_conn_dax *fcd;
>>  	int err;
>>  
>> +	fc->dax_mode = dax_mode;
>> +
>>  	if (!dax_dev)
>>  		return 0;
>>  
>> @@ -1335,11 +1338,21 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>>  static bool fuse_should_enable_dax(struct inode *inode)
>>  {
>>  	struct fuse_conn *fc = get_fuse_conn(inode);
>> +	unsigned int dax_mode = fc->dax_mode;
>> +
>> +	if (dax_mode == FUSE_DAX_NEVER)
>> +		return false;
>>  
>> -	if (fc->dax)
>> +	/*
>> +	 * If 'dax=always/inode', fc->dax couldn't be NULL even when fuse
>> +	 * daemon doesn't support DAX, since the mount routine will fail
>> +	 * early in this case.
>> +	 */
>> +	if (dax_mode == FUSE_DAX_ALWAYS)
>>  		return true;
>>  
>> -	return false;
>> +	/* dax_mode == FUSE_DAX_INODE */
>> +	return true;
> 
> So as of this patch except FUSE_DAX_NEVER return true and this will
> change in later patches for FUSE_DAX_INODE? If that's the case, keep
> it simple in this patch and change it later in the patch series.
> 
> fuse_should_enable_dax()
> {
> 	if (dax_mode == FUSE_DAX_NEVER)
> 		return false;
> 	return true;
> }
> 
>>  }
>>  
>>  void fuse_dax_inode_init(struct inode *inode)
>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>> index 319596df5dc6..5abf9749923f 100644
>> --- a/fs/fuse/fuse_i.h
>> +++ b/fs/fuse/fuse_i.h
>> @@ -480,6 +480,12 @@ struct fuse_dev {
>>  	struct list_head entry;
>>  };
>>  
>> +enum fuse_dax_mode {
>> +	FUSE_DAX_INODE,
>> +	FUSE_DAX_ALWAYS,
>> +	FUSE_DAX_NEVER,
>> +};
>> +
>>  struct fuse_fs_context {
>>  	int fd;
>>  	struct file *file;
>> @@ -497,7 +503,7 @@ struct fuse_fs_context {
>>  	bool no_control:1;
>>  	bool no_force_umount:1;
>>  	bool legacy_opts_show:1;
>> -	bool dax:1;
>> +	enum fuse_dax_mode dax_mode;
>>  	unsigned int max_read;
>>  	unsigned int blksize;
>>  	const char *subtype;
>> @@ -802,6 +808,9 @@ struct fuse_conn {
>>  	struct list_head devices;
>>  
>>  #ifdef CONFIG_FUSE_DAX
>> +	/* dax mode: FUSE_DAX_* (always, never or per-file) */
>> +	enum fuse_dax_mode dax_mode;
>> +
>>  	/* Dax specific conn data, non-NULL if DAX is enabled */
>>  	struct fuse_conn_dax *dax;
>>  #endif
>> @@ -1255,7 +1264,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
>>  ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
>>  int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
>>  int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
>> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
>> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
>> +			struct dax_device *dax_dev);
>>  void fuse_dax_conn_free(struct fuse_conn *fc);
>>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>>  void fuse_dax_inode_init(struct inode *inode);
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index 36cd03114b6d..b4b41683e97e 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -742,8 +742,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
>>  			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
>>  	}
>>  #ifdef CONFIG_FUSE_DAX
>> -	if (fc->dax)
>> -		seq_puts(m, ",dax");
>> +	if (fc->dax_mode == FUSE_DAX_ALWAYS)
>> +		seq_puts(m, ",dax=always");
> 
> So if somebody mounts with "-o dax" then kernel previous to this change
> will show "dax" and kernel after this change will show "dax=always"?

Yes. It's actually how per-file DAX on ext4/xfs behaves.

> 
> How about not change the behavior. Keep a mode say FUSE_DAX_LEGACY which
> will be set when user specifies "-o dax". Internally FUSE_DAX_LEGACY
> and FUSE_DAX_ALWAYS will be same.
> 
> 	if (fc->dax_mode == FUSE_DAX_LEGACY)
> 		seq_puts(m, ",dax");
> 




> 
>> +	else if (fc->dax_mode == FUSE_DAX_NEVER)
>> +		seq_puts(m, ",dax=never");
>> +	else if (fc->dax_mode == FUSE_DAX_INODE)
>> +		seq_puts(m, ",dax=inode");
>>  #endif
>>  
>>  	return 0;
>> @@ -1493,7 +1497,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>>  	sb->s_subtype = ctx->subtype;
>>  	ctx->subtype = NULL;
>>  	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
>> -		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
>> +		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
>>  		if (err)
>>  			goto err;
>>  	}
>> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
>> index 0ad89c6629d7..58cfbaeb4a7d 100644
>> --- a/fs/fuse/virtio_fs.c
>> +++ b/fs/fuse/virtio_fs.c
>> @@ -88,12 +88,21 @@ struct virtio_fs_req_work {
>>  static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
>>  				 struct fuse_req *req, bool in_flight);
>>  
>> +static const struct constant_table dax_param_enums[] = {
>> +	{"inode",	FUSE_DAX_INODE },
>> +	{"always",	FUSE_DAX_ALWAYS },
>> +	{"never",	FUSE_DAX_NEVER },
>> +	{}
>> +};
>> +
>>  enum {
>>  	OPT_DAX,
>> +	OPT_DAX_ENUM,
>>  };
>>  
>>  static const struct fs_parameter_spec virtio_fs_parameters[] = {
>>  	fsparam_flag("dax", OPT_DAX),
>> +	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
>>  	{}
>>  };
>>  
>> @@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
>>  
>>  	switch (opt) {
>>  	case OPT_DAX:
>> -		ctx->dax = 1;
>> +		ctx->dax_mode = FUSE_DAX_ALWAYS;
>> +		break;
>> +	case OPT_DAX_ENUM:
>> +		ctx->dax_mode = result.uint_32;
>>  		break;
>>  	default:
>>  		return -EINVAL;
>> @@ -1326,7 +1338,7 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
>>  
>>  	/* virtiofs allocates and installs its own fuse devices */
>>  	ctx->fudptr = NULL;
>> -	if (ctx->dax) {
>> +	if (ctx->dax_mode != FUSE_DAX_NEVER) {
>>  		if (!fs->dax_dev) {
>>  			err = -EINVAL;
>>  			pr_err("virtio-fs: dax can't be enabled as filesystem"
>> -- 
>> 2.27.0
>>

-- 
Thanks,
Jeffle


* Re: [PATCH v6 3/7] fuse: support per-file DAX in fuse protocol
  2021-10-18 14:20     ` Vivek Goyal
@ 2021-10-20  3:04       ` JeffleXu
  2021-10-20 14:54         ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: JeffleXu @ 2021-10-20  3:04 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi



On 10/18/21 10:20 PM, Vivek Goyal wrote:
> On Mon, Oct 18, 2021 at 10:14:04AM -0400, Vivek Goyal wrote:
>> On Mon, Oct 11, 2021 at 11:00:48AM +0800, Jeffle Xu wrote:
>>> Expand the fuse protocol to support per-file DAX.
>>>
>>> FUSE_PERFILE_DAX flag is added indicating if fuse server/client
>>
>> Should we call this flag FUSE_INODE_DAX instead? It is per inode property?
>>

Yes, strictly speaking, 'per-file' is not correct.

> 
> I realized that you are using FUSE_DAX_INODE to represent dax mode. So it
> will be confusing to use FUSE_INODE_DAX as protocol flag. How about
> FUSE_INODE_DAX_STATE instead?
> 

Emmm, the "_STATE" suffix is not straightforward and clear to me. How
about FUSE_HAS_INODE_DAX or FUSE_DO_INODE_DAX, referring to the existing
'FUSE_HAS_IOCTL_DIR' and 'FUSE_DO_READDIRPLUS'?


>>
>>> supporting per-file DAX. It can be conveyed in both FUSE_INIT request
>>> and reply.
>>>
>>> FUSE_ATTR_DAX flag is added indicating if DAX shall be enabled for
>>> corresponding file. It is conveyed in FUSE_LOOKUP reply.
>>>
>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>>> ---
>>>  include/uapi/linux/fuse.h | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
>>> index 36ed092227fa..15a1f5fc0797 100644
>>> --- a/include/uapi/linux/fuse.h
>>> +++ b/include/uapi/linux/fuse.h
>>> @@ -184,6 +184,9 @@
>>>   *
>>>   *  7.34
>>>   *  - add FUSE_SYNCFS
>>> + *
>>> + *  7.35
>>> + *  - add FUSE_PERFILE_DAX, FUSE_ATTR_DAX
>>>   */
>>>  
>>>  #ifndef _LINUX_FUSE_H
>>> @@ -219,7 +222,7 @@
>>>  #define FUSE_KERNEL_VERSION 7
>>>  
>>>  /** Minor version number of this interface */
>>> -#define FUSE_KERNEL_MINOR_VERSION 34
>>> +#define FUSE_KERNEL_MINOR_VERSION 35
>>>  
>>>  /** The node ID of the root inode */
>>>  #define FUSE_ROOT_ID 1
>>> @@ -336,6 +339,7 @@ struct fuse_file_lock {
>>>   *			write/truncate sgid is killed only if file has group
>>>   *			execute permission. (Same as Linux VFS behavior).
>>>   * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
>>> + * FUSE_PERFILE_DAX:	kernel supports per-file DAX
>>>   */
>>>  #define FUSE_ASYNC_READ		(1 << 0)
>>>  #define FUSE_POSIX_LOCKS	(1 << 1)
>>> @@ -367,6 +371,7 @@ struct fuse_file_lock {
>>>  #define FUSE_SUBMOUNTS		(1 << 27)
>>>  #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
>>>  #define FUSE_SETXATTR_EXT	(1 << 29)
>>> +#define FUSE_PERFILE_DAX	(1 << 30)
>>>  
>>>  /**
>>>   * CUSE INIT request/reply flags
>>> @@ -449,8 +454,10 @@ struct fuse_file_lock {
>>>   * fuse_attr flags
>>>   *
>>>   * FUSE_ATTR_SUBMOUNT: Object is a submount root
>>> + * FUSE_ATTR_DAX: Enable DAX for this file in per-file DAX mode
>>>   */
>>>  #define FUSE_ATTR_SUBMOUNT      (1 << 0)
>>> +#define FUSE_ATTR_DAX		(1 << 1)
>>>  
>>>  /**
>>>   * Open flags
>>> -- 
>>> 2.27.0
>>>

-- 
Thanks,
Jeffle


* Re: [PATCH v6 4/7] fuse: negotiate per-file DAX in FUSE_INIT
  2021-10-18 14:30   ` Vivek Goyal
@ 2021-10-20  3:10     ` JeffleXu
  2021-10-20 15:44       ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: JeffleXu @ 2021-10-20  3:10 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi



On 10/18/21 10:30 PM, Vivek Goyal wrote:
> On Mon, Oct 11, 2021 at 11:00:49AM +0800, Jeffle Xu wrote:
>> Among the FUSE_INIT phase, client shall advertise per-file DAX if it's
>> mounted with "-o dax=inode". Then server is aware that client is in
>> per-file DAX mode, and will construct per-inode DAX attribute
>> accordingly.
>>
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>> ---
>>  fs/fuse/inode.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index b4b41683e97e..f4ad99e2415b 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -1203,6 +1203,8 @@ void fuse_send_init(struct fuse_mount *fm)
>>  #ifdef CONFIG_FUSE_DAX
>>  	if (fm->fc->dax)
>>  		ia->in.flags |= FUSE_MAP_ALIGNMENT;
>> +	if (fm->fc->dax_mode == FUSE_DAX_INODE)
>> +		ia->in.flags |= FUSE_PERFILE_DAX;
> 
> Are you not keeping track of server's response whether server supports
> per inode dax or not. Client might be new and server might be old and
> server might not support per inode dax. In that case, we probably 
> should error out if user mounted with "-o dax=inode".
> 

Yes, if guest virtiofs is mounted with '-o dax=inode' while virtiofsd is
old and doesn't support per inode dax, then guest virtiofs will never
receive FUSE_ATTR_DAX and actually behaves as '-o dax=never'. So the
whole system still works in this case, though the behavior may not be
what users expect.

If the behavior really matters, I could change the behavior and fail
directly if virtiofsd doesn't advertise support for per inode DAX.
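
The two options can be compared in a userspace sketch. It is not the
kernel code: check_init_reply(), its 'strict' switch and the choice of
-EINVAL are assumptions for illustration. Only the flag value and the enum
come from this series.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FUSE_PERFILE_DAX	(1u << 30)	/* value proposed in patch 3 */

enum fuse_dax_mode {
	FUSE_DAX_INODE,
	FUSE_DAX_ALWAYS,
	FUSE_DAX_NEVER,
};

/*
 * Hypothetical check of the FUSE_INIT reply flags.
 *
 * strict == false: silently fall back (no inode ever gets DAX).
 * strict == true:  report an error when 'dax=inode' was requested but the
 *                  server did not echo FUSE_PERFILE_DAX.
 */
static int check_init_reply(enum fuse_dax_mode mode, uint32_t reply_flags,
			    bool strict, bool *perfile_dax)
{
	bool server_ok = (reply_flags & FUSE_PERFILE_DAX) != 0;

	*perfile_dax = (mode == FUSE_DAX_INODE) && server_ok;

	if (mode == FUSE_DAX_INODE && !server_ok && strict)
		return -EINVAL;

	return 0;
}
```

The non-strict branch corresponds to the current behaviour of the series,
the strict branch to failing when the server lacks support.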

-- 
Thanks,
Jeffle


* Re: [PATCH v6 0/7] fuse,virtiofs: support per-file DAX
  2021-10-18 15:21 ` Vivek Goyal
@ 2021-10-20  5:22   ` JeffleXu
  2021-10-20 16:06     ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: JeffleXu @ 2021-10-20  5:22 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi



On 10/18/21 11:21 PM, Vivek Goyal wrote:
> On Mon, Oct 11, 2021 at 11:00:45AM +0800, Jeffle Xu wrote:
>> changes since v5:
>> Overall Design Changes:
>> 1. virtiofsd now supports ioctl (only FS_IOC_SETFLAGS and
>>   FS_IOC_FSSETXATTR), so that users inside guest could set/clear
>>   persistent inode flags now. (FUSE kernel module has already supported
>>   .ioctl(), virtiofsd needs to support it.)
> 
> So no changes needed in fuse side (kernel) to support FS_IOC_FSSETXATTR?
> Only virtiofsd needs to be changed. That sounds good.
> 

Yes, the fuse kernel module already supports FUSE_IOCTL.

Per inode DAX on ext4/xfs also calls d_mark_dontcache() and tries to
evict the inode as soon as possible when the persistent (DAX) inode
attribute has changed, just like [1].

But because of the following reason:
> 
>> 2. The
>>   algorithm used by virtiofsd to determine whether DAX shall be enabled
>>   or not is totally implementation specific, and thus the following
>>   scenario may exist: users inside guest has already set related persistent
>>   inode flag (i.e. FS_XFLAG_DAX) on corresponding file but FUSE server finally
>>   decides not to enable DAX for this file.

If we always call d_mark_dontcache() and try to evict this inode when
the persistent (DAX) inode attribute has changed, the DAX state returned
by virtiofsd may remain the same, in which case the eviction is
totally wasted and unnecessary.

So, as the following says,

>> Also because of this, d_mark_dontcache() is
>>   not called when FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl is done inside
>>   guest. It's delayed to be done if the FUSE_ATTR_DAX flag **indeed**
>>   changes (as showed in patch 6).

the call to d_mark_dontcache() and the inode eviction are delayed until
the DAX state returned by virtiofsd **indeed** changes (when the dentry
times out and a new FUSE_LOOKUP is requested). The drawback is that, if
'-o cache=always' is set for virtiofsd, then the DAX state won't be
updated for a long time after users have changed the persistent (DAX)
inode attribute inside the guest via the FS_IOC_FSSETXATTR ioctl.
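
The "only when it indeed changes" rule can be sketched in userspace as
below. It is a simplified stand-in, not patch 6 itself: struct
sketch_inode, SKETCH_S_DAX and on_lookup_reply() are hypothetical, and
setting the 'dontcache' field stands in for calling d_mark_dontcache().

```c
#include <stdbool.h>
#include <stdint.h>

#define FUSE_ATTR_DAX	(1u << 1)	/* value proposed in patch 3 */
#define SKETCH_S_DAX	(1u << 13)	/* arbitrary bit for this sketch */

/* Hypothetical stand-in for the in-memory inode. */
struct sketch_inode {
	uint32_t flags;		/* SKETCH_S_DAX set if DAX is active */
	bool dontcache;		/* set instead of calling d_mark_dontcache() */
};

/* Compare both sides as booleans, as in the "!!IS_DAX(inode)" fix. */
static bool dax_hint_changed(const struct sketch_inode *inode,
			     uint32_t attr_flags)
{
	bool olddax = !!(inode->flags & SKETCH_S_DAX);
	bool newdax = !!(attr_flags & FUSE_ATTR_DAX);

	return olddax != newdax;
}

/* Called when a fresh lookup reply carries the server's attr flags. */
static void on_lookup_reply(struct sketch_inode *inode, uint32_t attr_flags)
{
	if (dax_hint_changed(inode, attr_flags))
		inode->dontcache = true;
}
```

An ioctl in the guest alone never reaches on_lookup_reply(), so nothing
is evicted until the server's answer actually differs.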



[1] https://www.spinics.net/lists/linux-fsdevel/msg200851.html

-- 
Thanks,
Jeffle


* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-20  2:52     ` JeffleXu
@ 2021-10-20 14:48       ` Vivek Goyal
  2021-10-29  8:33         ` JeffleXu
  2021-10-20 15:17       ` Vivek Goyal
  1 sibling, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-20 14:48 UTC (permalink / raw)
  To: JeffleXu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Wed, Oct 20, 2021 at 10:52:38AM +0800, JeffleXu wrote:
> 
> 
> On 10/18/21 10:10 PM, Vivek Goyal wrote:
> > On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
> >> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> >> operate the same which is equivalent to 'always'. To be consistent with
> >> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> >> option is specified, the default behaviour is equal to 'inode'.
> > 
> > Hi Jeffle,
> > 
> > I am not sure when  -o "dax=inode"  is used as a default? If user
> > specifies, "-o dax" then it is equal to "-o dax=always", otherwise
> > user will explicitly specify "-o dax=always/never/inode". So when
> > is dax=inode used as default?
> 
> That means when neither '-o dax' nor '-o dax=always/never/inode' is
> specified, the mount actually behaves as '-o dax=inode', which is also how
> per-file DAX on ext4/xfs works.
> 
> For local filesystems, e.g. ext4/xfs, this default may be
> straightforward, since the disk inode is read into memory during inode
> instantiation and checking the persistent inode attribute is relatively
> cheap; the only difference is that the default behaviour has changed from
> 'dax=never' to 'dax=inode'.

Interesting that ext4/xfs allowed for this behavior change.

> 
> Coming back to virtiofs: when neither '-o dax' nor '-o
> dax=always/never/inode' is specified, the mount behaves as '-o
> dax=inode'. As long as the '-o dax=server/attr' option is not specified
> for virtiofsd, virtiofsd will always clear FUSE_ATTR_DAX and thus the
> guest will always disable DAX. IOW, the guest virtiofs actually behaves as
> '-o dax=never' when neither '-o dax' nor '-o dax=always/never/inode' is
> specified and the '-o dax=server/attr' option is not specified for
> virtiofsd.
> 
> But I'm okay if we need to change the default behaviour for virtiofs.

This is a change of behavior from the client's perspective. Even if the
client did not opt in for DAX, DAX can be enabled based on the server's
setting. Not that there is anything wrong with it, but the change of
behavior part concerns me.

In the case of virtiofs, we control a lot of features from the server.
The client typically just calls "mount" and there are not many options
users can specify for mount.

Given that we already let the client make a choice about DAX behavior,
I will feel more comfortable if we don't change it and let the client
request a specific DAX mode; if the client does not specify anything,
then DAX is not enabled.
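
A rough userspace sketch of that preference, for illustration only. All
names here (enum sketch_dax_mode, sketch_parse_dax()) are hypothetical
and this is not the fs_parameter_spec based parser in virtio_fs.c. The
point is only that the "nothing specified" case maps to no DAX.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ordering: "never" is 0, so an unset mode means no DAX. */
enum sketch_dax_mode {
	SKETCH_DAX_NEVER,
	SKETCH_DAX_ALWAYS,
	SKETCH_DAX_INODE,
};

/*
 * Map one mount option string to a mode. A NULL option (nothing given)
 * leaves *mode untouched, so the caller's default stays in effect.
 * Returns 0 on success, -1 for an unknown option.
 */
static int sketch_parse_dax(const char *opt, enum sketch_dax_mode *mode)
{
	if (!opt)
		return 0;
	if (!strcmp(opt, "dax") || !strcmp(opt, "dax=always"))
		*mode = SKETCH_DAX_ALWAYS;
	else if (!strcmp(opt, "dax=never"))
		*mode = SKETCH_DAX_NEVER;
	else if (!strcmp(opt, "dax=inode"))
		*mode = SKETCH_DAX_INODE;
	else
		return -1;
	return 0;
}
```

Here per inode DAX is only reached when the client explicitly asks for
"dax=inode".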

Vivek
> 
> 
> > 
> >>
> >> By the time this patch is applied, 'inode' mode is actually equal to
> >> 'always' mode, before the per-file DAX flag is introduced in the
> >> following patch.
> >>
> >> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >> ---
> >>  fs/fuse/dax.c       | 19 ++++++++++++++++---
> >>  fs/fuse/fuse_i.h    | 14 ++++++++++++--
> >>  fs/fuse/inode.c     | 10 +++++++---
> >>  fs/fuse/virtio_fs.c | 16 ++++++++++++++--
> >>  4 files changed, 49 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> >> index 1eb6538bf1b2..4c6c64efc950 100644
> >> --- a/fs/fuse/dax.c
> >> +++ b/fs/fuse/dax.c
> >> @@ -1284,11 +1284,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
> >>  	return ret;
> >>  }
> >>  
> >> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
> >> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
> >> +			struct dax_device *dax_dev)
> >>  {
> >>  	struct fuse_conn_dax *fcd;
> >>  	int err;
> >>  
> >> +	fc->dax_mode = dax_mode;
> >> +
> >>  	if (!dax_dev)
> >>  		return 0;
> >>  
> >> @@ -1335,11 +1338,21 @@ static const struct address_space_operations fuse_dax_file_aops  = {
> >>  static bool fuse_should_enable_dax(struct inode *inode)
> >>  {
> >>  	struct fuse_conn *fc = get_fuse_conn(inode);
> >> +	unsigned int dax_mode = fc->dax_mode;
> >> +
> >> +	if (dax_mode == FUSE_DAX_NEVER)
> >> +		return false;
> >>  
> >> -	if (fc->dax)
> >> +	/*
> >> +	 * If 'dax=always/inode', fc->dax couldn't be NULL even when fuse
> >> +	 * daemon doesn't support DAX, since the mount routine will fail
> >> +	 * early in this case.
> >> +	 */
> >> +	if (dax_mode == FUSE_DAX_ALWAYS)
> >>  		return true;
> >>  
> >> -	return false;
> >> +	/* dax_mode == FUSE_DAX_INODE */
> >> +	return true;
> > 
> > So as of this patch except FUSE_DAX_NEVER return true and this will
> > change in later patches for FUSE_DAX_INODE? If that's the case, keep
> > it simple in this patch and change it later in the patch series.
> > 
> > fuse_should_enable_dax()
> > {
> > 	if (dax_mode == FUSE_DAX_NEVER)
> > 		return false;
> > 	return true;
> > }
> > 
> >>  }
> >>  
> >>  void fuse_dax_inode_init(struct inode *inode)
> >> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> >> index 319596df5dc6..5abf9749923f 100644
> >> --- a/fs/fuse/fuse_i.h
> >> +++ b/fs/fuse/fuse_i.h
> >> @@ -480,6 +480,12 @@ struct fuse_dev {
> >>  	struct list_head entry;
> >>  };
> >>  
> >> +enum fuse_dax_mode {
> >> +	FUSE_DAX_INODE,
> >> +	FUSE_DAX_ALWAYS,
> >> +	FUSE_DAX_NEVER,
> >> +};
> >> +
> >>  struct fuse_fs_context {
> >>  	int fd;
> >>  	struct file *file;
> >> @@ -497,7 +503,7 @@ struct fuse_fs_context {
> >>  	bool no_control:1;
> >>  	bool no_force_umount:1;
> >>  	bool legacy_opts_show:1;
> >> -	bool dax:1;
> >> +	enum fuse_dax_mode dax_mode;
> >>  	unsigned int max_read;
> >>  	unsigned int blksize;
> >>  	const char *subtype;
> >> @@ -802,6 +808,9 @@ struct fuse_conn {
> >>  	struct list_head devices;
> >>  
> >>  #ifdef CONFIG_FUSE_DAX
> >> +	/* dax mode: FUSE_DAX_* (always, never or per-file) */
> >> +	enum fuse_dax_mode dax_mode;
> >> +
> >>  	/* Dax specific conn data, non-NULL if DAX is enabled */
> >>  	struct fuse_conn_dax *dax;
> >>  #endif
> >> @@ -1255,7 +1264,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
> >>  ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
> >>  int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
> >>  int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
> >> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
> >> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
> >> +			struct dax_device *dax_dev);
> >>  void fuse_dax_conn_free(struct fuse_conn *fc);
> >>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
> >>  void fuse_dax_inode_init(struct inode *inode);
> >> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> >> index 36cd03114b6d..b4b41683e97e 100644
> >> --- a/fs/fuse/inode.c
> >> +++ b/fs/fuse/inode.c
> >> @@ -742,8 +742,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
> >>  			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
> >>  	}
> >>  #ifdef CONFIG_FUSE_DAX
> >> -	if (fc->dax)
> >> -		seq_puts(m, ",dax");
> >> +	if (fc->dax_mode == FUSE_DAX_ALWAYS)
> >> +		seq_puts(m, ",dax=always");
> > 
> > So if somebody mounts with "-o dax" then kernel previous to this change
> > will show "dax" and kernel after this change will show "dax=always"?
> 
> Yes. It's actually how per-file DAX on ext4/xfs behaves.
> 
> > 
> > How about not change the behavior. Keep a mode say FUSE_DAX_LEGACY which
> > will be set when user specifies "-o dax". Internally FUSE_DAX_LEGACY
> > and FUSE_DAX_ALWAYS will be same.
> > 
> > 	if (fc->dax_mode == FUSE_DAX_LEGACY)
> > 		seq_puts(m, ",dax");
> > 
> 
> 
> 
> 
> > 
> >> +	else if (fc->dax_mode == FUSE_DAX_NEVER)
> >> +		seq_puts(m, ",dax=never");
> >> +	else if (fc->dax_mode == FUSE_DAX_INODE)
> >> +		seq_puts(m, ",dax=inode");
> >>  #endif
> >>  
> >>  	return 0;
> >> @@ -1493,7 +1497,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> >>  	sb->s_subtype = ctx->subtype;
> >>  	ctx->subtype = NULL;
> >>  	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
> >> -		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
> >> +		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
> >>  		if (err)
> >>  			goto err;
> >>  	}
> >> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> >> index 0ad89c6629d7..58cfbaeb4a7d 100644
> >> --- a/fs/fuse/virtio_fs.c
> >> +++ b/fs/fuse/virtio_fs.c
> >> @@ -88,12 +88,21 @@ struct virtio_fs_req_work {
> >>  static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> >>  				 struct fuse_req *req, bool in_flight);
> >>  
> >> +static const struct constant_table dax_param_enums[] = {
> >> +	{"inode",	FUSE_DAX_INODE },
> >> +	{"always",	FUSE_DAX_ALWAYS },
> >> +	{"never",	FUSE_DAX_NEVER },
> >> +	{}
> >> +};
> >> +
> >>  enum {
> >>  	OPT_DAX,
> >> +	OPT_DAX_ENUM,
> >>  };
> >>  
> >>  static const struct fs_parameter_spec virtio_fs_parameters[] = {
> >>  	fsparam_flag("dax", OPT_DAX),
> >> +	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
> >>  	{}
> >>  };
> >>  
> >> @@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
> >>  
> >>  	switch (opt) {
> >>  	case OPT_DAX:
> >> -		ctx->dax = 1;
> >> +		ctx->dax_mode = FUSE_DAX_ALWAYS;
> >> +		break;
> >> +	case OPT_DAX_ENUM:
> >> +		ctx->dax_mode = result.uint_32;
> >>  		break;
> >>  	default:
> >>  		return -EINVAL;
> >> @@ -1326,7 +1338,7 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
> >>  
> >>  	/* virtiofs allocates and installs its own fuse devices */
> >>  	ctx->fudptr = NULL;
> >> -	if (ctx->dax) {
> >> +	if (ctx->dax_mode != FUSE_DAX_NEVER) {
> >>  		if (!fs->dax_dev) {
> >>  			err = -EINVAL;
> >>  			pr_err("virtio-fs: dax can't be enabled as filesystem"
> >> -- 
> >> 2.27.0
> >>
> 
> -- 
> Thanks,
> Jeffle
> 



* Re: [PATCH v6 3/7] fuse: support per-file DAX in fuse protocol
  2021-10-20  3:04       ` JeffleXu
@ 2021-10-20 14:54         ` Vivek Goyal
  0 siblings, 0 replies; 37+ messages in thread
From: Vivek Goyal @ 2021-10-20 14:54 UTC (permalink / raw)
  To: JeffleXu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Wed, Oct 20, 2021 at 11:04:03AM +0800, JeffleXu wrote:
> 
> 
> On 10/18/21 10:20 PM, Vivek Goyal wrote:
> > On Mon, Oct 18, 2021 at 10:14:04AM -0400, Vivek Goyal wrote:
> >> On Mon, Oct 11, 2021 at 11:00:48AM +0800, Jeffle Xu wrote:
> >>> Expand the fuse protocol to support per-file DAX.
> >>>
> >>> FUSE_PERFILE_DAX flag is added indicating if fuse server/client
> >>
> >> Should we call this flag FUSE_INODE_DAX instead? It is per inode property?
> >>
> 
> Yes, strictly speaking, 'per-file' is not correct.
> 
> > 
> > I realized that you are using FUSE_DAX_INODE to represent dax mode. So it
> > will be confusing to use FUSE_INODE_DAX as protocol flag. How about
> > FUSE_INODE_DAX_STATE instead?
> > 
> 
> Emmm, the "_STATE" suffix is not straightforward and clear to me. How
> about FUSE_HAS_INODE_DAX or FUSE_DO_INODE_DAX, referring to the existing
> 'FUSE_HAS_IOCTL_DIR' and 'FUSE_DO_READDIRPLUS'?

FUSE_HAS_INODE_DAX or FUSE_DO_INODE_DAX are fine.

Vivek

> 
> 
> >>
> >>> supporting per-file DAX. It can be conveyed in both FUSE_INIT request
> >>> and reply.
> >>>
> >>> FUSE_ATTR_DAX flag is added indicating if DAX shall be enabled for
> >>> corresponding file. It is conveyed in FUSE_LOOKUP reply.
> >>>
> >>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >>> ---
> >>>  include/uapi/linux/fuse.h | 9 ++++++++-
> >>>  1 file changed, 8 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> >>> index 36ed092227fa..15a1f5fc0797 100644
> >>> --- a/include/uapi/linux/fuse.h
> >>> +++ b/include/uapi/linux/fuse.h
> >>> @@ -184,6 +184,9 @@
> >>>   *
> >>>   *  7.34
> >>>   *  - add FUSE_SYNCFS
> >>> + *
> >>> + *  7.35
> >>> + *  - add FUSE_PERFILE_DAX, FUSE_ATTR_DAX
> >>>   */
> >>>  
> >>>  #ifndef _LINUX_FUSE_H
> >>> @@ -219,7 +222,7 @@
> >>>  #define FUSE_KERNEL_VERSION 7
> >>>  
> >>>  /** Minor version number of this interface */
> >>> -#define FUSE_KERNEL_MINOR_VERSION 34
> >>> +#define FUSE_KERNEL_MINOR_VERSION 35
> >>>  
> >>>  /** The node ID of the root inode */
> >>>  #define FUSE_ROOT_ID 1
> >>> @@ -336,6 +339,7 @@ struct fuse_file_lock {
> >>>   *			write/truncate sgid is killed only if file has group
> >>>   *			execute permission. (Same as Linux VFS behavior).
> >>>   * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
> >>> + * FUSE_PERFILE_DAX:	kernel supports per-file DAX
> >>>   */
> >>>  #define FUSE_ASYNC_READ		(1 << 0)
> >>>  #define FUSE_POSIX_LOCKS	(1 << 1)
> >>> @@ -367,6 +371,7 @@ struct fuse_file_lock {
> >>>  #define FUSE_SUBMOUNTS		(1 << 27)
> >>>  #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
> >>>  #define FUSE_SETXATTR_EXT	(1 << 29)
> >>> +#define FUSE_PERFILE_DAX	(1 << 30)
> >>>  
> >>>  /**
> >>>   * CUSE INIT request/reply flags
> >>> @@ -449,8 +454,10 @@ struct fuse_file_lock {
> >>>   * fuse_attr flags
> >>>   *
> >>>   * FUSE_ATTR_SUBMOUNT: Object is a submount root
> >>> + * FUSE_ATTR_DAX: Enable DAX for this file in per-file DAX mode
> >>>   */
> >>>  #define FUSE_ATTR_SUBMOUNT      (1 << 0)
> >>> +#define FUSE_ATTR_DAX		(1 << 1)
> >>>  
> >>>  /**
> >>>   * Open flags
> >>> -- 
> >>> 2.27.0
> >>>
> 
> -- 
> Thanks,
> Jeffle
> 



* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-20  2:52     ` JeffleXu
  2021-10-20 14:48       ` Vivek Goyal
@ 2021-10-20 15:17       ` Vivek Goyal
  2021-10-22  6:54         ` JeffleXu
  1 sibling, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-20 15:17 UTC (permalink / raw)
  To: JeffleXu, Dave Chinner
  Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi,
	Dave Chinner

On Wed, Oct 20, 2021 at 10:52:38AM +0800, JeffleXu wrote:
> 
> 
> On 10/18/21 10:10 PM, Vivek Goyal wrote:
> > On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
> >> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> >> operate the same which is equivalent to 'always'. To be consistent with
> >> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> >> option is specified, the default behaviour is equal to 'inode'.
> > 
> > Hi Jeffle,
> > 
> > I am not sure when  -o "dax=inode"  is used as a default? If user
> > specifies, "-o dax" then it is equal to "-o dax=always", otherwise
> > user will explicitly specify "-o dax=always/never/inode". So when
> > is dax=inode is used as default?
> 
> That means when neither '-o dax' nor '-o dax=always/never/inode' is
> specified, it is actually equal to '-o dax=inode', which is also how
> per-file DAX on ext4/xfs works.

[ CC Dave Chinner ]

Is this not a change of default behavior for ext4/xfs as well? My
understanding is that prior to these new dax options, "-o dax" enabled
DAX on the filesystem, and if the user did not specify it, DAX was
disabled by default.

Now, after the introduction of "-o dax=always/never/inode", if
"-o dax=inode" suddenly became the default when the user did not specify
anything, that's a change of behavior. Is that intentional? Given a
choice, I would rather not change the default and instead ask the user
to opt in to the appropriate DAX functionality.

Dave, you might have thoughts on this. It makes me uncomfortable to
change the virtiofs DAX default now just because other filesystems did it.

Thanks
Vivek

> 
> This default behaviour for local filesystem, e.g. ext4/xfs, may be
> straightforward, since the disk inode will be read into memory during
> the inode instantiation, and checking for persistent inode attribute
> shall be relatively cheap, except that the default behaviour has
> changed from 'dax=never' to 'dax=inode'.
> 
> Come back to virtiofs, when neither '-o dax' nor '-o
> dax=always/never/inode' is specified, and it actually behaves as '-o
> dax=inode', as long as '-o dax=server/attr' option is not specified for
> virtiofsd, virtiofsd will always clear FUSE_ATTR_DAX and thus guest will
> always disable DAX. IOWs, the guest virtiofs actually behaves as '-o
> dax=never' when neither '-o dax' nor '-o dax=always/never/inode' is
> specified, and '-o dax=server/attr' option is not specified for virtiofsd.
> 
> But I'm okay if we need to change the default behaviour for virtiofs.
> 
> 
> > 
> >>
> >> By the time this patch is applied, 'inode' mode is actually equal to
> >> 'always' mode, before the per-file DAX flag is introduced in the
> >> following patch.
> >>
> >> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >> ---
> >>  fs/fuse/dax.c       | 19 ++++++++++++++++---
> >>  fs/fuse/fuse_i.h    | 14 ++++++++++++--
> >>  fs/fuse/inode.c     | 10 +++++++---
> >>  fs/fuse/virtio_fs.c | 16 ++++++++++++++--
> >>  4 files changed, 49 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> >> index 1eb6538bf1b2..4c6c64efc950 100644
> >> --- a/fs/fuse/dax.c
> >> +++ b/fs/fuse/dax.c
> >> @@ -1284,11 +1284,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
> >>  	return ret;
> >>  }
> >>  
> >> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
> >> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
> >> +			struct dax_device *dax_dev)
> >>  {
> >>  	struct fuse_conn_dax *fcd;
> >>  	int err;
> >>  
> >> +	fc->dax_mode = dax_mode;
> >> +
> >>  	if (!dax_dev)
> >>  		return 0;
> >>  
> >> @@ -1335,11 +1338,21 @@ static const struct address_space_operations fuse_dax_file_aops  = {
> >>  static bool fuse_should_enable_dax(struct inode *inode)
> >>  {
> >>  	struct fuse_conn *fc = get_fuse_conn(inode);
> >> +	unsigned int dax_mode = fc->dax_mode;
> >> +
> >> +	if (dax_mode == FUSE_DAX_NEVER)
> >> +		return false;
> >>  
> >> -	if (fc->dax)
> >> +	/*
> >> +	 * If 'dax=always/inode', fc->dax couldn't be NULL even when fuse
> >> +	 * daemon doesn't support DAX, since the mount routine will fail
> >> +	 * early in this case.
> >> +	 */
> >> +	if (dax_mode == FUSE_DAX_ALWAYS)
> >>  		return true;
> >>  
> >> -	return false;
> >> +	/* dax_mode == FUSE_DAX_INODE */
> >> +	return true;
> > 
> > So as of this patch except FUSE_DAX_NEVER return true and this will
> > change in later patches for FUSE_DAX_INODE? If that's the case, keep
> > it simple in this patch and change it later in the patch series.
> > 
> > fuse_should_enable_dax()
> > {
> > 	if (dax_mode == FUSE_DAX_NEVER)
> > 		return false;
> > 	return true;
> > }
> > 
> >>  }
> >>  
> >>  void fuse_dax_inode_init(struct inode *inode)
> >> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> >> index 319596df5dc6..5abf9749923f 100644
> >> --- a/fs/fuse/fuse_i.h
> >> +++ b/fs/fuse/fuse_i.h
> >> @@ -480,6 +480,12 @@ struct fuse_dev {
> >>  	struct list_head entry;
> >>  };
> >>  
> >> +enum fuse_dax_mode {
> >> +	FUSE_DAX_INODE,
> >> +	FUSE_DAX_ALWAYS,
> >> +	FUSE_DAX_NEVER,
> >> +};
> >> +
> >>  struct fuse_fs_context {
> >>  	int fd;
> >>  	struct file *file;
> >> @@ -497,7 +503,7 @@ struct fuse_fs_context {
> >>  	bool no_control:1;
> >>  	bool no_force_umount:1;
> >>  	bool legacy_opts_show:1;
> >> -	bool dax:1;
> >> +	enum fuse_dax_mode dax_mode;
> >>  	unsigned int max_read;
> >>  	unsigned int blksize;
> >>  	const char *subtype;
> >> @@ -802,6 +808,9 @@ struct fuse_conn {
> >>  	struct list_head devices;
> >>  
> >>  #ifdef CONFIG_FUSE_DAX
> >> +	/* dax mode: FUSE_DAX_* (always, never or per-file) */
> >> +	enum fuse_dax_mode dax_mode;
> >> +
> >>  	/* Dax specific conn data, non-NULL if DAX is enabled */
> >>  	struct fuse_conn_dax *dax;
> >>  #endif
> >> @@ -1255,7 +1264,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
> >>  ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
> >>  int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
> >>  int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
> >> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
> >> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
> >> +			struct dax_device *dax_dev);
> >>  void fuse_dax_conn_free(struct fuse_conn *fc);
> >>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
> >>  void fuse_dax_inode_init(struct inode *inode);
> >> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> >> index 36cd03114b6d..b4b41683e97e 100644
> >> --- a/fs/fuse/inode.c
> >> +++ b/fs/fuse/inode.c
> >> @@ -742,8 +742,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
> >>  			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
> >>  	}
> >>  #ifdef CONFIG_FUSE_DAX
> >> -	if (fc->dax)
> >> -		seq_puts(m, ",dax");
> >> +	if (fc->dax_mode == FUSE_DAX_ALWAYS)
> >> +		seq_puts(m, ",dax=always");
> > 
> > So if somebody mounts with "-o dax" then kernel previous to this change
> > will show "dax" and kernel after this change will show "dax=always"?
> 
> Yes. It's actually how per-file DAX on ext4/xfs behaves.
> 
> > 
> > How about not change the behavior. Keep a mode say FUSE_DAX_LEGACY which
> > will be set when user specifies "-o dax". Internally FUSE_DAX_LEGACY
> > and FUSE_DAX_ALWAYS will be same.
> > 
> > 	if (fc->dax_mode == FUSE_DAX_LEGACY)
> > 		seq_puts(m, ",dax");
> > 
> 
> 
> 
> 
> > 
> >> +	else if (fc->dax_mode == FUSE_DAX_NEVER)
> >> +		seq_puts(m, ",dax=never");
> >> +	else if (fc->dax_mode == FUSE_DAX_INODE)
> >> +		seq_puts(m, ",dax=inode");
> >>  #endif
> >>  
> >>  	return 0;
> >> @@ -1493,7 +1497,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
> >>  	sb->s_subtype = ctx->subtype;
> >>  	ctx->subtype = NULL;
> >>  	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
> >> -		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
> >> +		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
> >>  		if (err)
> >>  			goto err;
> >>  	}
> >> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> >> index 0ad89c6629d7..58cfbaeb4a7d 100644
> >> --- a/fs/fuse/virtio_fs.c
> >> +++ b/fs/fuse/virtio_fs.c
> >> @@ -88,12 +88,21 @@ struct virtio_fs_req_work {
> >>  static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
> >>  				 struct fuse_req *req, bool in_flight);
> >>  
> >> +static const struct constant_table dax_param_enums[] = {
> >> +	{"inode",	FUSE_DAX_INODE },
> >> +	{"always",	FUSE_DAX_ALWAYS },
> >> +	{"never",	FUSE_DAX_NEVER },
> >> +	{}
> >> +};
> >> +
> >>  enum {
> >>  	OPT_DAX,
> >> +	OPT_DAX_ENUM,
> >>  };
> >>  
> >>  static const struct fs_parameter_spec virtio_fs_parameters[] = {
> >>  	fsparam_flag("dax", OPT_DAX),
> >> +	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
> >>  	{}
> >>  };
> >>  
> >> @@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
> >>  
> >>  	switch (opt) {
> >>  	case OPT_DAX:
> >> -		ctx->dax = 1;
> >> +		ctx->dax_mode = FUSE_DAX_ALWAYS;
> >> +		break;
> >> +	case OPT_DAX_ENUM:
> >> +		ctx->dax_mode = result.uint_32;
> >>  		break;
> >>  	default:
> >>  		return -EINVAL;
> >> @@ -1326,7 +1338,7 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
> >>  
> >>  	/* virtiofs allocates and installs its own fuse devices */
> >>  	ctx->fudptr = NULL;
> >> -	if (ctx->dax) {
> >> +	if (ctx->dax_mode != FUSE_DAX_NEVER) {
> >>  		if (!fs->dax_dev) {
> >>  			err = -EINVAL;
> >>  			pr_err("virtio-fs: dax can't be enabled as filesystem"
> >> -- 
> >> 2.27.0
> >>
> 
> -- 
> Thanks,
> Jeffle
> 



* Re: [PATCH v6 4/7] fuse: negotiate per-file DAX in FUSE_INIT
  2021-10-20  3:10     ` JeffleXu
@ 2021-10-20 15:44       ` Vivek Goyal
  0 siblings, 0 replies; 37+ messages in thread
From: Vivek Goyal @ 2021-10-20 15:44 UTC (permalink / raw)
  To: JeffleXu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Wed, Oct 20, 2021 at 11:10:30AM +0800, JeffleXu wrote:
> 
> 
> On 10/18/21 10:30 PM, Vivek Goyal wrote:
> > On Mon, Oct 11, 2021 at 11:00:49AM +0800, Jeffle Xu wrote:
> >> During the FUSE_INIT phase, client shall advertise per-file DAX if it's
> >> mounted with "-o dax=inode". Then server is aware that client is in
> >> per-file DAX mode, and will construct per-inode DAX attribute
> >> accordingly.
> >>
> >> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >> ---
> >>  fs/fuse/inode.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> >> index b4b41683e97e..f4ad99e2415b 100644
> >> --- a/fs/fuse/inode.c
> >> +++ b/fs/fuse/inode.c
> >> @@ -1203,6 +1203,8 @@ void fuse_send_init(struct fuse_mount *fm)
> >>  #ifdef CONFIG_FUSE_DAX
> >>  	if (fm->fc->dax)
> >>  		ia->in.flags |= FUSE_MAP_ALIGNMENT;
> >> +	if (fm->fc->dax_mode == FUSE_DAX_INODE)
> >> +		ia->in.flags |= FUSE_PERFILE_DAX;
> > 
> > Are you not keeping track of server's response whether server supports
> > per inode dax or not. Client might be new and server might be old and
> > server might not support per inode dax. In that case, we probably 
> > should error out if user mounted with "-o dax=inode".
> > 
> 
> Yes, if guest virtiofs is mounted with '-o dax=inode' while virtiofsd is
> old and doesn't support per inode dax, then guest virtiofs will never
> receive FUSE_ATTR_DAX and actually behaves as '-o dax=never'. So the
> whole system works in this case, though the behavior may be beyond the
> expectation of users ....
> 
> If the behavior really matters, I could change the behavior and fail
> directly if virtiofsd doesn't advertise supporting per inode DAX.

I think it probably is better to error out if client asked for per-inode
DAX and server does not support it. 
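
As a rough sketch of that check (not the actual patch): the helper name
check_init_reply_dax() is hypothetical, the flag value is the one proposed
in patch 3, and the final flag name may well differ given the naming
discussion on that patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit value as proposed in patch 3 of this series; the name is not final. */
#define FUSE_PERFILE_DAX	(1u << 30)

/* Mirrors enum fuse_dax_mode from patch 2. */
enum fuse_dax_mode {
	FUSE_DAX_INODE,
	FUSE_DAX_ALWAYS,
	FUSE_DAX_NEVER,
};

/*
 * Hypothetical helper: compare the mount-time DAX mode with the flags the
 * server returned in its FUSE_INIT reply.
 *
 * Returns 0 if the combination is acceptable, or -EINVAL if the client was
 * mounted with dax=inode but the server did not advertise per-inode DAX.
 */
static int check_init_reply_dax(enum fuse_dax_mode mode, uint32_t reply_flags)
{
	/* The mode comes from mount option parsing; anything else is a caller bug. */
	assert(mode == FUSE_DAX_INODE || mode == FUSE_DAX_ALWAYS ||
	       mode == FUSE_DAX_NEVER);

	if (mode == FUSE_DAX_INODE && !(reply_flags & FUSE_PERFILE_DAX))
		return -EINVAL;

	return 0;
}
```

With something like this, an old server would make a "-o dax=inode" mount
fail visibly instead of quietly behaving as "-o dax=never".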

Vivek
> 
> -- 
> Thanks,
> Jeffle
> 



* Re: [PATCH v6 0/7] fuse,virtiofs: support per-file DAX
  2021-10-20  5:22   ` JeffleXu
@ 2021-10-20 16:06     ` Vivek Goyal
  0 siblings, 0 replies; 37+ messages in thread
From: Vivek Goyal @ 2021-10-20 16:06 UTC (permalink / raw)
  To: JeffleXu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Wed, Oct 20, 2021 at 01:22:32PM +0800, JeffleXu wrote:
> 
> 
> On 10/18/21 11:21 PM, Vivek Goyal wrote:
> > On Mon, Oct 11, 2021 at 11:00:45AM +0800, Jeffle Xu wrote:
> >> changes since v5:
> >> Overall Design Changes:
> >> 1. virtiofsd now supports ioctl (only FS_IOC_SETFLAGS and
> >>   FS_IOC_FSSETXATTR), so that users inside guest could set/clear
> >>   persistent inode flags now. (FUSE kernel module has already supported
> >>   .ioctl(), virtiofsd needs to support it.)
> > 
> > So no changes needed in fuse side (kernel) to support FS_IOC_FSSETXATTR?
> > Only virtiofsd needs to be changed. That sounds good.
> > 
> 
> Yes, the fuse kernel module already supports FUSE_IOCTL.
> 
> Per inode DAX on ext4/xfs will also call d_mark_dontcache() and try to
> evict this inode as soon as possible when the persistent (DAX) inode
> attribute has changed, just like [1].
> 
> But because of the following reason:
> > 
> >> 2. The
> >>   algorithm used by virtiofsd to determine whether DAX shall be enabled
> >>   or not is totally implementation specific, and thus the following
> >>   scenario may exist: users inside guest have already set related persistent
> >>   inode flag (i.e. FS_XFLAG_DAX) on corresponding file but FUSE server finally
> >>   decides not to enable DAX for this file.
> 
> If we always call d_mark_dontcache() and try to evict this inode when
> the persistent (DAX) inode attribute has changed, the DAX state returned
> by virtiofsd may remain the same, in which case the eviction is
> wasted and unnecessary.
> 
> So, as the following said,
> 
> >> Also because of this, d_mark_dontcache() is
> >>   not called when FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl is done inside
> >>   guest. It's delayed to be done if the FUSE_ATTR_DAX flag **indeed**
> >>   changes (as shown in patch 6).
> 
> the call to d_mark_dontcache() and the inode eviction are delayed until the
> DAX state returned by virtiofsd **indeed** changes (when the dentry has timed
> out and a new FUSE_LOOKUP is requested). But the drawback is that, if '-o
> cache=always' is set for virtiofsd, then the DAX state won't be updated
> for a long time after users have changed the persistent (DAX) inode
> attribute inside the guest via the FS_IOC_FSSETXATTR ioctl.

Good point. I guess this is probably not too bad. If it becomes a concern,
we can always mark the inode dontcache whenever the client changes the
persistent DAX flag.
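
To make the two options concrete, here is a small model of the decision
(a sketch only, not code from the series). The enum, the helper names and
the "eager_on_ioctl" switch are all made up for illustration; only the
FUSE_ATTR_* bit values come from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* fuse_attr flags; FUSE_ATTR_DAX uses the value proposed in patch 3. */
#define FUSE_ATTR_SUBMOUNT	(1u << 0)
#define FUSE_ATTR_DAX		(1u << 1)

/* Hypothetical: the event that might lead to marking the inode dontcache. */
enum dontcache_trigger {
	TRIGGER_LOOKUP_REPLY,	/* a FUSE_LOOKUP reply with fresh attr flags */
	TRIGGER_DAX_IOCTL,	/* guest changed the persistent DAX flag */
};

/* True when the server-reported DAX state differs from the cached one. */
static bool dax_state_changed(bool inode_is_dax, uint32_t new_attr_flags)
{
	bool server_wants_dax = (new_attr_flags & FUSE_ATTR_DAX) != 0;

	return inode_is_dax != server_wants_dax;
}

/*
 * Hypothetical policy helper.
 *
 * eager_on_ioctl == false models the v6 behaviour: the ioctl alone never
 * marks the inode, only a changed FUSE_ATTR_DAX in a later lookup does.
 * eager_on_ioctl == true models the fallback mentioned above: mark the
 * inode as soon as the client changes the persistent flag.
 * new_attr_flags is ignored for TRIGGER_DAX_IOCTL.
 */
static bool should_mark_dontcache(enum dontcache_trigger trigger,
				  bool eager_on_ioctl, bool inode_is_dax,
				  uint32_t new_attr_flags)
{
	if (trigger == TRIGGER_DAX_IOCTL)
		return eager_on_ioctl;

	assert(trigger == TRIGGER_LOOKUP_REPLY);
	return dax_state_changed(inode_is_dax, new_attr_flags);
}
```

The cost of the eager variant is the wasted eviction Jeffle describes when
the server ends up keeping the same DAX state.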

Vivek
> 
> 
> 
> [1] https://www.spinics.net/lists/linux-fsdevel/msg200851.html
> 
> -- 
> Thanks,
> Jeffle
> 



* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-20 15:17       ` Vivek Goyal
@ 2021-10-22  6:54         ` JeffleXu
  2021-10-25 17:52           ` Ira Weiny
  0 siblings, 1 reply; 37+ messages in thread
From: JeffleXu @ 2021-10-22  6:54 UTC (permalink / raw)
  To: Vivek Goyal, Dave Chinner, ira.weiny
  Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

cc [Ira Weiny], author of per inode DAX on xfs/ext4

On 10/20/21 11:17 PM, Vivek Goyal wrote:
> On Wed, Oct 20, 2021 at 10:52:38AM +0800, JeffleXu wrote:
>>
>>
>> On 10/18/21 10:10 PM, Vivek Goyal wrote:
>>> On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
>>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
>>>> operate the same which is equivalent to 'always'. To be consistent with
>>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
>>>> option is specified, the default behaviour is equal to 'inode'.
>>>
>>> Hi Jeffle,
>>>
>>> I am not sure when  -o "dax=inode"  is used as a default? If user
>>> specifies, "-o dax" then it is equal to "-o dax=always", otherwise
>>> user will explicitly specify "-o dax=always/never/inode". So when
>>> is dax=inode is used as default?
>>
>> That means when neither '-o dax' nor '-o dax=always/never/inode' is
>> specified, it is actually equal to '-o dax=inode', which is also how
>> per-file DAX on ext4/xfs works.
> 
> [ CC dave chinner] 
> 
> Is it not change of default behavior for ext4/xfs as well. My
> understanding is that prior to this new dax options, "-o dax" enabled
> dax on filesystem and if user did not specify it, DAX is disabled
> by default.
> 
> Now after introduction of "-o dax=always/never/inode", if suddenly
> "-o dax=inode" became the default if user did not specify anything,
> that's change of behavior. Is that intentional. If given a choice,
> I would rather not change default and ask user to opt-in for
> appropriate dax functionality.
> 
> Dave, you might have thoughts on this. It makes me uncomfortable to
> change virtiofs dax default now just because other filesystems did it.
> 

The earliest discussions I can find about this tri-state mount option
are the following:

https://lore.kernel.org/lkml/20200316095509.GA13788@lst.de/
https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/


Hi, Ira Weiny,

Do you have any thoughts on this, i.e. why the default behavior
changed after the introduction of per inode dax?

-- 
Thanks,
Jeffle


* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-22  6:54         ` JeffleXu
@ 2021-10-25 17:52           ` Ira Weiny
  2021-10-25 18:12             ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: Ira Weiny @ 2021-10-25 17:52 UTC (permalink / raw)
  To: JeffleXu
  Cc: Vivek Goyal, Dave Chinner, stefanha, miklos, linux-fsdevel,
	virtio-fs, bo.liu, joseph.qi

On Fri, Oct 22, 2021 at 02:54:03PM +0800, JeffleXu wrote:
> cc [Ira Weiny], author of per inode DAX on xfs/ext4
> 
> On 10/20/21 11:17 PM, Vivek Goyal wrote:
> > On Wed, Oct 20, 2021 at 10:52:38AM +0800, JeffleXu wrote:
> >>
> >>
> >> On 10/18/21 10:10 PM, Vivek Goyal wrote:
> >>> On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
> >>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> >>>> operate the same which is equivalent to 'always'. To be consistent with
> >>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> >>>> option is specified, the default behaviour is equal to 'inode'.
> >>>
> >>> Hi Jeffle,
> >>>
> >>> I am not sure when  -o "dax=inode"  is used as a default? If user
> >>> specifies, "-o dax" then it is equal to "-o dax=always", otherwise
> >>> user will explicitly specify "-o dax=always/never/inode". So when
> >>> is dax=inode is used as default?
> >>
> >> That means when neither '-o dax' nor '-o dax=always/never/inode' is
> >> specified, it is actually equal to '-o dax=inode', which is also how
> >> per-file DAX on ext4/xfs works.
> > 

It's been a while so I'm fuzzy on the details of the discussions but yes that
is the way things are now in the code.

> > [ CC dave chinner] 
> > 
> > Is it not change of default behavior for ext4/xfs as well. My
> > understanding is that prior to this new dax options, "-o dax" enabled
> > dax on filesystem and if user did not specify it, DAX is disabled
> > by default.

Technically it does change default behavior...  However, NOT in a way which
breaks anything.  See below.

> > 
> > Now after introduction of "-o dax=always/never/inode", if suddenly
> > "-o dax=inode" became the default if user did not specify anything,
> > that's change of behavior.

Technically yes but not in a broken way.

> >
> > Is that intentional. If given a choice,
> > I would rather not change default and ask user to opt-in for
> > appropriate dax functionality.

There is no need for this.

> > 
> > Dave, you might have thoughts on this. It makes me uncomfortable to
> > change virtiofs dax default now just because other filesystems did it.
> > 
> 
> I can only find the following discussions about the earliest record on
> this tri-state mount option:
> 
> https://lore.kernel.org/lkml/20200316095509.GA13788@lst.de/
> https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> 
> 
> Hi, Ira Weiny,
> 
> Do you have any thought on this, i.e. why the default behavior has
> changed after introduction of per inode dax?

While this is 'technically' different behavior, the end user does not see any
difference if they continue without software changes.  Specifically,
specifying nothing continues to leave all the files on the FS '_not_ DAX',
while specifying '-o dax' forces DAX on all files.

This expands the default behavior in a backwards compatible manner.  The user
can now enable DAX on some files.  But this is an opt-in on the part of the
user of the FS and, again, changes nothing for existing software/scripts/etc.
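
Put as a simplified model (a sketch only; the enum and function names are
made up here, and it ignores everything else that decides DAX, such as
whether the device supports it at all):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the mount-time choice on ext4/xfs. */
enum dax_mount_mode {
	DAX_MODE_UNSPECIFIED,	/* no dax option given at all */
	DAX_MODE_INODE,		/* -o dax=inode */
	DAX_MODE_ALWAYS,	/* -o dax or -o dax=always */
	DAX_MODE_NEVER,		/* -o dax=never */
};

/*
 * Returns whether a file ends up using DAX, given the mount mode and
 * whether the persistent per-inode flag (FS_XFLAG_DAX) is set on it.
 */
static bool file_uses_dax(enum dax_mount_mode mode, bool inode_has_dax_flag)
{
	switch (mode) {
	case DAX_MODE_ALWAYS:
		return true;
	case DAX_MODE_NEVER:
		return false;
	case DAX_MODE_UNSPECIFIED:	/* behaves as dax=inode */
	case DAX_MODE_INODE:
		return inode_has_dax_flag;
	}

	assert(0 && "unknown dax mount mode");
	return false;
}
```

A filesystem on which nobody has set the flag gives false for every file in
the unspecified case, which is the same result as before the tri-state
option existed.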

Does that make sense?

Ira



* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-25 17:52           ` Ira Weiny
@ 2021-10-25 18:12             ` Vivek Goyal
  2021-10-25 19:02               ` Ira Weiny
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-25 18:12 UTC (permalink / raw)
  To: Ira Weiny
  Cc: JeffleXu, Dave Chinner, stefanha, miklos, linux-fsdevel,
	virtio-fs, bo.liu, joseph.qi

On Mon, Oct 25, 2021 at 10:52:51AM -0700, Ira Weiny wrote:
> On Fri, Oct 22, 2021 at 02:54:03PM +0800, JeffleXu wrote:
> > cc [Ira Weiny], author of per inode DAX on xfs/ext4
> > 
> > On 10/20/21 11:17 PM, Vivek Goyal wrote:
> > > On Wed, Oct 20, 2021 at 10:52:38AM +0800, JeffleXu wrote:
> > >>
> > >>
> > >> On 10/18/21 10:10 PM, Vivek Goyal wrote:
> > >>> On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
> > >>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> > >>>> operate the same which is equivalent to 'always'. To be consistent with
> > >>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> > >>>> option is specified, the default behaviour is equal to 'inode'.
> > >>>
> > >>> Hi Jeffle,
> > >>>
> > >>> I am not sure when  -o "dax=inode"  is used as a default? If user
> > >>> specifies, "-o dax" then it is equal to "-o dax=always", otherwise
> > >>> user will explicitly specify "-o dax=always/never/inode". So when
> > >>> is dax=inode is used as default?
> > >>
> > >> That means when neither '-o dax' nor '-o dax=always/never/inode' is
> > >> specified, it is actually equal to '-o dax=inode', which is also how
> > >> per-file DAX on ext4/xfs works.
> > > 
> 
> It's been a while so I'm fuzzy on the details of the discussions but yes that
> is the way things are now in the code.
> 
> > > [ CC dave chinner] 
> > > 
> > > Is it not change of default behavior for ext4/xfs as well. My
> > > understanding is that prior to this new dax options, "-o dax" enabled
> > > dax on filesystem and if user did not specify it, DAX is disabled
> > > by default.
> 
> Technically it does change default behavior...  However, NOT in a way which
> breaks anything.  See below.
> 
> > > 
> > > Now after introduction of "-o dax=always/never/inode", if suddenly
> > > "-o dax=inode" became the default if user did not specify anything,
> > > that's change of behavior.
> 
> Technically yes but not in a broken way.
> 
> > >
> > > Is that intentional. If given a choice,
> > > I would rather not change default and ask user to opt-in for
> > > appropriate dax functionality.
> 
> There is no need for this.
> 
> > > 
> > > Dave, you might have thoughts on this. It makes me uncomfortable to
> > > change virtiofs dax default now just because other filesystems did it.
> > > 
> > 
> > I can only find the following discussions about the earliest record on
> > this tri-state mount option:
> > 
> > https://lore.kernel.org/lkml/20200316095509.GA13788@lst.de/
> > https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> > 
> > 
> > Hi, Ira Weiny,
> > 
> > Do you have any thought on this, i.e. why the default behavior has
> > changed after introduction of per inode dax?
> 
> While this is 'technically' different behavior the end user does not see any
> difference in behavior if they continue without software changes.  Specifically
> specifying nothing continues to operate with all the files on the FS to be
> '_not_ DAX'.  While specifying '-o dax' forces DAX on all files.
> 
> This expands the default behavior in a backwards compatible manner.

This is backward compatible in the sense that if somebody upgrades to a new
kernel, things will still be the same.

I think the slightly problematic change is this: say I bring in persistent
memory from another system (which has FS_XFLAG_DAX set on some inodes)
and then mount it without any of the dax mount options. Per-inode DAX
will then be enabled unexpectedly if I boot with newer kernels, but it
will be disabled if I mount with older kernels. Do I understand it
right?

> The user
> can now enable DAX on some files.  But this is an opt-in on the part of the
> user of the FS and again does not change with existing software/scripts/etc.

I don't understand this "opt-in" bit. If the user mounts an fs without
specifying any of the dax options, then per-inode DAX will still be
enabled if the inode has the correct flag set. So is setting the flag
being considered the opt-in (instead of the mount option)?

If setting the flag is being considered the opt-in, that probably will not
work very well with virtiofs, because the server can enforce a different
policy for enabling per-file DAX (instead of FS_XFLAG_DAX).

And given there are two entities here (client and server), I think it
will be good if we give the client a chance as well to decide whether
it wants to enable per-file DAX or not. I know it can always do
"dax=never", but it can still be broken if the client software remains
the same but the host/server software is upgraded or its command line
changed.

So for virtiofs, I think the better behavior is to continue to not enable
any DAX unless the user opts in using the "-o dax=foo" options.
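
In other words, something along these lines (an illustrative sketch, not
the parsing code from patch 2; resolve_dax_mode() and its string-based
interface are invented for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors enum fuse_dax_mode from patch 2. */
enum fuse_dax_mode {
	FUSE_DAX_INODE,
	FUSE_DAX_ALWAYS,
	FUSE_DAX_NEVER,
};

/*
 * Hypothetical helper modelling the opt-in default for virtiofs.
 *
 * opt is NULL when no dax option was given, "" for a bare "-o dax", or
 * the text after "dax=" otherwise.
 *
 * Returns 0 and stores the result in *mode, or -1 for an unknown value.
 */
static int resolve_dax_mode(const char *opt, enum fuse_dax_mode *mode)
{
	assert(mode != NULL);

	if (opt == NULL) {
		/* No option: keep DAX off rather than defaulting to inode. */
		*mode = FUSE_DAX_NEVER;
		return 0;
	}
	if (opt[0] == '\0' || strcmp(opt, "always") == 0) {
		/* Bare "-o dax" keeps its existing meaning. */
		*mode = FUSE_DAX_ALWAYS;
		return 0;
	}
	if (strcmp(opt, "inode") == 0) {
		*mode = FUSE_DAX_INODE;
		return 0;
	}
	if (strcmp(opt, "never") == 0) {
		*mode = FUSE_DAX_NEVER;
		return 0;
	}

	return -1;
}
```

The only difference from the ext4/xfs convention is the NULL case, which
maps to never instead of inode.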

Thanks
Vivek



> 
> Does that make sense?
> 
> Ira
> 



* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-25 18:12             ` Vivek Goyal
@ 2021-10-25 19:02               ` Ira Weiny
  2021-10-25 19:33                 ` Vivek Goyal
  2021-10-27  6:00                 ` JeffleXu
  0 siblings, 2 replies; 37+ messages in thread
From: Ira Weiny @ 2021-10-25 19:02 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: JeffleXu, Dave Chinner, stefanha, miklos, linux-fsdevel,
	virtio-fs, bo.liu, joseph.qi

On Mon, Oct 25, 2021 at 02:12:10PM -0400, Vivek Goyal wrote:
> On Mon, Oct 25, 2021 at 10:52:51AM -0700, Ira Weiny wrote:
> > On Fri, Oct 22, 2021 at 02:54:03PM +0800, JeffleXu wrote:
> > > cc [Ira Weiny], author of per inode DAX on xfs/ext4
> > > 
> > > On 10/20/21 11:17 PM, Vivek Goyal wrote:
> > > > On Wed, Oct 20, 2021 at 10:52:38AM +0800, JeffleXu wrote:
> > > >>
> > > >>
> > > >> On 10/18/21 10:10 PM, Vivek Goyal wrote:
> > > >>> On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
> > > >>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> > > >>>> operate the same which is equivalent to 'always'. To be consistent with
> > > >>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> > > >>>> option is specified, the default behaviour is equal to 'inode'.
> > > >>>
> > > >>> Hi Jeffle,
> > > >>>
> > > >>> I am not sure when  -o "dax=inode"  is used as a default? If user
> > > >>> specifies, "-o dax" then it is equal to "-o dax=always", otherwise
> > > >>> user will explicitly specify "-o dax=always/never/inode". So when
> > > >>> is dax=inode is used as default?
> > > >>
> > > >> That means when neither '-o dax' nor '-o dax=always/never/inode' is
> > > >> specified, it is actually equal to '-o dax=inode', which is also how
> > > >> per-file DAX on ext4/xfs works.
> > > > 
> > 
> > It's been a while so I'm fuzzy on the details of the discussions but yes that
> > is the way things are now in the code.
> > 
> > > > [ CC dave chinner] 
> > > > 
> > > > Is it not change of default behavior for ext4/xfs as well. My
> > > > understanding is that prior to this new dax options, "-o dax" enabled
> > > > dax on filesystem and if user did not specify it, DAX is disabled
> > > > by default.
> > 
> > Technically it does change default behavior...  However, NOT in a way which
> > breaks anything.  See below.
> > 
> > > > 
> > > > Now after introduction of "-o dax=always/never/inode", if suddenly
> > > > "-o dax=inode" became the default if user did not specify anything,
> > > > that's change of behavior.
> > 
> > Technically yes but not in a broken way.
> > 
> > > >
> > > > Is that intentional. If given a choice,
> > > > I would rather not change default and ask user to opt-in for
> > > > appropriate dax functionality.
> > 
> > There is no need for this.
> > 
> > > > 
> > > > Dave, you might have thoughts on this. It makes me uncomfortable to
> > > > change virtiofs dax default now just because other filesystems did it.
> > > > 
> > > 
> > > I can only find the following discussions about the earliest record on
> > > this tri-state mount option:
> > > 
> > > https://lore.kernel.org/lkml/20200316095509.GA13788@lst.de/
> > > https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> > > 
> > > 
> > > Hi, Ira Weiny,
> > > 
> > > Do you have any thought on this, i.e. why the default behavior has
> > > changed after introduction of per inode dax?
> > 
> > While this is 'technically' different behavior the end user does not see any
> > difference in behavior if they continue without software changes.  Specifically
> > specifying nothing continues to operate with all the files on the FS to be
> > '_not_ DAX'.  While specifying '-o dax' forces DAX on all files.
> > 
> > This expands the default behavior in a backwards compatible manner.
> 
> This is backward compatible in a sense that if somebody upgrades to new
> kernel, things will still be same. 
> 
> I think little problematic change is that say I bring in persistent
> memory from another system (which has FS_XFLAGS_DAX set on some inodes)
> and then mount it without andy of the dax mount options, then per
> inode dax will be enabled unexpectedly if I boot with newer kernels
> but it will be disable if I mount with older kernels. Do I understand it
> right.

Indeed that will happen.  However, wouldn't the users (software) of those files
have knowledge that those files were DAX and want to continue with them in that
mode?

> 
> > The user
> > can now enable DAX on some files.  But this is an opt-in on the part of the
> > user of the FS and again does not change with existing software/scripts/etc.
> 
> Don't understand this "opt-in" bit. If user mounts an fs without
> specifying any of the dax options, then per inode dax will still be
> enabled if inode has the correct flag set.

But only users who actually set that flag 'opt-in'.

> So is setting of flag being
> considered as opt-in (insted of mount option).

Yes.

> 
> If setting of flag is being considered as opt-in, that probably will not
> work very well with virtiofs. Because server can enforce a different
> policy for enabling per file dax (instead of FS_XFLAG_DAX).

I'm not sure I understand how this happens?  I think the server probably has to
enable per INODE by default to allow the client to do what the end users wants.

I agree that if the end user is expecting DAX and the server disables it then
that is a problem but couldn't that happen before?  Maybe I'm getting confused
because I'm not familiar enough with virtiofs.

> 
> And given there are two entities here (client and server), I think it
> will be good if if we give client a chance as well to decide whether
> it wants to enable per file dax or not. I know it can alwasy do 
> "dax=never" but it can still be broken if client software remains
> same but host/server software is upgraded or commnad line changed.

But the files are 'owned' by a single user or group of users who must have
placed the file in DAX mode at some point right?

> 
> So for virtiofs, I think better behavior is to continue to not enable
> any dax until and unless user opts-in using "-o dax=foo" options.

I'm not sure, maybe.

Ira

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-25 19:02               ` Ira Weiny
@ 2021-10-25 19:33                 ` Vivek Goyal
  2021-10-25 20:41                   ` Ira Weiny
  2021-10-27  6:00                 ` JeffleXu
  1 sibling, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-25 19:33 UTC (permalink / raw)
  To: Ira Weiny
  Cc: JeffleXu, Dave Chinner, stefanha, miklos, linux-fsdevel,
	virtio-fs, bo.liu, joseph.qi

On Mon, Oct 25, 2021 at 12:02:01PM -0700, Ira Weiny wrote:
> On Mon, Oct 25, 2021 at 02:12:10PM -0400, Vivek Goyal wrote:
> > On Mon, Oct 25, 2021 at 10:52:51AM -0700, Ira Weiny wrote:
> > > On Fri, Oct 22, 2021 at 02:54:03PM +0800, JeffleXu wrote:
> > > > cc [Ira Weiny], author of per inode DAX on xfs/ext4
> > > > 
> > > > On 10/20/21 11:17 PM, Vivek Goyal wrote:
> > > > > On Wed, Oct 20, 2021 at 10:52:38AM +0800, JeffleXu wrote:
> > > > >>
> > > > >>
> > > > >> On 10/18/21 10:10 PM, Vivek Goyal wrote:
> > > > >>> On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
> > > > >>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> > > > >>>> operate the same which is equivalent to 'always'. To be consistemt with
> > > > >>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> > > > >>>> option is specified, the default behaviour is equal to 'inode'.
> > > > >>>
> > > > >>> Hi Jeffle,
> > > > >>>
> > > > >>> I am not sure when  -o "dax=inode"  is used as a default? If user
> > > > >>> specifies, "-o dax" then it is equal to "-o dax=always", otherwise
> > > > >>> user will explicitly specify "-o dax=always/never/inode". So when
> > > > >>> is dax=inode is used as default?
> > > > >>
> > > > >> That means when neither '-o dax' nor '-o dax=always/never/inode' is
> > > > >> specified, it is actually equal to '-o dax=inode', which is also how
> > > > >> per-file DAX on ext4/xfs works.
> > > > > 
> > > 
> > > It's been a while so I'm fuzzy on the details of the discussions but yes that
> > > is the way things are now in the code.
> > > 
> > > > > [ CC dave chinner] 
> > > > > 
> > > > > Is it not change of default behavior for ext4/xfs as well. My
> > > > > understanding is that prior to this new dax options, "-o dax" enabled
> > > > > dax on filesystem and if user did not specify it, DAX is disbaled
> > > > > by default.
> > > 
> > > Technically it does change default behavior...  However, NOT in a way which
> > > breaks anything.  See below.
> > > 
> > > > > 
> > > > > Now after introduction of "-o dax=always/never/inode", if suddenly
> > > > > "-o dax=inode" became the default if user did not specify anything,
> > > > > that's change of behavior.
> > > 
> > > Technically yes but not in a broken way.
> > > 
> > > > >
> > > > > Is that intentional. If given a choice,
> > > > > I would rather not change default and ask user to opt-in for
> > > > > appropriate dax functionality.
> > > 
> > > There is no need for this.
> > > 
> > > > > 
> > > > > Dave, you might have thoughts on this. It makes me uncomfortable to
> > > > > change virtiofs dax default now just because other filesytems did it.
> > > > > 
> > > > 
> > > > I can only find the following discussions about the earliest record on
> > > > this tri-state mount option:
> > > > 
> > > > https://lore.kernel.org/lkml/20200316095509.GA13788@lst.de/
> > > > https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> > > > 
> > > > 
> > > > Hi, Ira Weiny,
> > > > 
> > > > Do you have any thought on this, i.e. why the default behavior has
> > > > changed after introduction of per inode dax?
> > > 
> > > While this is 'technically' different behavior the end user does not see any
> > > difference in behavior if they continue without software changes.  Specifically
> > > specifying nothing continues to operate with all the files on the FS to be
> > > '_not_ DAX'.  While specifying '-o dax' forces DAX on all files.
> > > 
> > > This expands the default behavior in a backwards compatible manner.
> > 
> > This is backward compatible in a sense that if somebody upgrades to new
> > kernel, things will still be same. 
> > 
> > I think little problematic change is that say I bring in persistent
> > memory from another system (which has FS_XFLAGS_DAX set on some inodes)
> > and then mount it without andy of the dax mount options, then per
> > inode dax will be enabled unexpectedly if I boot with newer kernels
> > but it will be disable if I mount with older kernels. Do I understand it
> > right.
> 
> Indeed that will happen.  However, wouldn't the users (software) of those files
> have knowledge that those files were DAX and want to continue with them in that
> mode?

I am not sure. Say that before the per-inode dax feature, I had written a
script which walks through all the mount points and figures out if dax is
enabled or not. I could simply look at mount options and tell if dax could
be enabled or not.

But now the same script will give false results, as per inode dax could
still be enabled.
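
A minimal sketch of the kind of check such a script performs, written in C
for illustration. The enum and function names are hypothetical, not kernel
identifiers; the input is assumed to be the comma-separated options field
of a /proc/mounts line. The point is the last return: with "dax=inode" as
the default, "no dax option present" no longer implies "no file uses DAX".

```c
#include <string.h>

/* Hypothetical names for this sketch; not kernel identifiers. */
enum dax_mode { DAX_UNSPECIFIED, DAX_ALWAYS, DAX_NEVER, DAX_INODE };

/*
 * Scan a comma-separated mount option string for "dax" or "dax=<mode>".
 * Each option is compared by exact length so that e.g. "nodax" or
 * "dax=alwaysX" does not match.
 */
static enum dax_mode dax_mode_from_opts(const char *opts)
{
	const char *p = opts;

	while (*p) {
		size_t len = strcspn(p, ",");

		if (len == 3 && strncmp(p, "dax", 3) == 0)
			return DAX_ALWAYS;	/* legacy "-o dax" */
		if (len == 10 && strncmp(p, "dax=always", 10) == 0)
			return DAX_ALWAYS;
		if (len == 9 && strncmp(p, "dax=never", 9) == 0)
			return DAX_NEVER;
		if (len == 9 && strncmp(p, "dax=inode", 9) == 0)
			return DAX_INODE;

		p += len;
		if (*p == ',')
			p++;
	}
	/* Old meaning: DAX off. With a dax=inode default: unknown. */
	return DAX_UNSPECIFIED;
}
```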

> 
> > 
> > > The user
> > > can now enable DAX on some files.  But this is an opt-in on the part of the
> > > user of the FS and again does not change with existing software/scripts/etc.
> > 
> > Don't understand this "opt-in" bit. If user mounts an fs without
> > specifying any of the dax options, then per inode dax will still be
> > enabled if inode has the correct flag set.
> 
> But only users who actually set that flag 'opt-in'.
> 
> > So is setting of flag being
> > considered as opt-in (insted of mount option).
> 
> Yes.
> 
> > 
> > If setting of flag is being considered as opt-in, that probably will not
> > work very well with virtiofs. Because server can enforce a different
> > policy for enabling per file dax (instead of FS_XFLAG_DAX).
> 
> I'm not sure I understand how this happens?  I think the server probably has to
> enable per INODE by default to allow the client to do what the end users wants.
> 

Server can have either per inode disabled or enabled. If enabled, it could
determine DAX status of file based on FS_XFLAG_DAX or based on something
else depending on server policy. Users want to be able to determine
DAX status of file based on say file size.

> I agree that if the end user is expecting DAX and the server disables it then
> that is a problem but couldn't that happen before?

If the end user expects to enable DAX and the server can't enable it, then
mount fails. So currently if you mount "-o dax" and server does not support
DAX, mount will fail.

I think same should happen when per inode DAX is introduced for virtiofs.
If server does not support per inode dax and user mounts with "-o
dax=inode", then mount should fail.

In fact, this is another reason that probably "dax=inode" should not be
default. Say client is new and server is old and does not support
per inode dax, then client might start failing mount after client
upgrade, and that's not good.

The more I think about it, the more it feels like "dax=never" should be
the default if user has not specified any of the dax options. This
probably will introduce the least amount of surprise, at least for virtiofs.
IMHO, it probably would have made sense even for ext4/xfs but that
ship has already sailed.

> Maybe I'm getting confused
> because I'm not familiar enough with virtiofs.
> 
> > 
> > And given there are two entities here (client and server), I think it
> > will be good if if we give client a chance as well to decide whether
> > it wants to enable per file dax or not. I know it can alwasy do 
> > "dax=never" but it can still be broken if client software remains
> > same but host/server software is upgraded or commnad line changed.
> 
> But the files are 'owned' by a single user or group of users who must have
> placed the file in DAX mode at some point right?

Yes, either users/groups/admin might have set FS_XFLAG_DAX on inodes. But
now there is another controller (virtiofs server) which determines whether
that flag takes effect or not (based on server settings).

We did not have this server scenario in case of local filesystems.

Thanks
Vivek
>
> > 
> > So for virtiofs, I think better behavior is to continue to not enable
> > any dax until and unless user opts-in using "-o dax=foo" options.
> 
> I'm not sure, maybe.
> 
> Ira
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-25 19:33                 ` Vivek Goyal
@ 2021-10-25 20:41                   ` Ira Weiny
  2021-10-26 13:45                     ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: Ira Weiny @ 2021-10-25 20:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: JeffleXu, Dave Chinner, stefanha, miklos, linux-fsdevel,
	virtio-fs, bo.liu, joseph.qi

On Mon, Oct 25, 2021 at 03:33:31PM -0400, Vivek Goyal wrote:

[snip]

> > > > > > 
> > > > > 
> > > > > I can only find the following discussions about the earliest record on
> > > > > this tri-state mount option:
> > > > > 
> > > > > https://lore.kernel.org/lkml/20200316095509.GA13788@lst.de/
> > > > > https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> > > > > 
> > > > > 
> > > > > Hi, Ira Weiny,
> > > > > 
> > > > > Do you have any thought on this, i.e. why the default behavior has
> > > > > changed after introduction of per inode dax?
> > > > 
> > > > While this is 'technically' different behavior the end user does not see any
> > > > difference in behavior if they continue without software changes.  Specifically
> > > > specifying nothing continues to operate with all the files on the FS to be
> > > > '_not_ DAX'.  While specifying '-o dax' forces DAX on all files.
> > > > 
> > > > This expands the default behavior in a backwards compatible manner.
> > > 
> > > This is backward compatible in a sense that if somebody upgrades to new
> > > kernel, things will still be same. 
> > > 
> > > I think little problematic change is that say I bring in persistent
> > > memory from another system (which has FS_XFLAGS_DAX set on some inodes)
> > > and then mount it without andy of the dax mount options, then per
> > > inode dax will be enabled unexpectedly if I boot with newer kernels
> > > but it will be disable if I mount with older kernels. Do I understand it
> > > right.
> > 
> > Indeed that will happen.  However, wouldn't the users (software) of those files
> > have knowledge that those files were DAX and want to continue with them in that
> > mode?
> 
> I am not sure. Say before per-inode dax feature, I had written a script
> which walks though all the mount points and figure out if dax is enabled
> or not. I could simply look at mount options and tell if dax could be
> enabled or not.
> 
> But now same script will give false results as per inode dax could
> still be enabled.

The mount option is being deprecated.  So it is best to start to phase out
scripts like that.

> 
> > 
> > > 
> > > > The user
> > > > can now enable DAX on some files.  But this is an opt-in on the part of the
> > > > user of the FS and again does not change with existing software/scripts/etc.
> > > 
> > > Don't understand this "opt-in" bit. If user mounts an fs without
> > > specifying any of the dax options, then per inode dax will still be
> > > enabled if inode has the correct flag set.
> > 
> > But only users who actually set that flag 'opt-in'.
> > 
> > > So is setting of flag being
> > > considered as opt-in (insted of mount option).
> > 
> > Yes.
> > 
> > > 
> > > If setting of flag is being considered as opt-in, that probably will not
> > > work very well with virtiofs. Because server can enforce a different
> > > policy for enabling per file dax (instead of FS_XFLAG_DAX).
> > 
> > I'm not sure I understand how this happens?  I think the server probably has to
> > enable per INODE by default to allow the client to do what the end users wants.
> > 
> 
> Server can have either per inode disabled or enabled. If enabled, it could
> determine DAX status of file based on FS_XFLAG_DAX or based on something
> else depending on server policy. Users want to be able to determine
> DAX status of file based on say file size.

'file size'?  I'm not sure how that would work.  Did you mean something else?

> 
> > I agree that if the end user is expecting DAX and the server disables it then
> > that is a problem but couldn't that happen before?
> 
> If end user expects to enable DAX and sever can't enable it, then mount
> fails. So currently if you mount "-o dax" and server does not support
> DAX, mount will fail.

The same could happen on a server where the underlying device does not support
DAX.  What if the server was mounted without '-o dax'?  Wouldn't a client mount
with '-o dax' fail now?  So why can't the same be true with the new set of
options?

> 
> I think same should happen when per inode DAX is introduced for virtiofs.
> If sever does not support per inode dax and user mounts with "-o
> dax=inode", then mount should fail.

I think that is reasonable.  The client can't mount with something the server
can't support.

> 
> In fact, this is another reason that probably "dax=inode" should not be
> default. Say client is new and server is old and does not support
> per inode dax, then client might start failing mount after client
> upgrade, and that's not good.

Shouldn't the client fall back to whatever the server supports?  It is the same
as the client wanting DAX now without server and/or device support.  It just
can't get it.  Right?

> 
> More I think about it, more it feels like that "dax=never" should be
> the default if user has not specified any of the dax options. This
> probably will introduce least amount of surprise. Atleast for virtiofs.
> IMHO, it probably would have made sense even for ext4/xfs but that
> ship has already sailed.

I disagree because dax=never is backwards from what we really want for the
future.  'dax=inode' is the most flexible setting.  In fact that setting is
best for the server by default which allows more control to be in the client's
hands.  Would you agree?

> 
> > Maybe I'm getting confused
> > because I'm not familiar enough with virtiofs.
> > 
> > > 
> > > And given there are two entities here (client and server), I think it
> > > will be good if if we give client a chance as well to decide whether
> > > it wants to enable per file dax or not. I know it can alwasy do 
> > > "dax=never" but it can still be broken if client software remains
> > > same but host/server software is upgraded or commnad line changed.
> > 
> > But the files are 'owned' by a single user or group of users who must have
> > placed the file in DAX mode at some point right?
> 
> Yes, either users/groups/admin might have set FS_XFLAG_DAX on inodes. But
> now there is another controller (virtiofs server) which determines whether
> that flag takes affect or not (based on server settings).

I think this is just like the file being on a device which does not support
DAX.  The file inode flag can be set but the file will not be in DAX mode on a
non-dax device.  So in this case the server is a non-dax device.

Ira

> 
> We did not have this server scenario in case of local filesystems.
> 
> Thanks
> Vivek
> >
> > > 
> > > So for virtiofs, I think better behavior is to continue to not enable
> > > any dax until and unless user opts-in using "-o dax=foo" options.
> > 
> > I'm not sure, maybe.
> > 
> > Ira
> > 
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-25 20:41                   ` Ira Weiny
@ 2021-10-26 13:45                     ` Vivek Goyal
  0 siblings, 0 replies; 37+ messages in thread
From: Vivek Goyal @ 2021-10-26 13:45 UTC (permalink / raw)
  To: Ira Weiny
  Cc: JeffleXu, Dave Chinner, stefanha, miklos, linux-fsdevel,
	virtio-fs, bo.liu, joseph.qi

On Mon, Oct 25, 2021 at 01:41:45PM -0700, Ira Weiny wrote:
> On Mon, Oct 25, 2021 at 03:33:31PM -0400, Vivek Goyal wrote:
> 
> [snip]
> 
> > > > > > > 
> > > > > > 
> > > > > > I can only find the following discussions about the earliest record on
> > > > > > this tri-state mount option:
> > > > > > 
> > > > > > https://lore.kernel.org/lkml/20200316095509.GA13788@lst.de/
> > > > > > https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> > > > > > 
> > > > > > 
> > > > > > Hi, Ira Weiny,
> > > > > > 
> > > > > > Do you have any thought on this, i.e. why the default behavior has
> > > > > > changed after introduction of per inode dax?
> > > > > 
> > > > > While this is 'technically' different behavior the end user does not see any
> > > > > difference in behavior if they continue without software changes.  Specifically
> > > > > specifying nothing continues to operate with all the files on the FS to be
> > > > > '_not_ DAX'.  While specifying '-o dax' forces DAX on all files.
> > > > > 
> > > > > This expands the default behavior in a backwards compatible manner.
> > > > 
> > > > This is backward compatible in a sense that if somebody upgrades to new
> > > > kernel, things will still be same. 
> > > > 
> > > > I think little problematic change is that say I bring in persistent
> > > > memory from another system (which has FS_XFLAGS_DAX set on some inodes)
> > > > and then mount it without andy of the dax mount options, then per
> > > > inode dax will be enabled unexpectedly if I boot with newer kernels
> > > > but it will be disable if I mount with older kernels. Do I understand it
> > > > right.
> > > 
> > > Indeed that will happen.  However, wouldn't the users (software) of those files
> > > have knowledge that those files were DAX and want to continue with them in that
> > > mode?
> > 
> > I am not sure. Say before per-inode dax feature, I had written a script
> > which walks though all the mount points and figure out if dax is enabled
> > or not. I could simply look at mount options and tell if dax could be
> > enabled or not.
> > 
> > But now same script will give false results as per inode dax could
> > still be enabled.
> 
> The mount option is being deprecated.  So it is best to start to phase out
> scripts like that.

Sure. But this change does break such scripts (if there are any). I am
just responding to previous comments that existing software/scripts
should not be broken. 

> 
> > 
> > > 
> > > > 
> > > > > The user
> > > > > can now enable DAX on some files.  But this is an opt-in on the part of the
> > > > > user of the FS and again does not change with existing software/scripts/etc.
> > > > 
> > > > Don't understand this "opt-in" bit. If user mounts an fs without
> > > > specifying any of the dax options, then per inode dax will still be
> > > > enabled if inode has the correct flag set.
> > > 
> > > But only users who actually set that flag 'opt-in'.
> > > 
> > > > So is setting of flag being
> > > > considered as opt-in (insted of mount option).
> > > 
> > > Yes.
> > > 
> > > > 
> > > > If setting of flag is being considered as opt-in, that probably will not
> > > > work very well with virtiofs. Because server can enforce a different
> > > > policy for enabling per file dax (instead of FS_XFLAG_DAX).
> > > 
> > > I'm not sure I understand how this happens?  I think the server probably has to
> > > enable per INODE by default to allow the client to do what the end users wants.
> > > 
> > 
> > Server can have either per inode disabled or enabled. If enabled, it could
> > determine DAX status of file based on FS_XFLAG_DAX or based on something
> > else depending on server policy. Users want to be able to determine
> > DAX status of file based on say file size.
> 
> 'file size'?  I'm not sure how that would work.  Did you mean something else?

So virtiofs uses DAX only to bypass page cache in guest. virtiofs pci
device advertises a range of memory which is directly accessed using
dax. We use a chunk size of 2MB. That means for every 2MB chunk, there
will be around 512 pages. Each struct page will consume around 64 bytes
of RAM in guest. So for every 2MB chunk of file, RAM usage in guest
is around 512 * 64 = 32768 bytes (32KiB). 
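
The arithmetic above can be sketched as follows. The 4KiB page size is an
assumption implied by "512 pages per 2MB chunk", and 64 bytes per struct
page is the approximate figure quoted in the message (the real size depends
on kernel configuration). The macro and function names are made up for the
example.

```c
/* Assumed values: 2MB DAX chunk, 4KiB pages, ~64 bytes per struct page. */
#define DAX_CHUNK_BYTES		(2UL * 1024 * 1024)
#define GUEST_PAGE_BYTES	4096UL
#define STRUCT_PAGE_BYTES	64UL

/* Number of guest pages needed to cover one DAX chunk. */
static unsigned long pages_per_chunk(void)
{
	return DAX_CHUNK_BYTES / GUEST_PAGE_BYTES;
}

/* Guest RAM spent on struct page metadata for one mapped chunk. */
static unsigned long struct_page_overhead_per_chunk(void)
{
	return pages_per_chunk() * STRUCT_PAGE_BYTES;
}
```

Under these assumptions a 4K file mapped through DAX still costs one full
chunk of the DAX window plus 32KiB of struct page metadata, versus a single
4K page in the page cache.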

So there are users who claim that for smaller files say 4K or 8K in size,
it is probably better to not use DAX at all. In that case we will use
say 4K of page cache and leave DAX memory to be used for larger files.
(This will be useful only if virtiofs cache memory is in short supply). 

Hence the idea that why not use per inode dax and enable dax selectively
on files as needed. Given we have a remote server running, it gives
extra capability that we can take this DAX decision dynamically based
on some server policy (and not necessarily rely on FS_XFLAG_DAX stuff).

So one such policy is a file size based policy, where if a file
is small, server might not want to use DAX on that file. There could
be many more such policies depending on where DAX is most useful
in the context of virtiofs.
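
As a purely hypothetical illustration of such a size based policy (this is
not virtiofsd code, and the threshold parameter is invented for the
example), the server side decision could look roughly like this, with the
result then communicated to the client through the FUSE_ATTR_DAX flag:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical server policy: only hint DAX for files that are at least
 * min_dax_size bytes long, leaving small files to the guest page cache.
 */
static bool dax_wanted_for_size(uint64_t file_size, uint64_t min_dax_size)
{
	return file_size >= min_dax_size;
}
```

Note that a decision like this ignores FS_XFLAG_DAX entirely, which is why
the flag alone cannot be treated as the opt-in for virtiofs.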

> 
> > 
> > > I agree that if the end user is expecting DAX and the server disables it then
> > > that is a problem but couldn't that happen before?
> > 
> > If end user expects to enable DAX and sever can't enable it, then mount
> > fails. So currently if you mount "-o dax" and server does not support
> > DAX, mount will fail.
> 
> The same could happen on a server where the underlying device does not support
> DAX.  What if the server was mounted without '-o dax'?

In general, there is no connection between DAX in guest and device on
host enabling DAX. We can very well enable DAX in guest without having
any DAX enabled on host device. From virtiofs perspective, we are just
mmapping host files in qemu address space and that works both with
dax enabled/disabled devices on host.

> Wouldn't a client mount
> with '-o dax' fail now?  So why can't the same be true with the new set of
> options?

So yes, if server does not support DAX and client asks for DAX, mount
will fail. (As it should fail).

Problem with enabling "dax=inode" by default is that if a client
is mounted without any dax option, then dax is disabled. Now if a server
is upgraded and restarted with some dax policy enabled, suddenly dax
will be enabled in client without it opting in for anything and client
might be surprised.

Now one argument can be hey, we have FS_XFLAG_DAX set on inode, so it
is ok to turn on dax. Maybe. But virtiofs server can have its own
dax policies (like file size based policy), and it can ignore
FS_XFLAG_DAX completely. In that case enabling per inode dax by default
(without client opting in), seems contrary to what we are doing now.

Hence, I think not having "dax=inode" as default is the path of least
surprise for an existing user. A user can easily tell whether dax
is being used or not just by looking at filesystem mount options.

> 
> > 
> > I think same should happen when per inode DAX is introduced for virtiofs.
> > If sever does not support per inode dax and user mounts with "-o
> > dax=inode", then mount should fail.
> 
> I think that is reasonable.  The client can't mount with something the server
> can't support.
> 
> > 
> > In fact, this is another reason that probably "dax=inode" should not be
> > default. Say client is new and server is old and does not support
> > per inode dax, then client might start failing mount after client
> > upgrade, and that's not good.
> 
> Shouldn't the client fall back to whatever the server supports?  It is the same
> as the client wanting DAX now without server and/or device support.  It just
> can't get it.  Right?

Well, current model is that fail the operation and let user try mount
again without DAX.

If we were to design fallback, then question will be how will user know
that server does not support DAX and we fell back to non-dax. Also it will
be change of behavior as well from existing non-fallback semantics.

I guess one could argue that if you are moving to new dax options
(-o dax=inode/always/never), then this is an opportunity to move to
fallback model. My concern remains though that if user specified
"dax=inode or dax=always" and server does not support it, how will user
know we are not using dax. 

Not sure there is a good answer here. In some cases users like to
see explicit failure if some option can't be supported. IIRC, in case
of overlayfs, if users passed in "-o metacopy=on" and if overlayfs
can't enable it, then users expected a failure (instead of silently
ignoring metacopy).

So choosing not to fallback seems ok to me. Nobody has complained so far.

> 
> > 
> > More I think about it, more it feels like that "dax=never" should be
> > the default if user has not specified any of the dax options. This
> > probably will introduce least amount of surprise. Atleast for virtiofs.
> > IMHO, it probably would have made sense even for ext4/xfs but that
> > ship has already sailed.
> 
> I disagree because dax=never is backwards from what we really want for the
> future.  'dax=inode' is the most flexible setting.

If your goal is to enable dax by default if FS_XFLAG_DAX is set, then
yes dax=inode default makes sense. I was only complaining about change
of behavior in some cases. I mean one could argue same thing for
dax=always. If block device supports dax, then enable dax by default
until and unless user specifies "-o dax=never". But previous options
were not designed that way. A user had to opt-in for DAX behavior
even if device had the capability to support DAX.

And in the same line, I am arguing a user should opt-in for per inode
DAX, even if inode has the capability to be used as DAX inode.

And I don't mind "dax=inode" being default if that's deemed more useful.
My concern there is only change of behavior by default.

> In fact that setting is
> best for the server by default which allows more control to be in the clients
> hands.  Would you agree?

"dax=inode" on server default makes sense to me (as long as client asked
for dax=inode). Should it be enabled by default in client, I am still
afraid of change of behavior from existing dax mount options and having
to explain and justify change of behavior to users.

> 
> > 
> > > Maybe I'm getting confused
> > > because I'm not familiar enough with virtiofs.
> > > 
> > > > 
> > > > And given there are two entities here (client and server), I think it
> > > > will be good if if we give client a chance as well to decide whether
> > > > it wants to enable per file dax or not. I know it can alwasy do 
> > > > "dax=never" but it can still be broken if client software remains
> > > > same but host/server software is upgraded or commnad line changed.
> > > 
> > > But the files are 'owned' by a single user or group of users who must have
> > > placed the file in DAX mode at some point right?
> > 
> > Yes, either users/groups/admin might have set FS_XFLAG_DAX on inodes. But
> > now there is another controller (virtiofs server) which determines whether
> > that flag takes affect or not (based on server settings).
> 
> I think this is just like the file being on a device which does not support
> DAX.  The file inode flag can be set but the file will not be in DAX mode on a
> non-dax device.  So in this case the server is a non-dax device.

So if I mount with "dax=inode or dax=always" and block device does not
support DAX, what happens? Mount fails or it falls back silently to
non-dax mode?

I suspect that in new dax options it falls back to non-dax mode. And
your argument seems to be that user should stat every file and
query for STATX_ATTR_DAX to determine if dax is enabled on file
or not.

On one hand, I am not too fond of these new semantics of automatic fallback
and dax=inode default, and on the other hand, I want to be as close
as possible to ext4/xfs semantics so that there is less confusion for
users.

Vivek


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v6 6/7] fuse: mark inode DONT_CACHE when per-file DAX hint changes
  2021-10-18 15:19   ` Vivek Goyal
@ 2021-10-27  5:05     ` JeffleXu
  0 siblings, 0 replies; 37+ messages in thread
From: JeffleXu @ 2021-10-27  5:05 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

Sorry for the late reply, as your previous reply was moved to junk box
by the algorithm...

On 10/18/21 11:19 PM, Vivek Goyal wrote:
> On Mon, Oct 11, 2021 at 11:00:51AM +0800, Jeffle Xu wrote:
>> When the per-file DAX hint changes while the file is still *opened*, it
>> is quite complicated and maybe fragile to dynamically change the DAX
>> state.
>>
>> Hence mark the inode and corresponding dentries as DONE_CACHE once the
>> per-file DAX hint changes, so that the inode instance will be evicted
>> and freed as soon as possible once the file is closed and the last
>> reference to the inode is put. And then when the file gets reopened next
>> time, the new instantiated inode will reflect the new DAX state.
>>
>> In summary, when the per-file DAX hint changes for an *opened* file, the
>> DAX state of the file won't be updated until this file is closed and
>> reopened later.
>>
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>> ---
>>  fs/fuse/dax.c    | 9 +++++++++
>>  fs/fuse/fuse_i.h | 1 +
>>  fs/fuse/inode.c  | 3 +++
>>  3 files changed, 13 insertions(+)
>>
>> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
>> index 15bde36829b8..ca083c13f5e8 100644
>> --- a/fs/fuse/dax.c
>> +++ b/fs/fuse/dax.c
>> @@ -1364,6 +1364,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>>  	inode->i_data.a_ops = &fuse_dax_file_aops;
>>  }
>>  
>> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags)
>> +{
>> +	struct fuse_conn *fc = get_fuse_conn(inode);
>> +
>> +	if (fc->dax_mode == FUSE_DAX_INODE &&
>> +	    (!!IS_DAX(inode) != !!(flags & FUSE_ATTR_DAX)))
>> +		d_mark_dontcache(inode);
>> +}
>> +
>>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
>>  {
>>  	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>> index 0270a41c31d7..bb2c11e0311a 100644
>> --- a/fs/fuse/fuse_i.h
>> +++ b/fs/fuse/fuse_i.h
>> @@ -1270,6 +1270,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>>  void fuse_dax_inode_cleanup(struct inode *inode);
>> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
>>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>>  void fuse_dax_cancel_work(struct fuse_conn *fc);
>>  
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index 73f19cd6e702..cf934c2ba761 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -268,6 +268,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
>>  		if (inval)
>>  			invalidate_inode_pages2(inode->i_mapping);
>>  	}
>> +
>> +	if (IS_ENABLED(CONFIG_FUSE_DAX))
>> +		fuse_dax_dontcache(inode, attr->flags);
> 
> Should we give this function a more generic name, say
> fuse_dax_change_attributes()? That function could then decide which
> attributes have changed and whether it needs to take any action.
> 

But currently we only need to handle 'attr->flags & FUSE_ATTR_DAX'. If
other attributes need to be handled later, then we can expand this
function and give it a more generic name. But as for now, I prefer to
keep it simple.
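
For reference, the check in the quoted hunk can be modelled in userspace
roughly as below. This is only a sketch under stated assumptions: every
model_* name and the MODEL_ATTR_DAX bit value are hypothetical stand-ins
for fc->dax_mode, IS_DAX(), FUSE_ATTR_DAX and d_mark_dontcache(), not the
kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in fs/fuse. */
enum model_dax_mode { MODEL_DAX_ALWAYS, MODEL_DAX_NEVER, MODEL_DAX_INODE };

/* Illustrative bit only; the real FUSE_ATTR_DAX comes from the FUSE uapi header. */
#define MODEL_ATTR_DAX (1u << 1)

struct model_inode {
	bool is_dax;     /* models IS_DAX(inode) */
	bool dontcache;  /* models the effect of d_mark_dontcache(inode) */
};

/*
 * Mark the inode "don't cache" only in per-inode mode, and only when the
 * server's hint disagrees with the DAX state the inode currently has.
 */
static void model_dax_dontcache(enum model_dax_mode mode,
				struct model_inode *inode, unsigned int flags)
{
	bool hinted;

	assert(inode != NULL);
	hinted = (flags & MODEL_ATTR_DAX) != 0;

	if (mode == MODEL_DAX_INODE && inode->is_dax != hinted)
		inode->dontcache = true;
}
```

The point of the sketch is that the helper only reacts to a mismatch
between the current state and the hint, which is why a narrow name still
fits it for now.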


-- 
Thanks,
Jeffle


* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-25 19:02               ` Ira Weiny
  2021-10-25 19:33                 ` Vivek Goyal
@ 2021-10-27  6:00                 ` JeffleXu
  1 sibling, 0 replies; 37+ messages in thread
From: JeffleXu @ 2021-10-27  6:00 UTC (permalink / raw)
  To: Ira Weiny, Vivek Goyal
  Cc: Dave Chinner, stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu,
	joseph.qi

Thanks for your reply, Ira Weiny.


On 10/26/21 3:02 AM, Ira Weiny wrote:
> [snippet]
>>>> Hi, Ira Weiny,
>>>>
>>>> Do you have any thought on this, i.e. why the default behavior has
>>>> changed after introduction of per inode dax?
>>>
>>> While this is 'technically' different behavior the end user does not see any
>>> difference in behavior if they continue without software changes.  Specifically
>>> specifying nothing continues to operate with all the files on the FS to be
>>> '_not_ DAX'.  While specifying '-o dax' forces DAX on all files.
>>>
>>> This expands the default behavior in a backwards compatible manner.
>>
>> This is backward compatible in a sense that if somebody upgrades to new
>> kernel, things will still be same. 
>>
>> I think the slightly problematic change is this: say I bring in persistent
>> memory from another system (which has FS_XFLAG_DAX set on some inodes)
>> and then mount it without any of the dax mount options. Then per
>> inode dax will be enabled unexpectedly if I boot with newer kernels,
>> but it will be disabled if I mount with older kernels. Do I understand it
>> right?
> 
> Indeed that will happen.  However, wouldn't the users (software) of those files
> have knowledge that those files were DAX and want to continue with them in that
> mode?
> 
>>
>>> The user
>>> can now enable DAX on some files.  But this is an opt-in on the part of the
>>> user of the FS and again does not change with existing software/scripts/etc.
>>
>> Don't understand this "opt-in" bit. If user mounts an fs without
>> specifying any of the dax options, then per inode dax will still be
>> enabled if inode has the correct flag set.
> 
> But only users who actually set that flag 'opt-in'.
> 
>> So is setting of flag being
>> considered as opt-in (instead of mount option)?
> 
> Yes.
> 
>>
>> If setting of flag is being considered as opt-in, that probably will not
>> work very well with virtiofs. Because server can enforce a different
>> policy for enabling per file dax (instead of FS_XFLAG_DAX).
> 
> I'm not sure I understand how this happens?  I think the server probably has to
> enable per INODE by default to allow the client to do what the end users wants.
> 
> I agree that if the end user is expecting DAX and the server disables it then
> that is a problem but couldn't that happen before?  Maybe I'm getting confused
> because I'm not familiar enough with virtiofs.
> 
>>
>> And given there are two entities here (client and server), I think it
>> will be good if we give client a chance as well to decide whether
>> it wants to enable per file dax or not. I know it can always do
>> "dax=never" but it can still be broken if client software remains
>> same but host/server software is upgraded or command line changed.
> 
> But the files are 'owned' by a single user or group of users who must have
> placed the file in DAX mode at some point right?

So this is the essence of the issue: whether those who mount the
filesystem (and are responsible for specifying mount options) and those
who set the persistent inode flag are the same group of people.

For local filesystems like ext4/xfs, these two entities are most likely
the same group, so we can say that 'the default behavior is still
backward compatible'.

However, this assumption is challenged somewhat by Vivek's example,
which shows that the two entities may differ even for a local
filesystem, though this case may be rare in the real world.

But for remote filesystems like virtiofs, the gap between these two
entities can be larger. For example, if the exported directory on the host
is shared by two guests and guest A sets the persistent inode flag for
one file, then guest B will also see that DAX is enabled for this file
when virtiofs is mounted with the default option inside guest B. In
this case, the persistent inode flag is set neither by guest B itself nor
by the server, and it may break the expectations of guest B.
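
To make the two-guest scenario concrete, here is a small userspace model.
It is a sketch only, and it assumes a server policy that simply forwards
the persistent flag stored on the host file as the per-file hint; all
model_* names are hypothetical and do not exist in the kernel or virtiofsd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the guest-side tri-state mount option. */
enum model_dax_mode { MODEL_DAX_ALWAYS, MODEL_DAX_NEVER, MODEL_DAX_INODE };

/* One file in the directory exported to several guests. */
struct model_host_file {
	bool xflag_dax;  /* models FS_XFLAG_DAX persisted on the host file */
};

/*
 * Would a guest mounted in guest_mode end up with DAX on this file?
 * Assumption: the server passes the persistent flag through as the hint.
 */
static bool model_guest_sees_dax(enum model_dax_mode guest_mode,
				 const struct model_host_file *file)
{
	assert(file != NULL);

	switch (guest_mode) {
	case MODEL_DAX_ALWAYS:
		return true;
	case MODEL_DAX_NEVER:
		return false;
	case MODEL_DAX_INODE:
		return file->xflag_dax;
	}
	return false;
}
```

With an 'inode' default, a flag set through guest A changes the result for
guest B too, because both guests read the same host file; with a 'never'
default, guest B is unaffected unless it opts in.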

> 
>>
>> So for virtiofs, I think better behavior is to continue to not enable
>> any dax until and unless user opts-in using "-o dax=foo" options.
> 

I also prefer keeping the 'dax=never' default behavior for virtiofs.

-- 
Thanks,
Jeffle


* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-20 14:48       ` Vivek Goyal
@ 2021-10-29  8:33         ` JeffleXu
  2021-10-29 13:03           ` Vivek Goyal
  0 siblings, 1 reply; 37+ messages in thread
From: JeffleXu @ 2021-10-29  8:33 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi



On 10/20/21 10:48 PM, Vivek Goyal wrote:
> On Wed, Oct 20, 2021 at 10:52:38AM +0800, JeffleXu wrote:
>>
>>
>> On 10/18/21 10:10 PM, Vivek Goyal wrote:
>>> On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
>>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
>>>> operate the same which is equivalent to 'always'. To be consistent with
>>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
>>>> option is specified, the default behaviour is equal to 'inode'.
>>>
>>> Hi Jeffle,
>>>
>>> I am not sure when  -o "dax=inode"  is used as a default? If user
>>> specifies, "-o dax" then it is equal to "-o dax=always", otherwise
>>> user will explicitly specify "-o dax=always/never/inode". So when
>>> is dax=inode used as default?
>>
>> That means when neither '-o dax' nor '-o dax=always/never/inode' is
>> specified, it is actually equal to '-o dax=inode', which is also how
>> per-file DAX on ext4/xfs works.
>>
>> This default behaviour for local filesystem, e.g. ext4/xfs, may be
>> straightforward, since the disk inode will be read into memory during
>> the inode instantiation, and checking for persistent inode attribute
>> shall be relatively cheap, except that the default behaviour has
>> changed from 'dax=never' to 'dax=inode'.
> 
> Interesting that ext4/xfs allowed for this behavior change.
> 
>>
>> Come back to virtiofs, when neither '-o dax' nor '-o
>> dax=always/never/inode' is specified, and it actually behaves as '-o
>> dax=inode', as long as '-o dax=server/attr' option is not specified for
>> virtiofsd, virtiofsd will always clear FUSE_ATTR_DAX and thus guest will
>> always disable DAX. IOWs, the guest virtiofs actually behaves as '-o
>> dax=never' when neither '-o dax' nor '-o dax=always/never/inode' is
>> specified, and '-o dax=server/attr' option is not specified for virtiofsd.
>>
>> But I'm okay if we need to change the default behaviour for virtiofs.
> 
> This is change of behavior from client's perspective. Even if client
> did not opt-in for DAX, DAX can be enabled based on server's setting.
> Not that there is anything wrong with it, but change of behavior part
> concerns me.
> 
> In case of virtiofs, lot of features we are controlling from server.
> Client typically just calls "mount" and there are not many options
> users can specify for mount.  
> 
> Given we already allowed to make client a choice about DAX behavior,
> I will feel more comfortable that we don't change it and let client
> request a specific DAX mode and if client does not specify anything,
> then DAX is not enabled.
> 

Hi Vivek,

How about the following design about the default behavior to move this
patchset forward, considering the discussion on another thread [1]?

- guest side: '-o dax=inode' acts as the default, keeping consistent
with xfs/ext4
- virtiofsd: by default, neither the file-size-based policy
('dax=server') nor the persistent-inode-flag-based policy
('dax=attr') is used, though virtiofsd still advertises that
it supports the per inode DAX feature (so that it won't fail the FUSE_INIT
negotiation phase when the guest advertises dax=inode by default). In
effect, it acts like 'dax=never' in this case.

Then when a guest opts in and specifies '-o dax=inode' manually, it
should realize that proper configuration of virtiofsd is also needed (-o
dax=server|attr).
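
The guest-side half of this proposal can be sketched as a tiny option
parser. This is a userspace illustration only: model_parse_dax_opt() and
the enum are hypothetical names, and the mapping of a missing option to
'inode' reflects the default proposed in this message, not necessarily
what is finally merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the guest-side tri-state mount option. */
enum model_dax_mode { MODEL_DAX_ALWAYS, MODEL_DAX_NEVER, MODEL_DAX_INODE };

/*
 * Map a single mount option string to a DAX mode.
 * opt == NULL means no dax option was given at all.
 * Returns false for an unrecognised option and leaves *out untouched.
 */
static bool model_parse_dax_opt(const char *opt, enum model_dax_mode *out)
{
	assert(out != NULL);

	if (opt == NULL) {
		*out = MODEL_DAX_INODE;   /* proposed default */
		return true;
	}
	if (strcmp(opt, "dax") == 0 || strcmp(opt, "dax=always") == 0) {
		*out = MODEL_DAX_ALWAYS;  /* bare '-o dax' keeps its old meaning */
		return true;
	}
	if (strcmp(opt, "dax=never") == 0) {
		*out = MODEL_DAX_NEVER;
		return true;
	}
	if (strcmp(opt, "dax=inode") == 0) {
		*out = MODEL_DAX_INODE;
		return true;
	}
	return false;
}
```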

[1] https://www.spinics.net/lists/linux-xfs/msg56642.html

-- 
Thanks,
Jeffle


* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-29  8:33         ` JeffleXu
@ 2021-10-29 13:03           ` Vivek Goyal
  2021-11-01  8:21             ` JeffleXu
  0 siblings, 1 reply; 37+ messages in thread
From: Vivek Goyal @ 2021-10-29 13:03 UTC (permalink / raw)
  To: JeffleXu; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi

On Fri, Oct 29, 2021 at 04:33:06PM +0800, JeffleXu wrote:
> 
> 
> On 10/20/21 10:48 PM, Vivek Goyal wrote:
> > On Wed, Oct 20, 2021 at 10:52:38AM +0800, JeffleXu wrote:
> >>
> >>
> >> On 10/18/21 10:10 PM, Vivek Goyal wrote:
> >>> On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
> >>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> >>>> operate the same which is equivalent to 'always'. To be consistent with
> >>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> >>>> option is specified, the default behaviour is equal to 'inode'.
> >>>
> >>> Hi Jeffle,
> >>>
> >>> I am not sure when  -o "dax=inode"  is used as a default? If user
> >>> specifies, "-o dax" then it is equal to "-o dax=always", otherwise
> >>> user will explicitly specify "-o dax=always/never/inode". So when
> >>> is dax=inode used as default?
> >>
> >> That means when neither '-o dax' nor '-o dax=always/never/inode' is
> >> specified, it is actually equal to '-o dax=inode', which is also how
> >> per-file DAX on ext4/xfs works.
> >>
> >> This default behaviour for local filesystem, e.g. ext4/xfs, may be
> >> straightforward, since the disk inode will be read into memory during
> >> the inode instantiation, and checking for persistent inode attribute
> >> shall be relatively cheap, except that the default behaviour has
> >> changed from 'dax=never' to 'dax=inode'.
> > 
> > Interesting that ext4/xfs allowed for this behavior change.
> > 
> >>
> >> Come back to virtiofs, when neither '-o dax' nor '-o
> >> dax=always/never/inode' is specified, and it actually behaves as '-o
> >> dax=inode', as long as '-o dax=server/attr' option is not specified for
> >> virtiofsd, virtiofsd will always clear FUSE_ATTR_DAX and thus guest will
> >> always disable DAX. IOWs, the guest virtiofs actually behaves as '-o
> >> dax=never' when neither '-o dax' nor '-o dax=always/never/inode' is
> >> specified, and '-o dax=server/attr' option is not specified for virtiofsd.
> >>
> >> But I'm okay if we need to change the default behaviour for virtiofs.
> > 
> > This is change of behavior from client's perspective. Even if client
> > did not opt-in for DAX, DAX can be enabled based on server's setting.
> > Not that there is anything wrong with it, but change of behavior part
> > concerns me.
> > 
> > In case of virtiofs, lot of features we are controlling from server.
> > Client typically just calls "mount" and there are not many options
> > users can specify for mount.  
> > 
> > Given we already allowed to make client a choice about DAX behavior,
> > I will feel more comfortable that we don't change it and let client
> > request a specific DAX mode and if client does not specify anything,
> > then DAX is not enabled.
> > 
> 
> Hi Vivek,
> 
> How about the following design about the default behavior to move this
> patchset forward, considering the discussion on another thread [1]?
> 
> - guest side: '-o dax=inode' acts as the default, keeping consistent
> with xfs/ext4

This sounds good.

> - virtiofsd: the default behavior is like, neither file size based
> policy ('dax=server') nor persistent inode flags based policy
> ('dax=attr') is used, though virtiofsd indeed advertises that
> it supports per inode DAX feature (so that it won't fail FUSE_INIT
> negotiation phase when guest advertises dax=inode by default)... In
> fact, it acts like 'dax=never' in this case.

Not sure why virtiofsd needs to advertise that it supports per inode
DAX even if no per inode dax policy is in effect. Guest will know that
server is not supporting per inode DAX. But it will not return an
error to user space (because dax=inode seems to be advisory).

IOW, this is very similar to the case of using dax=inode on a block
device which does not support DAX. No errors and no warnings.

> 
> Then when guest opts-in and specifies '-o dax=inode' manually, then it
> shall realize that proper configuration for virtiofsd is also needed (-o
> dax=server|attr).

I gave some comments w.r.t dax=server naming in your posting. Not sure if
you got a chance to look at it.

Thanks
Vivek

> 
> [1] https://www.spinics.net/lists/linux-xfs/msg56642.html
> 
> -- 
> Thanks,
> Jeffle
> 



* Re: [PATCH v6 2/7] fuse: make DAX mount option a tri-state
  2021-10-29 13:03           ` Vivek Goyal
@ 2021-11-01  8:21             ` JeffleXu
  0 siblings, 0 replies; 37+ messages in thread
From: JeffleXu @ 2021-11-01  8:21 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: stefanha, miklos, linux-fsdevel, virtio-fs, bo.liu, joseph.qi



On 10/29/21 9:03 PM, Vivek Goyal wrote:
> On Fri, Oct 29, 2021 at 04:33:06PM +0800, JeffleXu wrote:
>>
>>
>> On 10/20/21 10:48 PM, Vivek Goyal wrote:
>>> On Wed, Oct 20, 2021 at 10:52:38AM +0800, JeffleXu wrote:
>>>>
>>>>
>>>> On 10/18/21 10:10 PM, Vivek Goyal wrote:
>>>>> On Mon, Oct 11, 2021 at 11:00:47AM +0800, Jeffle Xu wrote:
>>>>>> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
>>>>>> operate the same which is equivalent to 'always'. To be consistent with
>>>>>> ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
>>>>>> option is specified, the default behaviour is equal to 'inode'.
>>>>>
>>>>> Hi Jeffle,
>>>>>
>>>>> I am not sure when  -o "dax=inode"  is used as a default? If user
>>>>> specifies, "-o dax" then it is equal to "-o dax=always", otherwise
>>>>> user will explicitly specify "-o dax=always/never/inode". So when
>>>>> is dax=inode used as default?
>>>>
>>>> That means when neither '-o dax' nor '-o dax=always/never/inode' is
>>>> specified, it is actually equal to '-o dax=inode', which is also how
>>>> per-file DAX on ext4/xfs works.
>>>>
>>>> This default behaviour for local filesystem, e.g. ext4/xfs, may be
>>>> straightforward, since the disk inode will be read into memory during
>>>> the inode instantiation, and checking for persistent inode attribute
>>>> shall be relatively cheap, except that the default behaviour has
>>>> changed from 'dax=never' to 'dax=inode'.
>>>
>>> Interesting that ext4/xfs allowed for this behavior change.
>>>
>>>>
>>>> Come back to virtiofs, when neither '-o dax' nor '-o
>>>> dax=always/never/inode' is specified, and it actually behaves as '-o
>>>> dax=inode', as long as '-o dax=server/attr' option is not specified for
>>>> virtiofsd, virtiofsd will always clear FUSE_ATTR_DAX and thus guest will
>>>> always disable DAX. IOWs, the guest virtiofs actually behaves as '-o
>>>> dax=never' when neither '-o dax' nor '-o dax=always/never/inode' is
>>>> specified, and '-o dax=server/attr' option is not specified for virtiofsd.
>>>>
>>>> But I'm okay if we need to change the default behaviour for virtiofs.
>>>
>>> This is change of behavior from client's perspective. Even if client
>>> did not opt-in for DAX, DAX can be enabled based on server's setting.
>>> Not that there is anything wrong with it, but change of behavior part
>>> concerns me.
>>>
>>> In case of virtiofs, lot of features we are controlling from server.
>>> Client typically just calls "mount" and there are not many options
>>> users can specify for mount.  
>>>
>>> Given we already allowed to make client a choice about DAX behavior,
>>> I will feel more comfortable that we don't change it and let client
>>> request a specific DAX mode and if client does not specify anything,
>>> then DAX is not enabled.
>>>
>>
>> Hi Vivek,
>>
>> How about the following design about the default behavior to move this
>> patchset forward, considering the discussion on another thread [1]?
>>
>> - guest side: '-o dax=inode' acts as the default, keeping consistent
>> with xfs/ext4
> 
> This sounds good.
> 
>> - virtiofsd: the default behavior is like, neither file size based
>> policy ('dax=server') nor persistent inode flags based policy
>> ('dax=attr') is used, though virtiofsd indeed advertises that
>> it supports per inode DAX feature (so that it won't fail FUSE_INIT
>> negotiation phase when guest advertises dax=inode by default)... In
>> fact, it acts like 'dax=never' in this case.
> 
> Not sure why virtiofsd needs to advertise that it supports per inode
> DAX even if no per inode dax policy is in effect. Guest will know that
> server is not supporting per inode DAX. But it will not return an
> error to user space (because dax=inode seems to be advisory).
> 
> IOW, this is very similar to the case of using dax=inode on a block
> device which does not support DAX. No errors and no warnings.

OK. I will adopt this behavior. That is, if virtiofsd is not started
with the 'dax=server|attr' option, it won't advertise support for per inode
DAX in FUSE_INIT either. The client will then fall back to 'dax=never'
even if it is mounted with 'dax=inode'.
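
The fallback I have in mind can be sketched as below. It is a userspace
model only: the names are hypothetical, and "server_advertises" stands in
for whatever FUSE_INIT flag ends up carrying the per inode DAX capability.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the guest-side tri-state mount option. */
enum model_dax_mode { MODEL_DAX_ALWAYS, MODEL_DAX_NEVER, MODEL_DAX_INODE };

/*
 * Compute the mode the client actually runs with after FUSE_INIT.
 * Only 'inode' depends on the server: without server support there is no
 * per-file hint to follow, so the client silently behaves as 'never'.
 */
static enum model_dax_mode
model_effective_mode(enum model_dax_mode mounted, bool server_advertises)
{
	assert(mounted == MODEL_DAX_ALWAYS || mounted == MODEL_DAX_NEVER ||
	       mounted == MODEL_DAX_INODE);

	if (mounted == MODEL_DAX_INODE && !server_advertises)
		return MODEL_DAX_NEVER;
	return mounted;
}
```

No error is returned in the fallback case, which matches the advisory
nature of dax=inode on a block device that does not support DAX.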

> 
>>
>> Then when guest opts-in and specifies '-o dax=inode' manually, then it
>> shall realize that proper configuration for virtiofsd is also needed (-o
>> dax=server|attr).
> 
> I gave some comments w.r.t dax=server naming in your posting. Not sure if
> you got a chance to look at it.
> 
> Thanks
> Vivek
> 
>>
>> [1] https://www.spinics.net/lists/linux-xfs/msg56642.html
>>
>> -- 
>> Thanks,
>> Jeffle
>>

-- 
Thanks,
Jeffle


Thread overview: 37+ messages
2021-10-11  3:00 [PATCH v6 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
2021-10-11  3:00 ` [PATCH v6 1/7] fuse: add fuse_should_enable_dax() helper Jeffle Xu
2021-10-11  3:00 ` [PATCH v6 2/7] fuse: make DAX mount option a tri-state Jeffle Xu
2021-10-18 14:10   ` Vivek Goyal
2021-10-20  2:52     ` JeffleXu
2021-10-20 14:48       ` Vivek Goyal
2021-10-29  8:33         ` JeffleXu
2021-10-29 13:03           ` Vivek Goyal
2021-11-01  8:21             ` JeffleXu
2021-10-20 15:17       ` Vivek Goyal
2021-10-22  6:54         ` JeffleXu
2021-10-25 17:52           ` Ira Weiny
2021-10-25 18:12             ` Vivek Goyal
2021-10-25 19:02               ` Ira Weiny
2021-10-25 19:33                 ` Vivek Goyal
2021-10-25 20:41                   ` Ira Weiny
2021-10-26 13:45                     ` Vivek Goyal
2021-10-27  6:00                 ` JeffleXu
2021-10-11  3:00 ` [PATCH v6 3/7] fuse: support per-file DAX in fuse protocol Jeffle Xu
2021-10-18 14:14   ` Vivek Goyal
2021-10-18 14:20     ` Vivek Goyal
2021-10-20  3:04       ` JeffleXu
2021-10-20 14:54         ` Vivek Goyal
2021-10-11  3:00 ` [PATCH v6 4/7] fuse: negotiate per-file DAX in FUSE_INIT Jeffle Xu
2021-10-18 14:30   ` Vivek Goyal
2021-10-20  3:10     ` JeffleXu
2021-10-20 15:44       ` Vivek Goyal
2021-10-11  3:00 ` [PATCH v6 5/7] fuse: enable per-file DAX Jeffle Xu
2021-10-18 15:11   ` Vivek Goyal
2021-10-11  3:00 ` [PATCH v6 6/7] fuse: mark inode DONT_CACHE when per-file DAX hint changes Jeffle Xu
2021-10-18 15:19   ` Vivek Goyal
2021-10-27  5:05     ` JeffleXu
2021-10-11  3:00 ` [PATCH v6 7/7] Documentation/filesystem/dax: record DAX on virtiofs Jeffle Xu
2021-10-15  3:33 ` [PATCH v6 0/7] fuse,virtiofs: support per-file DAX JeffleXu
2021-10-18 15:21 ` Vivek Goyal
2021-10-20  5:22   ` JeffleXu
2021-10-20 16:06     ` Vivek Goyal
