All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v8 0/7] fuse,virtiofs: support per-file DAX
@ 2021-11-25  7:05 ` Jeffle Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

changes since v7:
- rebase to v5.16
- patch 2: rename FUSE_DAX_NONE|FUSE_DAX_INODE to
  FUSE_DAX_INODE_DEFAULT|FUSE_DAX_INODE_USER
- patch 5: remove redundant call for fuse_is_inode_dax_mode() in
  process_init_reply()
- patch 5: if server's map alignment is non-compliant (fail
  fuse_dax_check_alignment()), the mounted fs won't work and users are
  required to remount it explicitly, instead of silently falling back to
  'never' mode.

Corresponding changes to virtiofsd:
https://www.mail-archive.com/virtio-fs@redhat.com/msg04349.html

v7: https://lore.kernel.org/all/c41837f0-a183-d911-885d-cf3bcdd9b7c8@linux.alibaba.com/T/
v6: https://lore.kernel.org/all/20211011030052.98923-1-jefflexu@linux.alibaba.com/
v5: https://lore.kernel.org/all/20210923092526.72341-1-jefflexu@linux.alibaba.com/
v4: https://lore.kernel.org/linux-fsdevel/20210817022220.17574-1-jefflexu@linux.alibaba.com/
v3: https://www.spinics.net/lists/linux-fsdevel/msg200852.html
v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html

Original Rationale for this Patchset
====================================

This patchset adds support of per-file DAX for virtiofs, which is
inspired by Ira Weiny's work on ext4[1] and xfs[2].

Any comment is welcome.

[1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
[2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")

[Purpose]
DAX may be limited in some specific situation. When the number of usable
DAX windows is under watermark, the recalim routine will be triggered to
reclaim some DAX windows. It may have a negative impact on the
performance, since some processes may need to wait for DAX windows to be
recalimed and reused then. To mitigate the performance degradation, the
overall DAX window need to be expanded larger.

However, simply expanding the DAX window may not be a good deal in some
scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
(512 * 64 bytes) memory footprint will be consumed for page descriptors
inside guest, which is greater than the memory footprint if it uses
guest page cache when DAX disabled. Thus it'd better disable DAX for
those files smaller than 32KB, to reduce the demand for DAX window and
thus avoid the unworthy memory overhead.

Per-file DAX feature is introduced to address this issue, by offering a
finer grained control for dax to users, trying to achieve a balance
between performance and memory overhead.


[Note]
When the per-file DAX hint changes while the file is still *opened*, it
is quite complicated and maybe fragile to dynamically change the DAX
state, since dynamic switching needs to switch a_ops atomiclly. Ira
Weiny had ever implemented a so called i_aops_sem lock [3] but
eventually gave up since the complexity of the implementation
[4][5][6][7].

Hence mark the inode and corresponding dentries as DONE_CACHE once the
per-file DAX hint changes, so that the inode instance will be evicted
and freed as soon as possible once the file is closed and the last
reference to the inode is put. And then when the file gets reopened next
time, the new instantiated inode will reflect the new DAX state.

In summary, when the per-file DAX hint changes for an *opened* file, the
DAX state of the file won't be updated until this file is closed and
reopened later. This is also how ext4/xfs per-file DAX works.

[3] https://lore.kernel.org/lkml/20200227052442.22524-7-ira.weiny@intel.com/
[4] https://patchwork.kernel.org/project/xfs/cover/20200407182958.568475-1-ira.weiny@intel.com/
[5] https://lore.kernel.org/lkml/20200305155144.GA5598@lst.de/
[6] https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
[7] https://lore.kernel.org/lkml/20200403182904.GP80283@magnolia/


Jeffle Xu (7):
  fuse: add fuse_should_enable_dax() helper
  fuse: make DAX mount option a tri-state
  fuse: support per inode DAX in fuse protocol
  fuse: enable per inode DAX
  fuse: negotiate per inode DAX in FUSE_INIT
  fuse: mark inode DONT_CACHE when per inode DAX hint changes
  Documentation/filesystem/dax: DAX on virtiofs

 Documentation/filesystems/dax.rst | 20 +++++++++++++++--
 fs/fuse/dax.c                     | 36 +++++++++++++++++++++++++++++--
 fs/fuse/file.c                    |  4 ++--
 fs/fuse/fuse_i.h                  | 28 ++++++++++++++++++++----
 fs/fuse/inode.c                   | 28 +++++++++++++++++-------
 fs/fuse/virtio_fs.c               | 18 +++++++++++++---
 include/uapi/linux/fuse.h         |  5 +++++
 7 files changed, 118 insertions(+), 21 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v8 0/7] fuse,virtiofs: support per-file DAX
@ 2021-11-25  7:05 ` Jeffle Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

changes since v7:
- rebase to v5.16
- patch 2: rename FUSE_DAX_NONE|FUSE_DAX_INODE to
  FUSE_DAX_INODE_DEFAULT|FUSE_DAX_INODE_USER
- patch 5: remove redundant call for fuse_is_inode_dax_mode() in
  process_init_reply()
- patch 5: if server's map alignment is non-compliant (fail
  fuse_dax_check_alignment()), the mounted fs won't work and users are
  required to remount it explicitly, instead of silently falling back to
  'never' mode.

Corresponding changes to virtiofsd:
https://www.mail-archive.com/virtio-fs@redhat.com/msg04349.html

v7: https://lore.kernel.org/all/c41837f0-a183-d911-885d-cf3bcdd9b7c8@linux.alibaba.com/T/
v6: https://lore.kernel.org/all/20211011030052.98923-1-jefflexu@linux.alibaba.com/
v5: https://lore.kernel.org/all/20210923092526.72341-1-jefflexu@linux.alibaba.com/
v4: https://lore.kernel.org/linux-fsdevel/20210817022220.17574-1-jefflexu@linux.alibaba.com/
v3: https://www.spinics.net/lists/linux-fsdevel/msg200852.html
v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html

Original Rationale for this Patchset
====================================

This patchset adds support of per-file DAX for virtiofs, which is
inspired by Ira Weiny's work on ext4[1] and xfs[2].

Any comment is welcome.

[1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
[2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")

[Purpose]
DAX may be limited in some specific situation. When the number of usable
DAX windows is under watermark, the recalim routine will be triggered to
reclaim some DAX windows. It may have a negative impact on the
performance, since some processes may need to wait for DAX windows to be
recalimed and reused then. To mitigate the performance degradation, the
overall DAX window need to be expanded larger.

However, simply expanding the DAX window may not be a good deal in some
scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
(512 * 64 bytes) memory footprint will be consumed for page descriptors
inside guest, which is greater than the memory footprint if it uses
guest page cache when DAX disabled. Thus it'd better disable DAX for
those files smaller than 32KB, to reduce the demand for DAX window and
thus avoid the unworthy memory overhead.

Per-file DAX feature is introduced to address this issue, by offering a
finer grained control for dax to users, trying to achieve a balance
between performance and memory overhead.


[Note]
When the per-file DAX hint changes while the file is still *opened*, it
is quite complicated and maybe fragile to dynamically change the DAX
state, since dynamic switching needs to switch a_ops atomiclly. Ira
Weiny had ever implemented a so called i_aops_sem lock [3] but
eventually gave up since the complexity of the implementation
[4][5][6][7].

Hence mark the inode and corresponding dentries as DONE_CACHE once the
per-file DAX hint changes, so that the inode instance will be evicted
and freed as soon as possible once the file is closed and the last
reference to the inode is put. And then when the file gets reopened next
time, the new instantiated inode will reflect the new DAX state.

In summary, when the per-file DAX hint changes for an *opened* file, the
DAX state of the file won't be updated until this file is closed and
reopened later. This is also how ext4/xfs per-file DAX works.

[3] https://lore.kernel.org/lkml/20200227052442.22524-7-ira.weiny@intel.com/
[4] https://patchwork.kernel.org/project/xfs/cover/20200407182958.568475-1-ira.weiny@intel.com/
[5] https://lore.kernel.org/lkml/20200305155144.GA5598@lst.de/
[6] https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
[7] https://lore.kernel.org/lkml/20200403182904.GP80283@magnolia/


Jeffle Xu (7):
  fuse: add fuse_should_enable_dax() helper
  fuse: make DAX mount option a tri-state
  fuse: support per inode DAX in fuse protocol
  fuse: enable per inode DAX
  fuse: negotiate per inode DAX in FUSE_INIT
  fuse: mark inode DONT_CACHE when per inode DAX hint changes
  Documentation/filesystem/dax: DAX on virtiofs

 Documentation/filesystems/dax.rst | 20 +++++++++++++++--
 fs/fuse/dax.c                     | 36 +++++++++++++++++++++++++++++--
 fs/fuse/file.c                    |  4 ++--
 fs/fuse/fuse_i.h                  | 28 ++++++++++++++++++++----
 fs/fuse/inode.c                   | 28 +++++++++++++++++-------
 fs/fuse/virtio_fs.c               | 18 +++++++++++++---
 include/uapi/linux/fuse.h         |  5 +++++
 7 files changed, 118 insertions(+), 21 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v8 1/7] fuse: add fuse_should_enable_dax() helper
  2021-11-25  7:05 ` [Virtio-fs] " Jeffle Xu
@ 2021-11-25  7:05   ` Jeffle Xu
  -1 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

This is in prep for following per inode DAX checking.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 5778ebfbce5e..4c48a57632bd 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1329,11 +1329,19 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 	.invalidatepage	= noop_invalidatepage,
 };
 
-void fuse_dax_inode_init(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 
 	if (!fc->dax)
+		return false;
+
+	return true;
+}
+
+void fuse_dax_inode_init(struct inode *inode)
+{
+	if (!fuse_should_enable_dax(inode))
 		return;
 
 	inode->i_flags |= S_DAX;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v8 1/7] fuse: add fuse_should_enable_dax() helper
@ 2021-11-25  7:05   ` Jeffle Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

This is in prep for following per inode DAX checking.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 5778ebfbce5e..4c48a57632bd 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1329,11 +1329,19 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 	.invalidatepage	= noop_invalidatepage,
 };
 
-void fuse_dax_inode_init(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 
 	if (!fc->dax)
+		return false;
+
+	return true;
+}
+
+void fuse_dax_inode_init(struct inode *inode)
+{
+	if (!fuse_should_enable_dax(inode))
 		return;
 
 	inode->i_flags |= S_DAX;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v8 2/7] fuse: make DAX mount option a tri-state
  2021-11-25  7:05 ` [Virtio-fs] " Jeffle Xu
@ 2021-11-25  7:05   ` Jeffle Xu
  -1 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

We add 'always', 'never', and 'inode' (default). '-o dax' continues to
operate the same which is equivalent to 'always'.

The following behavior is consistent with that on ext4/xfs:
- The default behavior (when neither '-o dax' nor
  '-o dax=always|never|inode' option is specified) is equal to 'inode'
  mode, while 'dax=inode' won't be printed among the mount option list.
- The 'inode' mode is only advisory. It will silently fallback to
  'never' mode if fuse server doesn't support that.

Also noted that by the time of this commit, 'inode' mode is actually
equal to 'always' mode, before the per inode DAX flag is introduced in
the following patch.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c       | 13 ++++++++++++-
 fs/fuse/fuse_i.h    | 20 ++++++++++++++++++--
 fs/fuse/inode.c     | 10 +++++++---
 fs/fuse/virtio_fs.c | 18 +++++++++++++++---
 4 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 4c48a57632bd..b9a031a82934 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1281,11 +1281,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
 	return ret;
 }
 
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
+			struct dax_device *dax_dev)
 {
 	struct fuse_conn_dax *fcd;
 	int err;
 
+	fc->dax_mode = dax_mode;
+
 	if (!dax_dev)
 		return 0;
 
@@ -1332,7 +1335,15 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 static bool fuse_should_enable_dax(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	enum fuse_dax_mode dax_mode = fc->dax_mode;
+
+	if (dax_mode == FUSE_DAX_NEVER)
+		return false;
 
+	/*
+	 * fc->dax may be NULL in 'inode' mode when filesystem device doesn't
+	 * support DAX, in which case it will silently fallback to 'never' mode.
+	 */
 	if (!fc->dax)
 		return false;
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 198637b41e19..19ded93cfc49 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -480,6 +480,18 @@ struct fuse_dev {
 	struct list_head entry;
 };
 
+enum fuse_dax_mode {
+	FUSE_DAX_INODE_DEFAULT,	/* default */
+	FUSE_DAX_ALWAYS,	/* "-o dax=always" */
+	FUSE_DAX_NEVER,		/* "-o dax=never" */
+	FUSE_DAX_INODE_USER,	/* "-o dax=inode" */
+};
+
+static inline bool fuse_is_inode_dax_mode(enum fuse_dax_mode mode)
+{
+	return mode == FUSE_DAX_INODE_DEFAULT || mode == FUSE_DAX_INODE_USER;
+}
+
 struct fuse_fs_context {
 	int fd;
 	struct file *file;
@@ -497,7 +509,7 @@ struct fuse_fs_context {
 	bool no_control:1;
 	bool no_force_umount:1;
 	bool legacy_opts_show:1;
-	bool dax:1;
+	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
 	const char *subtype;
@@ -802,6 +814,9 @@ struct fuse_conn {
 	struct list_head devices;
 
 #ifdef CONFIG_FUSE_DAX
+	/* Dax mode */
+	enum fuse_dax_mode dax_mode;
+
 	/* Dax specific conn data, non-NULL if DAX is enabled */
 	struct fuse_conn_dax *dax;
 #endif
@@ -1269,7 +1284,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
 ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
 int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
 int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
+			struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 8b89e3ba7df3..4a41e6a73f3f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -767,8 +767,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
 	}
 #ifdef CONFIG_FUSE_DAX
-	if (fc->dax)
-		seq_puts(m, ",dax");
+	if (fc->dax_mode == FUSE_DAX_ALWAYS)
+		seq_puts(m, ",dax=always");
+	else if (fc->dax_mode == FUSE_DAX_NEVER)
+		seq_puts(m, ",dax=never");
+	else if (fc->dax_mode == FUSE_DAX_INODE_USER)
+		seq_puts(m, ",dax=inode");
 #endif
 
 	return 0;
@@ -1514,7 +1518,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	sb->s_subtype = ctx->subtype;
 	ctx->subtype = NULL;
 	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
-		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
+		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
 		if (err)
 			goto err;
 	}
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 4cfa4bc1f579..e54dc069587d 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -88,12 +88,21 @@ struct virtio_fs_req_work {
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 				 struct fuse_req *req, bool in_flight);
 
+static const struct constant_table dax_param_enums[] = {
+	{"always",	FUSE_DAX_ALWAYS },
+	{"never",	FUSE_DAX_NEVER },
+	{"inode",	FUSE_DAX_INODE_USER },
+	{}
+};
+
 enum {
 	OPT_DAX,
+	OPT_DAX_ENUM,
 };
 
 static const struct fs_parameter_spec virtio_fs_parameters[] = {
 	fsparam_flag("dax", OPT_DAX),
+	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
 	{}
 };
 
@@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
 
 	switch (opt) {
 	case OPT_DAX:
-		ctx->dax = 1;
+		ctx->dax_mode = FUSE_DAX_ALWAYS;
+		break;
+	case OPT_DAX_ENUM:
+		ctx->dax_mode = result.uint_32;
 		break;
 	default:
 		return -EINVAL;
@@ -1326,8 +1338,8 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
 
 	/* virtiofs allocates and installs its own fuse devices */
 	ctx->fudptr = NULL;
-	if (ctx->dax) {
-		if (!fs->dax_dev) {
+	if (ctx->dax_mode != FUSE_DAX_NEVER) {
+		if (ctx->dax_mode == FUSE_DAX_ALWAYS && !fs->dax_dev) {
 			err = -EINVAL;
 			pr_err("virtio-fs: dax can't be enabled as filesystem"
 			       " device does not support it.\n");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v8 2/7] fuse: make DAX mount option a tri-state
@ 2021-11-25  7:05   ` Jeffle Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

We add 'always', 'never', and 'inode' (default). '-o dax' continues to
operate the same which is equivalent to 'always'.

The following behavior is consistent with that on ext4/xfs:
- The default behavior (when neither '-o dax' nor
  '-o dax=always|never|inode' option is specified) is equal to 'inode'
  mode, while 'dax=inode' won't be printed among the mount option list.
- The 'inode' mode is only advisory. It will silently fallback to
  'never' mode if fuse server doesn't support that.

Also noted that by the time of this commit, 'inode' mode is actually
equal to 'always' mode, before the per inode DAX flag is introduced in
the following patch.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c       | 13 ++++++++++++-
 fs/fuse/fuse_i.h    | 20 ++++++++++++++++++--
 fs/fuse/inode.c     | 10 +++++++---
 fs/fuse/virtio_fs.c | 18 +++++++++++++++---
 4 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 4c48a57632bd..b9a031a82934 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1281,11 +1281,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
 	return ret;
 }
 
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
+			struct dax_device *dax_dev)
 {
 	struct fuse_conn_dax *fcd;
 	int err;
 
+	fc->dax_mode = dax_mode;
+
 	if (!dax_dev)
 		return 0;
 
@@ -1332,7 +1335,15 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 static bool fuse_should_enable_dax(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	enum fuse_dax_mode dax_mode = fc->dax_mode;
+
+	if (dax_mode == FUSE_DAX_NEVER)
+		return false;
 
+	/*
+	 * fc->dax may be NULL in 'inode' mode when filesystem device doesn't
+	 * support DAX, in which case it will silently fallback to 'never' mode.
+	 */
 	if (!fc->dax)
 		return false;
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 198637b41e19..19ded93cfc49 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -480,6 +480,18 @@ struct fuse_dev {
 	struct list_head entry;
 };
 
+enum fuse_dax_mode {
+	FUSE_DAX_INODE_DEFAULT,	/* default */
+	FUSE_DAX_ALWAYS,	/* "-o dax=always" */
+	FUSE_DAX_NEVER,		/* "-o dax=never" */
+	FUSE_DAX_INODE_USER,	/* "-o dax=inode" */
+};
+
+static inline bool fuse_is_inode_dax_mode(enum fuse_dax_mode mode)
+{
+	return mode == FUSE_DAX_INODE_DEFAULT || mode == FUSE_DAX_INODE_USER;
+}
+
 struct fuse_fs_context {
 	int fd;
 	struct file *file;
@@ -497,7 +509,7 @@ struct fuse_fs_context {
 	bool no_control:1;
 	bool no_force_umount:1;
 	bool legacy_opts_show:1;
-	bool dax:1;
+	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
 	const char *subtype;
@@ -802,6 +814,9 @@ struct fuse_conn {
 	struct list_head devices;
 
 #ifdef CONFIG_FUSE_DAX
+	/* Dax mode */
+	enum fuse_dax_mode dax_mode;
+
 	/* Dax specific conn data, non-NULL if DAX is enabled */
 	struct fuse_conn_dax *dax;
 #endif
@@ -1269,7 +1284,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
 ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
 int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
 int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
+			struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 8b89e3ba7df3..4a41e6a73f3f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -767,8 +767,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
 	}
 #ifdef CONFIG_FUSE_DAX
-	if (fc->dax)
-		seq_puts(m, ",dax");
+	if (fc->dax_mode == FUSE_DAX_ALWAYS)
+		seq_puts(m, ",dax=always");
+	else if (fc->dax_mode == FUSE_DAX_NEVER)
+		seq_puts(m, ",dax=never");
+	else if (fc->dax_mode == FUSE_DAX_INODE_USER)
+		seq_puts(m, ",dax=inode");
 #endif
 
 	return 0;
@@ -1514,7 +1518,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	sb->s_subtype = ctx->subtype;
 	ctx->subtype = NULL;
 	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
-		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
+		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
 		if (err)
 			goto err;
 	}
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 4cfa4bc1f579..e54dc069587d 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -88,12 +88,21 @@ struct virtio_fs_req_work {
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 				 struct fuse_req *req, bool in_flight);
 
+static const struct constant_table dax_param_enums[] = {
+	{"always",	FUSE_DAX_ALWAYS },
+	{"never",	FUSE_DAX_NEVER },
+	{"inode",	FUSE_DAX_INODE_USER },
+	{}
+};
+
 enum {
 	OPT_DAX,
+	OPT_DAX_ENUM,
 };
 
 static const struct fs_parameter_spec virtio_fs_parameters[] = {
 	fsparam_flag("dax", OPT_DAX),
+	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
 	{}
 };
 
@@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
 
 	switch (opt) {
 	case OPT_DAX:
-		ctx->dax = 1;
+		ctx->dax_mode = FUSE_DAX_ALWAYS;
+		break;
+	case OPT_DAX_ENUM:
+		ctx->dax_mode = result.uint_32;
 		break;
 	default:
 		return -EINVAL;
@@ -1326,8 +1338,8 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
 
 	/* virtiofs allocates and installs its own fuse devices */
 	ctx->fudptr = NULL;
-	if (ctx->dax) {
-		if (!fs->dax_dev) {
+	if (ctx->dax_mode != FUSE_DAX_NEVER) {
+		if (ctx->dax_mode == FUSE_DAX_ALWAYS && !fs->dax_dev) {
 			err = -EINVAL;
 			pr_err("virtio-fs: dax can't be enabled as filesystem"
 			       " device does not support it.\n");
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v8 3/7] fuse: support per inode DAX in fuse protocol
  2021-11-25  7:05 ` [Virtio-fs] " Jeffle Xu
@ 2021-11-25  7:05   ` Jeffle Xu
  -1 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

Expand the fuse protocol to support per inode DAX.

FUSE_HAS_INODE_DAX flag is added indicating if fuse server/client
supporting per inode DAX. It can be conveyed in both FUSE_INIT request
and reply.

FUSE_ATTR_DAX flag is added indicating if DAX shall be enabled for
corresponding file. It is conveyed in FUSE_LOOKUP reply.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 include/uapi/linux/fuse.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index a1dc3ee1d17c..63a9a963f4d9 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -187,6 +187,7 @@
  *
  *  7.35
  *  - add FOPEN_NOFLUSH
+ *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
  */
 
 #ifndef _LINUX_FUSE_H
@@ -341,6 +342,7 @@ struct fuse_file_lock {
  *			write/truncate sgid is killed only if file has group
  *			execute permission. (Same as Linux VFS behavior).
  * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
+ * FUSE_HAS_INODE_DAX:  use per inode DAX
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -372,6 +374,7 @@ struct fuse_file_lock {
 #define FUSE_SUBMOUNTS		(1 << 27)
 #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
 #define FUSE_SETXATTR_EXT	(1 << 29)
+#define FUSE_HAS_INODE_DAX	(1 << 30)
 
 /**
  * CUSE INIT request/reply flags
@@ -454,8 +457,10 @@ struct fuse_file_lock {
  * fuse_attr flags
  *
  * FUSE_ATTR_SUBMOUNT: Object is a submount root
+ * FUSE_ATTR_DAX: Enable DAX for this file in per inode DAX mode
  */
 #define FUSE_ATTR_SUBMOUNT      (1 << 0)
+#define FUSE_ATTR_DAX		(1 << 1)
 
 /**
  * Open flags
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v8 3/7] fuse: support per inode DAX in fuse protocol
@ 2021-11-25  7:05   ` Jeffle Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

Expand the fuse protocol to support per inode DAX.

FUSE_HAS_INODE_DAX flag is added indicating if fuse server/client
supporting per inode DAX. It can be conveyed in both FUSE_INIT request
and reply.

FUSE_ATTR_DAX flag is added indicating if DAX shall be enabled for
corresponding file. It is conveyed in FUSE_LOOKUP reply.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 include/uapi/linux/fuse.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index a1dc3ee1d17c..63a9a963f4d9 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -187,6 +187,7 @@
  *
  *  7.35
  *  - add FOPEN_NOFLUSH
+ *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
  */
 
 #ifndef _LINUX_FUSE_H
@@ -341,6 +342,7 @@ struct fuse_file_lock {
  *			write/truncate sgid is killed only if file has group
  *			execute permission. (Same as Linux VFS behavior).
  * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
+ * FUSE_HAS_INODE_DAX:  use per inode DAX
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -372,6 +374,7 @@ struct fuse_file_lock {
 #define FUSE_SUBMOUNTS		(1 << 27)
 #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
 #define FUSE_SETXATTR_EXT	(1 << 29)
+#define FUSE_HAS_INODE_DAX	(1 << 30)
 
 /**
  * CUSE INIT request/reply flags
@@ -454,8 +457,10 @@ struct fuse_file_lock {
  * fuse_attr flags
  *
  * FUSE_ATTR_SUBMOUNT: Object is a submount root
+ * FUSE_ATTR_DAX: Enable DAX for this file in per inode DAX mode
  */
 #define FUSE_ATTR_SUBMOUNT      (1 << 0)
+#define FUSE_ATTR_DAX		(1 << 1)
 
 /**
  * Open flags
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v8 4/7] fuse: enable per inode DAX
  2021-11-25  7:05 ` [Virtio-fs] " Jeffle Xu
@ 2021-11-25  7:05   ` Jeffle Xu
  -1 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

DAX may be limited in some specific situation. When the number of usable
DAX windows is under watermark, the recalim routine will be triggered to
reclaim some DAX windows. It may have a negative impact on the
performance, since some processes may need to wait for DAX windows to be
recalimed and reused then. To mitigate the performance degradation, the
overall DAX window need to be expanded larger.

However, simply expanding the DAX window may not be a good deal in some
scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
(512 * 64 bytes) memory footprint will be consumed for page descriptors
inside guest, which is greater than the memory footprint if it uses
guest page cache when DAX disabled. Thus it'd better disable DAX for
those files smaller than 32KB, to reduce the demand for DAX window and
thus avoid the unworthy memory overhead.

Per inode DAX feature is introduced to address this issue, by offering a
finer grained control for dax to users, trying to achieve a balance
between performance and memory overhead.

The FUSE_ATTR_DAX flag in FUSE_LOOKUP reply is used to indicate whether
DAX should be enabled or not for corresponding file. Currently the state
whether DAX is enabled or not for the file is initialized only when
inode is instantiated.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c    | 12 ++++++++----
 fs/fuse/file.c   |  4 ++--
 fs/fuse/fuse_i.h |  4 ++--
 fs/fuse/inode.c  |  2 +-
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index b9a031a82934..1550c3624414 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1332,7 +1332,7 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 	.invalidatepage	= noop_invalidatepage,
 };
 
-static bool fuse_should_enable_dax(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	enum fuse_dax_mode dax_mode = fc->dax_mode;
@@ -1347,12 +1347,16 @@ static bool fuse_should_enable_dax(struct inode *inode)
 	if (!fc->dax)
 		return false;
 
-	return true;
+	if (dax_mode == FUSE_DAX_ALWAYS)
+		return true;
+
+	/* dax_mode is FUSE_DAX_INODE* */
+	return flags & FUSE_ATTR_DAX;
 }
 
-void fuse_dax_inode_init(struct inode *inode)
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 {
-	if (!fuse_should_enable_dax(inode))
+	if (!fuse_should_enable_dax(inode, flags))
 		return;
 
 	inode->i_flags |= S_DAX;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 9d6c5f6361f7..90067584e103 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3169,7 +3169,7 @@ static const struct address_space_operations fuse_file_aops  = {
 	.write_end	= fuse_write_end,
 };
 
-void fuse_init_file_inode(struct inode *inode)
+void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
 
@@ -3183,5 +3183,5 @@ void fuse_init_file_inode(struct inode *inode)
 	fi->writepages = RB_ROOT;
 
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
-		fuse_dax_inode_init(inode);
+		fuse_dax_inode_init(inode, flags);
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 19ded93cfc49..f03ea7cb74b0 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1022,7 +1022,7 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 /**
  * Initialize file operations on a regular file
  */
-void fuse_init_file_inode(struct inode *inode);
+void fuse_init_file_inode(struct inode *inode, unsigned int flags);
 
 /**
  * Initialize inode operations on regular files and special files
@@ -1288,7 +1288,7 @@ int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
 			struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
-void fuse_dax_inode_init(struct inode *inode);
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 4a41e6a73f3f..0669e41a9645 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -313,7 +313,7 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
 	inode->i_ctime.tv_nsec = attr->ctimensec;
 	if (S_ISREG(inode->i_mode)) {
 		fuse_init_common(inode);
-		fuse_init_file_inode(inode);
+		fuse_init_file_inode(inode, attr->flags);
 	} else if (S_ISDIR(inode->i_mode))
 		fuse_init_dir(inode);
 	else if (S_ISLNK(inode->i_mode))
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v8 4/7] fuse: enable per inode DAX
@ 2021-11-25  7:05   ` Jeffle Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

DAX may be limited in some specific situation. When the number of usable
DAX windows is under watermark, the recalim routine will be triggered to
reclaim some DAX windows. It may have a negative impact on the
performance, since some processes may need to wait for DAX windows to be
recalimed and reused then. To mitigate the performance degradation, the
overall DAX window need to be expanded larger.

However, simply expanding the DAX window may not be a good deal in some
scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
(512 * 64 bytes) memory footprint will be consumed for page descriptors
inside guest, which is greater than the memory footprint if it uses
guest page cache when DAX disabled. Thus it'd better disable DAX for
those files smaller than 32KB, to reduce the demand for DAX window and
thus avoid the unworthy memory overhead.

Per inode DAX feature is introduced to address this issue, by offering a
finer grained control for dax to users, trying to achieve a balance
between performance and memory overhead.

The FUSE_ATTR_DAX flag in FUSE_LOOKUP reply is used to indicate whether
DAX should be enabled or not for corresponding file. Currently the state
whether DAX is enabled or not for the file is initialized only when
inode is instantiated.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c    | 12 ++++++++----
 fs/fuse/file.c   |  4 ++--
 fs/fuse/fuse_i.h |  4 ++--
 fs/fuse/inode.c  |  2 +-
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index b9a031a82934..1550c3624414 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1332,7 +1332,7 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 	.invalidatepage	= noop_invalidatepage,
 };
 
-static bool fuse_should_enable_dax(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	enum fuse_dax_mode dax_mode = fc->dax_mode;
@@ -1347,12 +1347,16 @@ static bool fuse_should_enable_dax(struct inode *inode)
 	if (!fc->dax)
 		return false;
 
-	return true;
+	if (dax_mode == FUSE_DAX_ALWAYS)
+		return true;
+
+	/* dax_mode is FUSE_DAX_INODE* */
+	return flags & FUSE_ATTR_DAX;
 }
 
-void fuse_dax_inode_init(struct inode *inode)
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 {
-	if (!fuse_should_enable_dax(inode))
+	if (!fuse_should_enable_dax(inode, flags))
 		return;
 
 	inode->i_flags |= S_DAX;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 9d6c5f6361f7..90067584e103 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3169,7 +3169,7 @@ static const struct address_space_operations fuse_file_aops  = {
 	.write_end	= fuse_write_end,
 };
 
-void fuse_init_file_inode(struct inode *inode)
+void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
 
@@ -3183,5 +3183,5 @@ void fuse_init_file_inode(struct inode *inode)
 	fi->writepages = RB_ROOT;
 
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
-		fuse_dax_inode_init(inode);
+		fuse_dax_inode_init(inode, flags);
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 19ded93cfc49..f03ea7cb74b0 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1022,7 +1022,7 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 /**
  * Initialize file operations on a regular file
  */
-void fuse_init_file_inode(struct inode *inode);
+void fuse_init_file_inode(struct inode *inode, unsigned int flags);
 
 /**
  * Initialize inode operations on regular files and special files
@@ -1288,7 +1288,7 @@ int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
 			struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
-void fuse_dax_inode_init(struct inode *inode);
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 4a41e6a73f3f..0669e41a9645 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -313,7 +313,7 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
 	inode->i_ctime.tv_nsec = attr->ctimensec;
 	if (S_ISREG(inode->i_mode)) {
 		fuse_init_common(inode);
-		fuse_init_file_inode(inode);
+		fuse_init_file_inode(inode, attr->flags);
 	} else if (S_ISDIR(inode->i_mode))
 		fuse_init_dir(inode);
 	else if (S_ISLNK(inode->i_mode))
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v8 5/7] fuse: negotiate per inode DAX in FUSE_INIT
  2021-11-25  7:05 ` [Virtio-fs] " Jeffle Xu
@ 2021-11-25  7:05   ` Jeffle Xu
  -1 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

Among the FUSE_INIT phase, client shall advertise per inode DAX if it's
mounted with "dax=inode". Then server is aware that client is in per
inode DAX mode, and will construct per-inode DAX attribute accordingly.

Server shall also advertise support for per inode DAX. If server doesn't
support it while client is mounted with "dax=inode", client will
silently fallback to "dax=never" since "dax=inode" is advisory only.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c    |  2 +-
 fs/fuse/fuse_i.h |  3 +++
 fs/fuse/inode.c  | 13 +++++++++----
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 1550c3624414..234c2278420f 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1351,7 +1351,7 @@ static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
 		return true;
 
 	/* dax_mode is FUSE_DAX_INODE* */
-	return flags & FUSE_ATTR_DAX;
+	return fc->inode_dax && (flags & FUSE_ATTR_DAX);
 }
 
 void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f03ea7cb74b0..83970723314a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -777,6 +777,9 @@ struct fuse_conn {
 	/* Propagate syncfs() to server */
 	unsigned int sync_fs:1;
 
+	/* Does the filesystem support per inode DAX? */
+	unsigned int inode_dax:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 0669e41a9645..b26612fce6d0 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1169,10 +1169,13 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 					min_t(unsigned int, fc->max_pages_limit,
 					max_t(unsigned int, arg->max_pages, 1));
 			}
-			if (IS_ENABLED(CONFIG_FUSE_DAX) &&
-			    arg->flags & FUSE_MAP_ALIGNMENT &&
-			    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
-				ok = false;
+			if (IS_ENABLED(CONFIG_FUSE_DAX)) {
+				if (arg->flags & FUSE_MAP_ALIGNMENT &&
+				    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
+					ok = false;
+				}
+				if (arg->flags & FUSE_HAS_INODE_DAX)
+					fc->inode_dax = 1;
 			}
 			if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
 				fc->handle_killpriv_v2 = 1;
@@ -1227,6 +1230,8 @@ void fuse_send_init(struct fuse_mount *fm)
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		ia->in.flags |= FUSE_MAP_ALIGNMENT;
+	if (fuse_is_inode_dax_mode(fm->fc->dax_mode))
+		ia->in.flags |= FUSE_HAS_INODE_DAX;
 #endif
 	if (fm->fc->auto_submounts)
 		ia->in.flags |= FUSE_SUBMOUNTS;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v8 5/7] fuse: negotiate per inode DAX in FUSE_INIT
@ 2021-11-25  7:05   ` Jeffle Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

Among the FUSE_INIT phase, client shall advertise per inode DAX if it's
mounted with "dax=inode". Then server is aware that client is in per
inode DAX mode, and will construct per-inode DAX attribute accordingly.

Server shall also advertise support for per inode DAX. If server doesn't
support it while client is mounted with "dax=inode", client will
silently fallback to "dax=never" since "dax=inode" is advisory only.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c    |  2 +-
 fs/fuse/fuse_i.h |  3 +++
 fs/fuse/inode.c  | 13 +++++++++----
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 1550c3624414..234c2278420f 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1351,7 +1351,7 @@ static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
 		return true;
 
 	/* dax_mode is FUSE_DAX_INODE* */
-	return flags & FUSE_ATTR_DAX;
+	return fc->inode_dax && (flags & FUSE_ATTR_DAX);
 }
 
 void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f03ea7cb74b0..83970723314a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -777,6 +777,9 @@ struct fuse_conn {
 	/* Propagate syncfs() to server */
 	unsigned int sync_fs:1;
 
+	/* Does the filesystem support per inode DAX? */
+	unsigned int inode_dax:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 0669e41a9645..b26612fce6d0 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1169,10 +1169,13 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 					min_t(unsigned int, fc->max_pages_limit,
 					max_t(unsigned int, arg->max_pages, 1));
 			}
-			if (IS_ENABLED(CONFIG_FUSE_DAX) &&
-			    arg->flags & FUSE_MAP_ALIGNMENT &&
-			    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
-				ok = false;
+			if (IS_ENABLED(CONFIG_FUSE_DAX)) {
+				if (arg->flags & FUSE_MAP_ALIGNMENT &&
+				    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
+					ok = false;
+				}
+				if (arg->flags & FUSE_HAS_INODE_DAX)
+					fc->inode_dax = 1;
 			}
 			if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
 				fc->handle_killpriv_v2 = 1;
@@ -1227,6 +1230,8 @@ void fuse_send_init(struct fuse_mount *fm)
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		ia->in.flags |= FUSE_MAP_ALIGNMENT;
+	if (fuse_is_inode_dax_mode(fm->fc->dax_mode))
+		ia->in.flags |= FUSE_HAS_INODE_DAX;
 #endif
 	if (fm->fc->auto_submounts)
 		ia->in.flags |= FUSE_SUBMOUNTS;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v8 6/7] fuse: mark inode DONT_CACHE when per inode DAX hint changes
  2021-11-25  7:05 ` [Virtio-fs] " Jeffle Xu
@ 2021-11-25  7:05   ` Jeffle Xu
  -1 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

When the per inode DAX hint changes while the file is still *opened*, it
is quite complicated and maybe fragile to dynamically change the DAX
state.

Hence mark the inode and corresponding dentries as DONE_CACHE once the
per inode DAX hint changes, so that the inode instance will be evicted
and freed as soon as possible once the file is closed and the last
reference to the inode is put. And then when the file gets reopened next
time, the new instantiated inode will reflect the new DAX state.

In summary, when the per inode DAX hint changes for an *opened* file, the
DAX state of the file won't be updated until this file is closed and
reopened later.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c    | 9 +++++++++
 fs/fuse/fuse_i.h | 1 +
 fs/fuse/inode.c  | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 234c2278420f..b19e7eaed4ef 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1363,6 +1363,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 	inode->i_data.a_ops = &fuse_dax_file_aops;
 }
 
+void fuse_dax_dontcache(struct inode *inode, unsigned int flags)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+
+	if (fuse_is_inode_dax_mode(fc->dax_mode) &&
+	    (!!IS_DAX(inode) != !!(flags & FUSE_ATTR_DAX)))
+		d_mark_dontcache(inode);
+}
+
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
 {
 	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 83970723314a..af19d1d821ea 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1293,6 +1293,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
+void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b26612fce6d0..b25d99eb8411 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -301,6 +301,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 		if (inval)
 			invalidate_inode_pages2(inode->i_mapping);
 	}
+
+	if (IS_ENABLED(CONFIG_FUSE_DAX))
+		fuse_dax_dontcache(inode, attr->flags);
 }
 
 static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v8 6/7] fuse: mark inode DONT_CACHE when per inode DAX hint changes
@ 2021-11-25  7:05   ` Jeffle Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

When the per inode DAX hint changes while the file is still *opened*, it
is quite complicated and maybe fragile to dynamically change the DAX
state.

Hence mark the inode and corresponding dentries as DONE_CACHE once the
per inode DAX hint changes, so that the inode instance will be evicted
and freed as soon as possible once the file is closed and the last
reference to the inode is put. And then when the file gets reopened next
time, the new instantiated inode will reflect the new DAX state.

In summary, when the per inode DAX hint changes for an *opened* file, the
DAX state of the file won't be updated until this file is closed and
reopened later.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c    | 9 +++++++++
 fs/fuse/fuse_i.h | 1 +
 fs/fuse/inode.c  | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 234c2278420f..b19e7eaed4ef 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1363,6 +1363,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 	inode->i_data.a_ops = &fuse_dax_file_aops;
 }
 
+void fuse_dax_dontcache(struct inode *inode, unsigned int flags)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+
+	if (fuse_is_inode_dax_mode(fc->dax_mode) &&
+	    (!!IS_DAX(inode) != !!(flags & FUSE_ATTR_DAX)))
+		d_mark_dontcache(inode);
+}
+
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
 {
 	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 83970723314a..af19d1d821ea 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1293,6 +1293,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
+void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b26612fce6d0..b25d99eb8411 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -301,6 +301,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 		if (inval)
 			invalidate_inode_pages2(inode->i_mapping);
 	}
+
+	if (IS_ENABLED(CONFIG_FUSE_DAX))
+		fuse_dax_dontcache(inode, attr->flags);
 }
 
 static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v8 7/7] Documentation/filesystem/dax: DAX on virtiofs
  2021-11-25  7:05 ` [Virtio-fs] " Jeffle Xu
@ 2021-11-25  7:05   ` Jeffle Xu
  -1 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

Record DAX on virtiofs and the semantic difference with that on ext4
and xfs.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 Documentation/filesystems/dax.rst | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/dax.rst b/Documentation/filesystems/dax.rst
index 9a1b8fd9e82b..e3b30429d703 100644
--- a/Documentation/filesystems/dax.rst
+++ b/Documentation/filesystems/dax.rst
@@ -23,8 +23,8 @@ on it as usual.  The `DAX` code currently only supports files with a block
 size equal to your kernel's `PAGE_SIZE`, so you may need to specify a block
 size when creating the filesystem.
 
-Currently 3 filesystems support `DAX`: ext2, ext4 and xfs.  Enabling `DAX` on them
-is different.
+Currently 4 filesystems support `DAX`: ext2, ext4, xfs and virtiofs.
+Enabling `DAX` on them is different.
 
 Enabling DAX on ext2
 --------------------
@@ -168,6 +168,22 @@ if the underlying media does not support dax and/or the filesystem is
 overridden with a mount option.
 
 
+Enabling DAX on virtiofs
+----------------------------
+The semantic of DAX on virtiofs is basically equal to that on ext4 and xfs,
+except that when '-o dax=inode' is specified, virtiofs client derives the hint
+whether DAX shall be enabled or not from virtiofs server through FUSE protocol,
+rather than the persistent `FS_XFLAG_DAX` flag. That is, whether DAX shall be
+enabled or not is completely determined by virtiofs server, while virtiofs
+server itself may deploy various algorithm making this decision, e.g. depending
+on the persistent `FS_XFLAG_DAX` flag on the host.
+
+It is still supported to set or clear persistent `FS_XFLAG_DAX` flag inside
+guest, but it is not guaranteed that DAX will be enabled or disabled for
+corresponding file then. Users inside guest still need to call statx(2) and
+check the statx flag `STATX_ATTR_DAX` to see if DAX is enabled for this file.
+
+
 Implementation Tips for Block Driver Writers
 --------------------------------------------
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [Virtio-fs] [PATCH v8 7/7] Documentation/filesystem/dax: DAX on virtiofs
@ 2021-11-25  7:05   ` Jeffle Xu
  0 siblings, 0 replies; 36+ messages in thread
From: Jeffle Xu @ 2021-11-25  7:05 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos; +Cc: virtio-fs, linux-fsdevel, joseph.qi

Record DAX on virtiofs and the semantic difference with that on ext4
and xfs.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 Documentation/filesystems/dax.rst | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/dax.rst b/Documentation/filesystems/dax.rst
index 9a1b8fd9e82b..e3b30429d703 100644
--- a/Documentation/filesystems/dax.rst
+++ b/Documentation/filesystems/dax.rst
@@ -23,8 +23,8 @@ on it as usual.  The `DAX` code currently only supports files with a block
 size equal to your kernel's `PAGE_SIZE`, so you may need to specify a block
 size when creating the filesystem.
 
-Currently 3 filesystems support `DAX`: ext2, ext4 and xfs.  Enabling `DAX` on them
-is different.
+Currently 4 filesystems support `DAX`: ext2, ext4, xfs and virtiofs.
+Enabling `DAX` on them is different.
 
 Enabling DAX on ext2
 --------------------
@@ -168,6 +168,22 @@ if the underlying media does not support dax and/or the filesystem is
 overridden with a mount option.
 
 
+Enabling DAX on virtiofs
+----------------------------
+The semantic of DAX on virtiofs is basically equal to that on ext4 and xfs,
+except that when '-o dax=inode' is specified, virtiofs client derives the hint
+whether DAX shall be enabled or not from virtiofs server through FUSE protocol,
+rather than the persistent `FS_XFLAG_DAX` flag. That is, whether DAX shall be
+enabled or not is completely determined by virtiofs server, while virtiofs
+server itself may deploy various algorithm making this decision, e.g. depending
+on the persistent `FS_XFLAG_DAX` flag on the host.
+
+It is still supported to set or clear persistent `FS_XFLAG_DAX` flag inside
+guest, but it is not guaranteed that DAX will be enabled or disabled for
+corresponding file then. Users inside guest still need to call statx(2) and
+check the statx flag `STATX_ATTR_DAX` to see if DAX is enabled for this file.
+
+
 Implementation Tips for Block Driver Writers
 --------------------------------------------
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v8 6/7] fuse: mark inode DONT_CACHE when per inode DAX hint changes
  2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
@ 2021-12-07 16:00     ` Vivek Goyal
  -1 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-07 16:00 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, virtio-fs, linux-fsdevel, joseph.qi

On Thu, Nov 25, 2021 at 03:05:29PM +0800, Jeffle Xu wrote:
> When the per inode DAX hint changes while the file is still *opened*, it
> is quite complicated and maybe fragile to dynamically change the DAX
> state.
> 
> Hence mark the inode and corresponding dentries as DONE_CACHE once the
> per inode DAX hint changes, so that the inode instance will be evicted
> and freed as soon as possible once the file is closed and the last
> reference to the inode is put. And then when the file gets reopened next
> time, the new instantiated inode will reflect the new DAX state.
> 
> In summary, when the per inode DAX hint changes for an *opened* file, the
> DAX state of the file won't be updated until this file is closed and
> reopened later.

Is this true for cache=always mode as well? If inode continues to be
cached in guest cache, I am assuming we will not refresh inode attrs
due to cache=always mode and will not get new DAX state from server.

IOW, changing DAX state of file is combination of two things.

A. Client comes to know about it using GETATTR.
B. Once client knows about updated state, it will take affect when
   existing inode is released and new inode is instantiated.

And step A might take time depending on cache mode as well as when
is GETATTR actually initiated by the client. Its not deterministic
and can't be guaranteed. Right?

Vivek

> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c    | 9 +++++++++
>  fs/fuse/fuse_i.h | 1 +
>  fs/fuse/inode.c  | 3 +++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 234c2278420f..b19e7eaed4ef 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1363,6 +1363,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>  	inode->i_data.a_ops = &fuse_dax_file_aops;
>  }
>  
> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags)
> +{
> +	struct fuse_conn *fc = get_fuse_conn(inode);
> +
> +	if (fuse_is_inode_dax_mode(fc->dax_mode) &&
> +	    (!!IS_DAX(inode) != !!(flags & FUSE_ATTR_DAX)))
> +		d_mark_dontcache(inode);
> +}
> +
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
>  {
>  	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 83970723314a..af19d1d821ea 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1293,6 +1293,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b26612fce6d0..b25d99eb8411 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -301,6 +301,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
>  		if (inval)
>  			invalidate_inode_pages2(inode->i_mapping);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_FUSE_DAX))
> +		fuse_dax_dontcache(inode, attr->flags);
>  }
>  
>  static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v8 6/7] fuse: mark inode DONT_CACHE when per inode DAX hint changes
@ 2021-12-07 16:00     ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-07 16:00 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: virtio-fs, linux-fsdevel, joseph.qi, miklos

On Thu, Nov 25, 2021 at 03:05:29PM +0800, Jeffle Xu wrote:
> When the per inode DAX hint changes while the file is still *opened*, it
> is quite complicated and maybe fragile to dynamically change the DAX
> state.
> 
> Hence mark the inode and corresponding dentries as DONE_CACHE once the
> per inode DAX hint changes, so that the inode instance will be evicted
> and freed as soon as possible once the file is closed and the last
> reference to the inode is put. And then when the file gets reopened next
> time, the new instantiated inode will reflect the new DAX state.
> 
> In summary, when the per inode DAX hint changes for an *opened* file, the
> DAX state of the file won't be updated until this file is closed and
> reopened later.

Is this true for cache=always mode as well? If inode continues to be
cached in guest cache, I am assuming we will not refresh inode attrs
due to cache=always mode and will not get new DAX state from server.

IOW, changing DAX state of file is combination of two things.

A. Client comes to know about it using GETATTR.
B. Once client knows about updated state, it will take affect when
   existing inode is released and new inode is instantiated.

And step A might take time depending on cache mode as well as when
is GETATTR actually initiated by the client. Its not deterministic
and can't be guaranteed. Right?

Vivek

> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c    | 9 +++++++++
>  fs/fuse/fuse_i.h | 1 +
>  fs/fuse/inode.c  | 3 +++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 234c2278420f..b19e7eaed4ef 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1363,6 +1363,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>  	inode->i_data.a_ops = &fuse_dax_file_aops;
>  }
>  
> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags)
> +{
> +	struct fuse_conn *fc = get_fuse_conn(inode);
> +
> +	if (fuse_is_inode_dax_mode(fc->dax_mode) &&
> +	    (!!IS_DAX(inode) != !!(flags & FUSE_ATTR_DAX)))
> +		d_mark_dontcache(inode);
> +}
> +
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
>  {
>  	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 83970723314a..af19d1d821ea 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1293,6 +1293,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b26612fce6d0..b25d99eb8411 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -301,6 +301,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
>  		if (inval)
>  			invalidate_inode_pages2(inode->i_mapping);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_FUSE_DAX))
> +		fuse_dax_dontcache(inode, attr->flags);
>  }
>  
>  static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8 6/7] fuse: mark inode DONT_CACHE when per inode DAX hint changes
  2021-12-07 16:00     ` [Virtio-fs] " Vivek Goyal
@ 2021-12-08  1:36       ` JeffleXu
  -1 siblings, 0 replies; 36+ messages in thread
From: JeffleXu @ 2021-12-08  1:36 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: stefanha, miklos, virtio-fs, linux-fsdevel, joseph.qi



On 12/8/21 12:00 AM, Vivek Goyal wrote:
> On Thu, Nov 25, 2021 at 03:05:29PM +0800, Jeffle Xu wrote:
>> When the per inode DAX hint changes while the file is still *opened*, it
>> is quite complicated and maybe fragile to dynamically change the DAX
>> state.
>>
>> Hence mark the inode and corresponding dentries as DONE_CACHE once the
>> per inode DAX hint changes, so that the inode instance will be evicted
>> and freed as soon as possible once the file is closed and the last
>> reference to the inode is put. And then when the file gets reopened next
>> time, the new instantiated inode will reflect the new DAX state.
>>
>> In summary, when the per inode DAX hint changes for an *opened* file, the
>> DAX state of the file won't be updated until this file is closed and
>> reopened later.
> 
> Is this true for cache=always mode as well? If inode continues to be
> cached in guest cache, I am assuming we will not refresh inode attrs
> due to cache=always mode and will not get new DAX state from server.
> 
> IOW, changing DAX state of file is combination of two things.
> 
> A. Client comes to know about it using GETATTR.
> B. Once client knows about updated state, it will take affect when
>    existing inode is released and new inode is instantiated.
> 
> And step A might take time depending on cache mode as well as when
> is GETATTR actually initiated by the client. Its not deterministic
> and can't be guaranteed. Right?
> 

Right.

If it is virtiofsd that determines to *change* the DAX state, then guest
kernel has no way knowing that. If the notify queue is ready, maybe a
notify event can be sent to guest kernel to notify that the DAX state is
to be changed, if it is really needed later.

If it is the user inside guest that changes the DAX state, e.g. by
SETFLAGS ioctl, then we can mark the inode/dentry DONTCACHE following
the SETFLAGS ioctl, though it's not included in this patch.

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v8 6/7] fuse: mark inode DONT_CACHE when per inode DAX hint changes
@ 2021-12-08  1:36       ` JeffleXu
  0 siblings, 0 replies; 36+ messages in thread
From: JeffleXu @ 2021-12-08  1:36 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: virtio-fs, linux-fsdevel, joseph.qi, miklos



On 12/8/21 12:00 AM, Vivek Goyal wrote:
> On Thu, Nov 25, 2021 at 03:05:29PM +0800, Jeffle Xu wrote:
>> When the per inode DAX hint changes while the file is still *opened*, it
>> is quite complicated and maybe fragile to dynamically change the DAX
>> state.
>>
>> Hence mark the inode and corresponding dentries as DONE_CACHE once the
>> per inode DAX hint changes, so that the inode instance will be evicted
>> and freed as soon as possible once the file is closed and the last
>> reference to the inode is put. And then when the file gets reopened next
>> time, the new instantiated inode will reflect the new DAX state.
>>
>> In summary, when the per inode DAX hint changes for an *opened* file, the
>> DAX state of the file won't be updated until this file is closed and
>> reopened later.
> 
> Is this true for cache=always mode as well? If inode continues to be
> cached in guest cache, I am assuming we will not refresh inode attrs
> due to cache=always mode and will not get new DAX state from server.
> 
> IOW, changing DAX state of file is combination of two things.
> 
> A. Client comes to know about it using GETATTR.
> B. Once client knows about updated state, it will take affect when
>    existing inode is released and new inode is instantiated.
> 
> And step A might take time depending on cache mode as well as when
> is GETATTR actually initiated by the client. Its not deterministic
> and can't be guaranteed. Right?
> 

Right.

If it is virtiofsd that determines to *change* the DAX state, then guest
kernel has no way knowing that. If the notify queue is ready, maybe a
notify event can be sent to guest kernel to notify that the DAX state is
to be changed, if it is really needed later.

If it is the user inside guest that changes the DAX state, e.g. by
SETFLAGS ioctl, then we can mark the inode/dentry DONTCACHE following
the SETFLAGS ioctl, though it's not included in this patch.

-- 
Thanks,
Jeffle


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8 1/7] fuse: add fuse_should_enable_dax() helper
  2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
@ 2021-12-13 18:08     ` Vivek Goyal
  -1 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:08 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, virtio-fs, linux-fsdevel, joseph.qi

On Thu, Nov 25, 2021 at 03:05:24PM +0800, Jeffle Xu wrote:
> This is in prep for following per inode DAX checking.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek
> ---
>  fs/fuse/dax.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 5778ebfbce5e..4c48a57632bd 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1329,11 +1329,19 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>  	.invalidatepage	= noop_invalidatepage,
>  };
>  
> -void fuse_dax_inode_init(struct inode *inode)
> +static bool fuse_should_enable_dax(struct inode *inode)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
>  
>  	if (!fc->dax)
> +		return false;
> +
> +	return true;
> +}
> +
> +void fuse_dax_inode_init(struct inode *inode)
> +{
> +	if (!fuse_should_enable_dax(inode))
>  		return;
>  
>  	inode->i_flags |= S_DAX;
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v8 1/7] fuse: add fuse_should_enable_dax() helper
@ 2021-12-13 18:08     ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:08 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: virtio-fs, linux-fsdevel, joseph.qi, miklos

On Thu, Nov 25, 2021 at 03:05:24PM +0800, Jeffle Xu wrote:
> This is in prep for following per inode DAX checking.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek
> ---
>  fs/fuse/dax.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 5778ebfbce5e..4c48a57632bd 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1329,11 +1329,19 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>  	.invalidatepage	= noop_invalidatepage,
>  };
>  
> -void fuse_dax_inode_init(struct inode *inode)
> +static bool fuse_should_enable_dax(struct inode *inode)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
>  
>  	if (!fc->dax)
> +		return false;
> +
> +	return true;
> +}
> +
> +void fuse_dax_inode_init(struct inode *inode)
> +{
> +	if (!fuse_should_enable_dax(inode))
>  		return;
>  
>  	inode->i_flags |= S_DAX;
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8 2/7] fuse: make DAX mount option a tri-state
  2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
@ 2021-12-13 18:09     ` Vivek Goyal
  -1 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:09 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, virtio-fs, linux-fsdevel, joseph.qi

On Thu, Nov 25, 2021 at 03:05:25PM +0800, Jeffle Xu wrote:
> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> operate the same which is equivalent to 'always'.
> 
> The following behavior is consistent with that on ext4/xfs:
> - The default behavior (when neither '-o dax' nor
>   '-o dax=always|never|inode' option is specified) is equal to 'inode'
>   mode, while 'dax=inode' won't be printed among the mount option list.
> - The 'inode' mode is only advisory. It will silently fallback to
>   'never' mode if fuse server doesn't support that.
> 
> Also noted that by the time of this commit, 'inode' mode is actually
> equal to 'always' mode, before the per inode DAX flag is introduced in
> the following patch.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
>  fs/fuse/dax.c       | 13 ++++++++++++-
>  fs/fuse/fuse_i.h    | 20 ++++++++++++++++++--
>  fs/fuse/inode.c     | 10 +++++++---
>  fs/fuse/virtio_fs.c | 18 +++++++++++++++---
>  4 files changed, 52 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 4c48a57632bd..b9a031a82934 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1281,11 +1281,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
>  	return ret;
>  }
>  
> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
> +			struct dax_device *dax_dev)
>  {
>  	struct fuse_conn_dax *fcd;
>  	int err;
>  
> +	fc->dax_mode = dax_mode;
> +
>  	if (!dax_dev)
>  		return 0;
>  
> @@ -1332,7 +1335,15 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>  static bool fuse_should_enable_dax(struct inode *inode)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
> +	enum fuse_dax_mode dax_mode = fc->dax_mode;
> +
> +	if (dax_mode == FUSE_DAX_NEVER)
> +		return false;
>  
> +	/*
> +	 * fc->dax may be NULL in 'inode' mode when filesystem device doesn't
> +	 * support DAX, in which case it will silently fallback to 'never' mode.
> +	 */
>  	if (!fc->dax)
>  		return false;
>  
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 198637b41e19..19ded93cfc49 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -480,6 +480,18 @@ struct fuse_dev {
>  	struct list_head entry;
>  };
>  
> +enum fuse_dax_mode {
> +	FUSE_DAX_INODE_DEFAULT,	/* default */
> +	FUSE_DAX_ALWAYS,	/* "-o dax=always" */
> +	FUSE_DAX_NEVER,		/* "-o dax=never" */
> +	FUSE_DAX_INODE_USER,	/* "-o dax=inode" */
> +};
> +
> +static inline bool fuse_is_inode_dax_mode(enum fuse_dax_mode mode)
> +{
> +	return mode == FUSE_DAX_INODE_DEFAULT || mode == FUSE_DAX_INODE_USER;
> +}
> +
>  struct fuse_fs_context {
>  	int fd;
>  	struct file *file;
> @@ -497,7 +509,7 @@ struct fuse_fs_context {
>  	bool no_control:1;
>  	bool no_force_umount:1;
>  	bool legacy_opts_show:1;
> -	bool dax:1;
> +	enum fuse_dax_mode dax_mode;
>  	unsigned int max_read;
>  	unsigned int blksize;
>  	const char *subtype;
> @@ -802,6 +814,9 @@ struct fuse_conn {
>  	struct list_head devices;
>  
>  #ifdef CONFIG_FUSE_DAX
> +	/* Dax mode */
> +	enum fuse_dax_mode dax_mode;
> +
>  	/* Dax specific conn data, non-NULL if DAX is enabled */
>  	struct fuse_conn_dax *dax;
>  #endif
> @@ -1269,7 +1284,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
>  ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
>  int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
>  int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
> +			struct dax_device *dax_dev);
>  void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 8b89e3ba7df3..4a41e6a73f3f 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -767,8 +767,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
>  			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
>  	}
>  #ifdef CONFIG_FUSE_DAX
> -	if (fc->dax)
> -		seq_puts(m, ",dax");
> +	if (fc->dax_mode == FUSE_DAX_ALWAYS)
> +		seq_puts(m, ",dax=always");
> +	else if (fc->dax_mode == FUSE_DAX_NEVER)
> +		seq_puts(m, ",dax=never");
> +	else if (fc->dax_mode == FUSE_DAX_INODE_USER)
> +		seq_puts(m, ",dax=inode");
>  #endif
>  
>  	return 0;
> @@ -1514,7 +1518,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  	sb->s_subtype = ctx->subtype;
>  	ctx->subtype = NULL;
>  	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
> -		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
> +		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
>  		if (err)
>  			goto err;
>  	}
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 4cfa4bc1f579..e54dc069587d 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -88,12 +88,21 @@ struct virtio_fs_req_work {
>  static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
>  				 struct fuse_req *req, bool in_flight);
>  
> +static const struct constant_table dax_param_enums[] = {
> +	{"always",	FUSE_DAX_ALWAYS },
> +	{"never",	FUSE_DAX_NEVER },
> +	{"inode",	FUSE_DAX_INODE_USER },
> +	{}
> +};
> +
>  enum {
>  	OPT_DAX,
> +	OPT_DAX_ENUM,
>  };
>  
>  static const struct fs_parameter_spec virtio_fs_parameters[] = {
>  	fsparam_flag("dax", OPT_DAX),
> +	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
>  	{}
>  };
>  
> @@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
>  
>  	switch (opt) {
>  	case OPT_DAX:
> -		ctx->dax = 1;
> +		ctx->dax_mode = FUSE_DAX_ALWAYS;
> +		break;
> +	case OPT_DAX_ENUM:
> +		ctx->dax_mode = result.uint_32;
>  		break;
>  	default:
>  		return -EINVAL;
> @@ -1326,8 +1338,8 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
>  
>  	/* virtiofs allocates and installs its own fuse devices */
>  	ctx->fudptr = NULL;
> -	if (ctx->dax) {
> -		if (!fs->dax_dev) {
> +	if (ctx->dax_mode != FUSE_DAX_NEVER) {
> +		if (ctx->dax_mode == FUSE_DAX_ALWAYS && !fs->dax_dev) {
>  			err = -EINVAL;
>  			pr_err("virtio-fs: dax can't be enabled as filesystem"
>  			       " device does not support it.\n");
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v8 2/7] fuse: make DAX mount option a tri-state
@ 2021-12-13 18:09     ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:09 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: virtio-fs, linux-fsdevel, joseph.qi, miklos

On Thu, Nov 25, 2021 at 03:05:25PM +0800, Jeffle Xu wrote:
> We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> operate the same which is equivalent to 'always'.
> 
> The following behavior is consistent with that on ext4/xfs:
> - The default behavior (when neither '-o dax' nor
>   '-o dax=always|never|inode' option is specified) is equal to 'inode'
>   mode, while 'dax=inode' won't be printed among the mount option list.
> - The 'inode' mode is only advisory. It will silently fallback to
>   'never' mode if fuse server doesn't support that.
> 
> Also noted that by the time of this commit, 'inode' mode is actually
> equal to 'always' mode, before the per inode DAX flag is introduced in
> the following patch.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
>  fs/fuse/dax.c       | 13 ++++++++++++-
>  fs/fuse/fuse_i.h    | 20 ++++++++++++++++++--
>  fs/fuse/inode.c     | 10 +++++++---
>  fs/fuse/virtio_fs.c | 18 +++++++++++++++---
>  4 files changed, 52 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 4c48a57632bd..b9a031a82934 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1281,11 +1281,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
>  	return ret;
>  }
>  
> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
> +			struct dax_device *dax_dev)
>  {
>  	struct fuse_conn_dax *fcd;
>  	int err;
>  
> +	fc->dax_mode = dax_mode;
> +
>  	if (!dax_dev)
>  		return 0;
>  
> @@ -1332,7 +1335,15 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>  static bool fuse_should_enable_dax(struct inode *inode)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
> +	enum fuse_dax_mode dax_mode = fc->dax_mode;
> +
> +	if (dax_mode == FUSE_DAX_NEVER)
> +		return false;
>  
> +	/*
> +	 * fc->dax may be NULL in 'inode' mode when filesystem device doesn't
> +	 * support DAX, in which case it will silently fallback to 'never' mode.
> +	 */
>  	if (!fc->dax)
>  		return false;
>  
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 198637b41e19..19ded93cfc49 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -480,6 +480,18 @@ struct fuse_dev {
>  	struct list_head entry;
>  };
>  
> +enum fuse_dax_mode {
> +	FUSE_DAX_INODE_DEFAULT,	/* default */
> +	FUSE_DAX_ALWAYS,	/* "-o dax=always" */
> +	FUSE_DAX_NEVER,		/* "-o dax=never" */
> +	FUSE_DAX_INODE_USER,	/* "-o dax=inode" */
> +};
> +
> +static inline bool fuse_is_inode_dax_mode(enum fuse_dax_mode mode)
> +{
> +	return mode == FUSE_DAX_INODE_DEFAULT || mode == FUSE_DAX_INODE_USER;
> +}
> +
>  struct fuse_fs_context {
>  	int fd;
>  	struct file *file;
> @@ -497,7 +509,7 @@ struct fuse_fs_context {
>  	bool no_control:1;
>  	bool no_force_umount:1;
>  	bool legacy_opts_show:1;
> -	bool dax:1;
> +	enum fuse_dax_mode dax_mode;
>  	unsigned int max_read;
>  	unsigned int blksize;
>  	const char *subtype;
> @@ -802,6 +814,9 @@ struct fuse_conn {
>  	struct list_head devices;
>  
>  #ifdef CONFIG_FUSE_DAX
> +	/* Dax mode */
> +	enum fuse_dax_mode dax_mode;
> +
>  	/* Dax specific conn data, non-NULL if DAX is enabled */
>  	struct fuse_conn_dax *dax;
>  #endif
> @@ -1269,7 +1284,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
>  ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
>  int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
>  int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
> -int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
> +int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
> +			struct dax_device *dax_dev);
>  void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 8b89e3ba7df3..4a41e6a73f3f 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -767,8 +767,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
>  			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
>  	}
>  #ifdef CONFIG_FUSE_DAX
> -	if (fc->dax)
> -		seq_puts(m, ",dax");
> +	if (fc->dax_mode == FUSE_DAX_ALWAYS)
> +		seq_puts(m, ",dax=always");
> +	else if (fc->dax_mode == FUSE_DAX_NEVER)
> +		seq_puts(m, ",dax=never");
> +	else if (fc->dax_mode == FUSE_DAX_INODE_USER)
> +		seq_puts(m, ",dax=inode");
>  #endif
>  
>  	return 0;
> @@ -1514,7 +1518,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  	sb->s_subtype = ctx->subtype;
>  	ctx->subtype = NULL;
>  	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
> -		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
> +		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
>  		if (err)
>  			goto err;
>  	}
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 4cfa4bc1f579..e54dc069587d 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -88,12 +88,21 @@ struct virtio_fs_req_work {
>  static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
>  				 struct fuse_req *req, bool in_flight);
>  
> +static const struct constant_table dax_param_enums[] = {
> +	{"always",	FUSE_DAX_ALWAYS },
> +	{"never",	FUSE_DAX_NEVER },
> +	{"inode",	FUSE_DAX_INODE_USER },
> +	{}
> +};
> +
>  enum {
>  	OPT_DAX,
> +	OPT_DAX_ENUM,
>  };
>  
>  static const struct fs_parameter_spec virtio_fs_parameters[] = {
>  	fsparam_flag("dax", OPT_DAX),
> +	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
>  	{}
>  };
>  
> @@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fsc,
>  
>  	switch (opt) {
>  	case OPT_DAX:
> -		ctx->dax = 1;
> +		ctx->dax_mode = FUSE_DAX_ALWAYS;
> +		break;
> +	case OPT_DAX_ENUM:
> +		ctx->dax_mode = result.uint_32;
>  		break;
>  	default:
>  		return -EINVAL;
> @@ -1326,8 +1338,8 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
>  
>  	/* virtiofs allocates and installs its own fuse devices */
>  	ctx->fudptr = NULL;
> -	if (ctx->dax) {
> -		if (!fs->dax_dev) {
> +	if (ctx->dax_mode != FUSE_DAX_NEVER) {
> +		if (ctx->dax_mode == FUSE_DAX_ALWAYS && !fs->dax_dev) {
>  			err = -EINVAL;
>  			pr_err("virtio-fs: dax can't be enabled as filesystem"
>  			       " device does not support it.\n");
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8 3/7] fuse: support per inode DAX in fuse protocol
  2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
@ 2021-12-13 18:09     ` Vivek Goyal
  -1 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:09 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, virtio-fs, linux-fsdevel, joseph.qi

On Thu, Nov 25, 2021 at 03:05:26PM +0800, Jeffle Xu wrote:
> Expand the fuse protocol to support per inode DAX.
> 
> FUSE_HAS_INODE_DAX flag is added indicating if fuse server/client
> supporting per inode DAX. It can be conveyed in both FUSE_INIT request
> and reply.
> 
> FUSE_ATTR_DAX flag is added indicating if DAX shall be enabled for
> corresponding file. It is conveyed in FUSE_LOOKUP reply.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek
> ---
>  include/uapi/linux/fuse.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index a1dc3ee1d17c..63a9a963f4d9 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -187,6 +187,7 @@
>   *
>   *  7.35
>   *  - add FOPEN_NOFLUSH
> + *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
>   */
>  
>  #ifndef _LINUX_FUSE_H
> @@ -341,6 +342,7 @@ struct fuse_file_lock {
>   *			write/truncate sgid is killed only if file has group
>   *			execute permission. (Same as Linux VFS behavior).
>   * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
> + * FUSE_HAS_INODE_DAX:  use per inode DAX
>   */
>  #define FUSE_ASYNC_READ		(1 << 0)
>  #define FUSE_POSIX_LOCKS	(1 << 1)
> @@ -372,6 +374,7 @@ struct fuse_file_lock {
>  #define FUSE_SUBMOUNTS		(1 << 27)
>  #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
>  #define FUSE_SETXATTR_EXT	(1 << 29)
> +#define FUSE_HAS_INODE_DAX	(1 << 30)
>  
>  /**
>   * CUSE INIT request/reply flags
> @@ -454,8 +457,10 @@ struct fuse_file_lock {
>   * fuse_attr flags
>   *
>   * FUSE_ATTR_SUBMOUNT: Object is a submount root
> + * FUSE_ATTR_DAX: Enable DAX for this file in per inode DAX mode
>   */
>  #define FUSE_ATTR_SUBMOUNT      (1 << 0)
> +#define FUSE_ATTR_DAX		(1 << 1)
>  
>  /**
>   * Open flags
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v8 3/7] fuse: support per inode DAX in fuse protocol
@ 2021-12-13 18:09     ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:09 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: virtio-fs, linux-fsdevel, joseph.qi, miklos

On Thu, Nov 25, 2021 at 03:05:26PM +0800, Jeffle Xu wrote:
> Expand the fuse protocol to support per inode DAX.
> 
> FUSE_HAS_INODE_DAX flag is added indicating if fuse server/client
> supporting per inode DAX. It can be conveyed in both FUSE_INIT request
> and reply.
> 
> FUSE_ATTR_DAX flag is added indicating if DAX shall be enabled for
> corresponding file. It is conveyed in FUSE_LOOKUP reply.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek
> ---
>  include/uapi/linux/fuse.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index a1dc3ee1d17c..63a9a963f4d9 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -187,6 +187,7 @@
>   *
>   *  7.35
>   *  - add FOPEN_NOFLUSH
> + *  - add FUSE_HAS_INODE_DAX, FUSE_ATTR_DAX
>   */
>  
>  #ifndef _LINUX_FUSE_H
> @@ -341,6 +342,7 @@ struct fuse_file_lock {
>   *			write/truncate sgid is killed only if file has group
>   *			execute permission. (Same as Linux VFS behavior).
>   * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
> + * FUSE_HAS_INODE_DAX:  use per inode DAX
>   */
>  #define FUSE_ASYNC_READ		(1 << 0)
>  #define FUSE_POSIX_LOCKS	(1 << 1)
> @@ -372,6 +374,7 @@ struct fuse_file_lock {
>  #define FUSE_SUBMOUNTS		(1 << 27)
>  #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
>  #define FUSE_SETXATTR_EXT	(1 << 29)
> +#define FUSE_HAS_INODE_DAX	(1 << 30)
>  
>  /**
>   * CUSE INIT request/reply flags
> @@ -454,8 +457,10 @@ struct fuse_file_lock {
>   * fuse_attr flags
>   *
>   * FUSE_ATTR_SUBMOUNT: Object is a submount root
> + * FUSE_ATTR_DAX: Enable DAX for this file in per inode DAX mode
>   */
>  #define FUSE_ATTR_SUBMOUNT      (1 << 0)
> +#define FUSE_ATTR_DAX		(1 << 1)
>  
>  /**
>   * Open flags
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8 4/7] fuse: enable per inode DAX
  2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
@ 2021-12-13 18:10     ` Vivek Goyal
  -1 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:10 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, virtio-fs, linux-fsdevel, joseph.qi

On Thu, Nov 25, 2021 at 03:05:27PM +0800, Jeffle Xu wrote:
> DAX may be limited in some specific situation. When the number of usable
> DAX windows is under watermark, the recalim routine will be triggered to
> reclaim some DAX windows. It may have a negative impact on the
> performance, since some processes may need to wait for DAX windows to be
> recalimed and reused then. To mitigate the performance degradation, the
> overall DAX window need to be expanded larger.
> 
> However, simply expanding the DAX window may not be a good deal in some
> scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
> (512 * 64 bytes) memory footprint will be consumed for page descriptors
> inside guest, which is greater than the memory footprint if it uses
> guest page cache when DAX disabled. Thus it'd better disable DAX for
> those files smaller than 32KB, to reduce the demand for DAX window and
> thus avoid the unworthy memory overhead.
> 
> Per inode DAX feature is introduced to address this issue, by offering a
> finer grained control for dax to users, trying to achieve a balance
> between performance and memory overhead.
> 
> The FUSE_ATTR_DAX flag in FUSE_LOOKUP reply is used to indicate whether
> DAX should be enabled or not for corresponding file. Currently the state
> whether DAX is enabled or not for the file is initialized only when
> inode is instantiated.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviwed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
>  fs/fuse/dax.c    | 12 ++++++++----
>  fs/fuse/file.c   |  4 ++--
>  fs/fuse/fuse_i.h |  4 ++--
>  fs/fuse/inode.c  |  2 +-
>  4 files changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index b9a031a82934..1550c3624414 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1332,7 +1332,7 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>  	.invalidatepage	= noop_invalidatepage,
>  };
>  
> -static bool fuse_should_enable_dax(struct inode *inode)
> +static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
>  	enum fuse_dax_mode dax_mode = fc->dax_mode;
> @@ -1347,12 +1347,16 @@ static bool fuse_should_enable_dax(struct inode *inode)
>  	if (!fc->dax)
>  		return false;
>  
> -	return true;
> +	if (dax_mode == FUSE_DAX_ALWAYS)
> +		return true;
> +
> +	/* dax_mode is FUSE_DAX_INODE* */
> +	return flags & FUSE_ATTR_DAX;
>  }
>  
> -void fuse_dax_inode_init(struct inode *inode)
> +void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>  {
> -	if (!fuse_should_enable_dax(inode))
> +	if (!fuse_should_enable_dax(inode, flags))
>  		return;
>  
>  	inode->i_flags |= S_DAX;
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 9d6c5f6361f7..90067584e103 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -3169,7 +3169,7 @@ static const struct address_space_operations fuse_file_aops  = {
>  	.write_end	= fuse_write_end,
>  };
>  
> -void fuse_init_file_inode(struct inode *inode)
> +void fuse_init_file_inode(struct inode *inode, unsigned int flags)
>  {
>  	struct fuse_inode *fi = get_fuse_inode(inode);
>  
> @@ -3183,5 +3183,5 @@ void fuse_init_file_inode(struct inode *inode)
>  	fi->writepages = RB_ROOT;
>  
>  	if (IS_ENABLED(CONFIG_FUSE_DAX))
> -		fuse_dax_inode_init(inode);
> +		fuse_dax_inode_init(inode, flags);
>  }
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 19ded93cfc49..f03ea7cb74b0 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1022,7 +1022,7 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
>  /**
>   * Initialize file operations on a regular file
>   */
> -void fuse_init_file_inode(struct inode *inode);
> +void fuse_init_file_inode(struct inode *inode, unsigned int flags);
>  
>  /**
>   * Initialize inode operations on regular files and special files
> @@ -1288,7 +1288,7 @@ int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
>  			struct dax_device *dax_dev);
>  void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
> -void fuse_dax_inode_init(struct inode *inode);
> +void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 4a41e6a73f3f..0669e41a9645 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -313,7 +313,7 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
>  	inode->i_ctime.tv_nsec = attr->ctimensec;
>  	if (S_ISREG(inode->i_mode)) {
>  		fuse_init_common(inode);
> -		fuse_init_file_inode(inode);
> +		fuse_init_file_inode(inode, attr->flags);
>  	} else if (S_ISDIR(inode->i_mode))
>  		fuse_init_dir(inode);
>  	else if (S_ISLNK(inode->i_mode))
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v8 4/7] fuse: enable per inode DAX
@ 2021-12-13 18:10     ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:10 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: virtio-fs, linux-fsdevel, joseph.qi, miklos

On Thu, Nov 25, 2021 at 03:05:27PM +0800, Jeffle Xu wrote:
> DAX may be limited in some specific situation. When the number of usable
> DAX windows is under watermark, the recalim routine will be triggered to
> reclaim some DAX windows. It may have a negative impact on the
> performance, since some processes may need to wait for DAX windows to be
> recalimed and reused then. To mitigate the performance degradation, the
> overall DAX window need to be expanded larger.
> 
> However, simply expanding the DAX window may not be a good deal in some
> scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
> (512 * 64 bytes) memory footprint will be consumed for page descriptors
> inside guest, which is greater than the memory footprint if it uses
> guest page cache when DAX disabled. Thus it'd better disable DAX for
> those files smaller than 32KB, to reduce the demand for DAX window and
> thus avoid the unworthy memory overhead.
> 
> Per inode DAX feature is introduced to address this issue, by offering a
> finer grained control for dax to users, trying to achieve a balance
> between performance and memory overhead.
> 
> The FUSE_ATTR_DAX flag in FUSE_LOOKUP reply is used to indicate whether
> DAX should be enabled or not for corresponding file. Currently the state
> whether DAX is enabled or not for the file is initialized only when
> inode is instantiated.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviwed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
>  fs/fuse/dax.c    | 12 ++++++++----
>  fs/fuse/file.c   |  4 ++--
>  fs/fuse/fuse_i.h |  4 ++--
>  fs/fuse/inode.c  |  2 +-
>  4 files changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index b9a031a82934..1550c3624414 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1332,7 +1332,7 @@ static const struct address_space_operations fuse_dax_file_aops  = {
>  	.invalidatepage	= noop_invalidatepage,
>  };
>  
> -static bool fuse_should_enable_dax(struct inode *inode)
> +static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
>  {
>  	struct fuse_conn *fc = get_fuse_conn(inode);
>  	enum fuse_dax_mode dax_mode = fc->dax_mode;
> @@ -1347,12 +1347,16 @@ static bool fuse_should_enable_dax(struct inode *inode)
>  	if (!fc->dax)
>  		return false;
>  
> -	return true;
> +	if (dax_mode == FUSE_DAX_ALWAYS)
> +		return true;
> +
> +	/* dax_mode is FUSE_DAX_INODE* */
> +	return flags & FUSE_ATTR_DAX;
>  }
>  
> -void fuse_dax_inode_init(struct inode *inode)
> +void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>  {
> -	if (!fuse_should_enable_dax(inode))
> +	if (!fuse_should_enable_dax(inode, flags))
>  		return;
>  
>  	inode->i_flags |= S_DAX;
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 9d6c5f6361f7..90067584e103 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -3169,7 +3169,7 @@ static const struct address_space_operations fuse_file_aops  = {
>  	.write_end	= fuse_write_end,
>  };
>  
> -void fuse_init_file_inode(struct inode *inode)
> +void fuse_init_file_inode(struct inode *inode, unsigned int flags)
>  {
>  	struct fuse_inode *fi = get_fuse_inode(inode);
>  
> @@ -3183,5 +3183,5 @@ void fuse_init_file_inode(struct inode *inode)
>  	fi->writepages = RB_ROOT;
>  
>  	if (IS_ENABLED(CONFIG_FUSE_DAX))
> -		fuse_dax_inode_init(inode);
> +		fuse_dax_inode_init(inode, flags);
>  }
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 19ded93cfc49..f03ea7cb74b0 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1022,7 +1022,7 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
>  /**
>   * Initialize file operations on a regular file
>   */
> -void fuse_init_file_inode(struct inode *inode);
> +void fuse_init_file_inode(struct inode *inode, unsigned int flags);
>  
>  /**
>   * Initialize inode operations on regular files and special files
> @@ -1288,7 +1288,7 @@ int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
>  			struct dax_device *dax_dev);
>  void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
> -void fuse_dax_inode_init(struct inode *inode);
> +void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 4a41e6a73f3f..0669e41a9645 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -313,7 +313,7 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
>  	inode->i_ctime.tv_nsec = attr->ctimensec;
>  	if (S_ISREG(inode->i_mode)) {
>  		fuse_init_common(inode);
> -		fuse_init_file_inode(inode);
> +		fuse_init_file_inode(inode, attr->flags);
>  	} else if (S_ISDIR(inode->i_mode))
>  		fuse_init_dir(inode);
>  	else if (S_ISLNK(inode->i_mode))
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8 5/7] fuse: negotiate per inode DAX in FUSE_INIT
  2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
@ 2021-12-13 18:10     ` Vivek Goyal
  -1 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:10 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, virtio-fs, linux-fsdevel, joseph.qi

On Thu, Nov 25, 2021 at 03:05:28PM +0800, Jeffle Xu wrote:
> Among the FUSE_INIT phase, client shall advertise per inode DAX if it's
> mounted with "dax=inode". Then server is aware that client is in per
> inode DAX mode, and will construct per-inode DAX attribute accordingly.
> 
> Server shall also advertise support for per inode DAX. If server doesn't
> support it while client is mounted with "dax=inode", client will
> silently fallback to "dax=never" since "dax=inode" is advisory only.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek
> ---
>  fs/fuse/dax.c    |  2 +-
>  fs/fuse/fuse_i.h |  3 +++
>  fs/fuse/inode.c  | 13 +++++++++----
>  3 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 1550c3624414..234c2278420f 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1351,7 +1351,7 @@ static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
>  		return true;
>  
>  	/* dax_mode is FUSE_DAX_INODE* */
> -	return flags & FUSE_ATTR_DAX;
> +	return fc->inode_dax && (flags & FUSE_ATTR_DAX);
>  }
>  
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index f03ea7cb74b0..83970723314a 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -777,6 +777,9 @@ struct fuse_conn {
>  	/* Propagate syncfs() to server */
>  	unsigned int sync_fs:1;
>  
> +	/* Does the filesystem support per inode DAX? */
> +	unsigned int inode_dax:1;
> +
>  	/** The number of requests waiting for completion */
>  	atomic_t num_waiting;
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 0669e41a9645..b26612fce6d0 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1169,10 +1169,13 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
>  					min_t(unsigned int, fc->max_pages_limit,
>  					max_t(unsigned int, arg->max_pages, 1));
>  			}
> -			if (IS_ENABLED(CONFIG_FUSE_DAX) &&
> -			    arg->flags & FUSE_MAP_ALIGNMENT &&
> -			    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
> -				ok = false;
> +			if (IS_ENABLED(CONFIG_FUSE_DAX)) {
> +				if (arg->flags & FUSE_MAP_ALIGNMENT &&
> +				    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
> +					ok = false;
> +				}
> +				if (arg->flags & FUSE_HAS_INODE_DAX)
> +					fc->inode_dax = 1;
>  			}
>  			if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
>  				fc->handle_killpriv_v2 = 1;
> @@ -1227,6 +1230,8 @@ void fuse_send_init(struct fuse_mount *fm)
>  #ifdef CONFIG_FUSE_DAX
>  	if (fm->fc->dax)
>  		ia->in.flags |= FUSE_MAP_ALIGNMENT;
> +	if (fuse_is_inode_dax_mode(fm->fc->dax_mode))
> +		ia->in.flags |= FUSE_HAS_INODE_DAX;
>  #endif
>  	if (fm->fc->auto_submounts)
>  		ia->in.flags |= FUSE_SUBMOUNTS;
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v8 5/7] fuse: negotiate per inode DAX in FUSE_INIT
@ 2021-12-13 18:10     ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:10 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: virtio-fs, linux-fsdevel, joseph.qi, miklos

On Thu, Nov 25, 2021 at 03:05:28PM +0800, Jeffle Xu wrote:
> Among the FUSE_INIT phase, client shall advertise per inode DAX if it's
> mounted with "dax=inode". Then server is aware that client is in per
> inode DAX mode, and will construct per-inode DAX attribute accordingly.
> 
> Server shall also advertise support for per inode DAX. If server doesn't
> support it while client is mounted with "dax=inode", client will
> silently fallback to "dax=never" since "dax=inode" is advisory only.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek
> ---
>  fs/fuse/dax.c    |  2 +-
>  fs/fuse/fuse_i.h |  3 +++
>  fs/fuse/inode.c  | 13 +++++++++----
>  3 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 1550c3624414..234c2278420f 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1351,7 +1351,7 @@ static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
>  		return true;
>  
>  	/* dax_mode is FUSE_DAX_INODE* */
> -	return flags & FUSE_ATTR_DAX;
> +	return fc->inode_dax && (flags & FUSE_ATTR_DAX);
>  }
>  
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index f03ea7cb74b0..83970723314a 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -777,6 +777,9 @@ struct fuse_conn {
>  	/* Propagate syncfs() to server */
>  	unsigned int sync_fs:1;
>  
> +	/* Does the filesystem support per inode DAX? */
> +	unsigned int inode_dax:1;
> +
>  	/** The number of requests waiting for completion */
>  	atomic_t num_waiting;
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 0669e41a9645..b26612fce6d0 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1169,10 +1169,13 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
>  					min_t(unsigned int, fc->max_pages_limit,
>  					max_t(unsigned int, arg->max_pages, 1));
>  			}
> -			if (IS_ENABLED(CONFIG_FUSE_DAX) &&
> -			    arg->flags & FUSE_MAP_ALIGNMENT &&
> -			    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
> -				ok = false;
> +			if (IS_ENABLED(CONFIG_FUSE_DAX)) {
> +				if (arg->flags & FUSE_MAP_ALIGNMENT &&
> +				    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
> +					ok = false;
> +				}
> +				if (arg->flags & FUSE_HAS_INODE_DAX)
> +					fc->inode_dax = 1;
>  			}
>  			if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
>  				fc->handle_killpriv_v2 = 1;
> @@ -1227,6 +1230,8 @@ void fuse_send_init(struct fuse_mount *fm)
>  #ifdef CONFIG_FUSE_DAX
>  	if (fm->fc->dax)
>  		ia->in.flags |= FUSE_MAP_ALIGNMENT;
> +	if (fuse_is_inode_dax_mode(fm->fc->dax_mode))
> +		ia->in.flags |= FUSE_HAS_INODE_DAX;
>  #endif
>  	if (fm->fc->auto_submounts)
>  		ia->in.flags |= FUSE_SUBMOUNTS;
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8 6/7] fuse: mark inode DONT_CACHE when per inode DAX hint changes
  2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
@ 2021-12-13 18:10     ` Vivek Goyal
  -1 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:10 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, virtio-fs, linux-fsdevel, joseph.qi

On Thu, Nov 25, 2021 at 03:05:29PM +0800, Jeffle Xu wrote:
> When the per inode DAX hint changes while the file is still *opened*, it
> is quite complicated and maybe fragile to dynamically change the DAX
> state.
> 
> Hence mark the inode and corresponding dentries as DONE_CACHE once the
> per inode DAX hint changes, so that the inode instance will be evicted
> and freed as soon as possible once the file is closed and the last
> reference to the inode is put. And then when the file gets reopened next
> time, the new instantiated inode will reflect the new DAX state.
> 
> In summary, when the per inode DAX hint changes for an *opened* file, the
> DAX state of the file won't be updated until this file is closed and
> reopened later.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek
> ---
>  fs/fuse/dax.c    | 9 +++++++++
>  fs/fuse/fuse_i.h | 1 +
>  fs/fuse/inode.c  | 3 +++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 234c2278420f..b19e7eaed4ef 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1363,6 +1363,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>  	inode->i_data.a_ops = &fuse_dax_file_aops;
>  }
>  
> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags)
> +{
> +	struct fuse_conn *fc = get_fuse_conn(inode);
> +
> +	if (fuse_is_inode_dax_mode(fc->dax_mode) &&
> +	    (!!IS_DAX(inode) != !!(flags & FUSE_ATTR_DAX)))
> +		d_mark_dontcache(inode);
> +}
> +
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
>  {
>  	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 83970723314a..af19d1d821ea 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1293,6 +1293,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b26612fce6d0..b25d99eb8411 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -301,6 +301,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
>  		if (inval)
>  			invalidate_inode_pages2(inode->i_mapping);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_FUSE_DAX))
> +		fuse_dax_dontcache(inode, attr->flags);
>  }
>  
>  static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v8 6/7] fuse: mark inode DONT_CACHE when per inode DAX hint changes
@ 2021-12-13 18:10     ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:10 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: virtio-fs, linux-fsdevel, joseph.qi, miklos

On Thu, Nov 25, 2021 at 03:05:29PM +0800, Jeffle Xu wrote:
> When the per inode DAX hint changes while the file is still *opened*, it
> is quite complicated and maybe fragile to dynamically change the DAX
> state.
> 
> Hence mark the inode and corresponding dentries as DONE_CACHE once the
> per inode DAX hint changes, so that the inode instance will be evicted
> and freed as soon as possible once the file is closed and the last
> reference to the inode is put. And then when the file gets reopened next
> time, the new instantiated inode will reflect the new DAX state.
> 
> In summary, when the per inode DAX hint changes for an *opened* file, the
> DAX state of the file won't be updated until this file is closed and
> reopened later.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek
> ---
>  fs/fuse/dax.c    | 9 +++++++++
>  fs/fuse/fuse_i.h | 1 +
>  fs/fuse/inode.c  | 3 +++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 234c2278420f..b19e7eaed4ef 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1363,6 +1363,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>  	inode->i_data.a_ops = &fuse_dax_file_aops;
>  }
>  
> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags)
> +{
> +	struct fuse_conn *fc = get_fuse_conn(inode);
> +
> +	if (fuse_is_inode_dax_mode(fc->dax_mode) &&
> +	    (!!IS_DAX(inode) != !!(flags & FUSE_ATTR_DAX)))
> +		d_mark_dontcache(inode);
> +}
> +
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
>  {
>  	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 83970723314a..af19d1d821ea 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1293,6 +1293,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
> +void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index b26612fce6d0..b25d99eb8411 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -301,6 +301,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
>  		if (inval)
>  			invalidate_inode_pages2(inode->i_mapping);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_FUSE_DAX))
> +		fuse_dax_dontcache(inode, attr->flags);
>  }
>  
>  static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8 7/7] Documentation/filesystem/dax: DAX on virtiofs
  2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
@ 2021-12-13 18:11     ` Vivek Goyal
  -1 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:11 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, virtio-fs, linux-fsdevel, joseph.qi

On Thu, Nov 25, 2021 at 03:05:30PM +0800, Jeffle Xu wrote:
> Record DAX on virtiofs and the semantic difference with that on ext4
> and xfs.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
>  Documentation/filesystems/dax.rst | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/dax.rst b/Documentation/filesystems/dax.rst
> index 9a1b8fd9e82b..e3b30429d703 100644
> --- a/Documentation/filesystems/dax.rst
> +++ b/Documentation/filesystems/dax.rst
> @@ -23,8 +23,8 @@ on it as usual.  The `DAX` code currently only supports files with a block
>  size equal to your kernel's `PAGE_SIZE`, so you may need to specify a block
>  size when creating the filesystem.
>  
> -Currently 3 filesystems support `DAX`: ext2, ext4 and xfs.  Enabling `DAX` on them
> -is different.
> +Currently 4 filesystems support `DAX`: ext2, ext4, xfs and virtiofs.
> +Enabling `DAX` on them is different.
>  
>  Enabling DAX on ext2
>  --------------------
> @@ -168,6 +168,22 @@ if the underlying media does not support dax and/or the filesystem is
>  overridden with a mount option.
>  
>  
> +Enabling DAX on virtiofs
> +----------------------------
> +The semantic of DAX on virtiofs is basically equal to that on ext4 and xfs,
> +except that when '-o dax=inode' is specified, virtiofs client derives the hint
> +whether DAX shall be enabled or not from virtiofs server through FUSE protocol,
> +rather than the persistent `FS_XFLAG_DAX` flag. That is, whether DAX shall be
> +enabled or not is completely determined by virtiofs server, while virtiofs
> +server itself may deploy various algorithm making this decision, e.g. depending
> +on the persistent `FS_XFLAG_DAX` flag on the host.
> +
> +It is still supported to set or clear persistent `FS_XFLAG_DAX` flag inside
> +guest, but it is not guaranteed that DAX will be enabled or disabled for
> +corresponding file then. Users inside guest still need to call statx(2) and
> +check the statx flag `STATX_ATTR_DAX` to see if DAX is enabled for this file.
> +
> +
>  Implementation Tips for Block Driver Writers
>  --------------------------------------------
>  
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v8 7/7] Documentation/filesystem/dax: DAX on virtiofs
@ 2021-12-13 18:11     ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:11 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: virtio-fs, linux-fsdevel, joseph.qi, miklos

On Thu, Nov 25, 2021 at 03:05:30PM +0800, Jeffle Xu wrote:
> Record DAX on virtiofs and the semantic difference with that on ext4
> and xfs.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>

Reviewed-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

> ---
>  Documentation/filesystems/dax.rst | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/dax.rst b/Documentation/filesystems/dax.rst
> index 9a1b8fd9e82b..e3b30429d703 100644
> --- a/Documentation/filesystems/dax.rst
> +++ b/Documentation/filesystems/dax.rst
> @@ -23,8 +23,8 @@ on it as usual.  The `DAX` code currently only supports files with a block
>  size equal to your kernel's `PAGE_SIZE`, so you may need to specify a block
>  size when creating the filesystem.
>  
> -Currently 3 filesystems support `DAX`: ext2, ext4 and xfs.  Enabling `DAX` on them
> -is different.
> +Currently 4 filesystems support `DAX`: ext2, ext4, xfs and virtiofs.
> +Enabling `DAX` on them is different.
>  
>  Enabling DAX on ext2
>  --------------------
> @@ -168,6 +168,22 @@ if the underlying media does not support dax and/or the filesystem is
>  overridden with a mount option.
>  
>  
> +Enabling DAX on virtiofs
> +----------------------------
> +The semantic of DAX on virtiofs is basically equal to that on ext4 and xfs,
> +except that when '-o dax=inode' is specified, virtiofs client derives the hint
> +whether DAX shall be enabled or not from virtiofs server through FUSE protocol,
> +rather than the persistent `FS_XFLAG_DAX` flag. That is, whether DAX shall be
> +enabled or not is completely determined by virtiofs server, while virtiofs
> +server itself may deploy various algorithm making this decision, e.g. depending
> +on the persistent `FS_XFLAG_DAX` flag on the host.
> +
> +It is still supported to set or clear persistent `FS_XFLAG_DAX` flag inside
> +guest, but it is not guaranteed that DAX will be enabled or disabled for
> +corresponding file then. Users inside guest still need to call statx(2) and
> +check the statx flag `STATX_ATTR_DAX` to see if DAX is enabled for this file.
> +
> +
>  Implementation Tips for Block Driver Writers
>  --------------------------------------------
>  
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v8 0/7] fuse,virtiofs: support per-file DAX
  2021-11-25  7:05 ` [Virtio-fs] " Jeffle Xu
@ 2021-12-13 18:12   ` Vivek Goyal
  -1 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:12 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: stefanha, miklos, virtio-fs, linux-fsdevel, joseph.qi

On Thu, Nov 25, 2021 at 03:05:23PM +0800, Jeffle Xu wrote:
> changes since v7:
> - rebase to v5.16
> - patch 2: rename FUSE_DAX_NONE|FUSE_DAX_INODE to
>   FUSE_DAX_INODE_DEFAULT|FUSE_DAX_INODE_USER
> - patch 5: remove redundant call for fuse_is_inode_dax_mode() in
>   process_init_reply()
> - patch 5: if server's map alignment is non-compliant (fail
>   fuse_dax_check_alignment()), the mounted fs won't work and users are
>   required to remount it explicitly, instead of silently falling back to
>   'never' mode.

Thanks Jeffle for this work. These patches look good to me. I have done
a basic testing with it and it seems to work as expected. Hence I have
provided my Reviewed-by tags for all the patches. 

Now it is up to the Miklos to decide whether he likes the patches or not.

Thanks
Vivek
> 
> Corresponding changes to virtiofsd:
> https://www.mail-archive.com/virtio-fs@redhat.com/msg04349.html
> 
> v7: https://lore.kernel.org/all/c41837f0-a183-d911-885d-cf3bcdd9b7c8@linux.alibaba.com/T/
> v6: https://lore.kernel.org/all/20211011030052.98923-1-jefflexu@linux.alibaba.com/
> v5: https://lore.kernel.org/all/20210923092526.72341-1-jefflexu@linux.alibaba.com/
> v4: https://lore.kernel.org/linux-fsdevel/20210817022220.17574-1-jefflexu@linux.alibaba.com/
> v3: https://www.spinics.net/lists/linux-fsdevel/msg200852.html
> v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
> v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html
> 
> Original Rationale for this Patchset
> ====================================
> 
> This patchset adds support of per-file DAX for virtiofs, which is
> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> 
> Any comment is welcome.
> 
> [1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
> [2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")
> 
> [Purpose]
> DAX may be limited in some specific situation. When the number of usable
> DAX windows is under watermark, the recalim routine will be triggered to
> reclaim some DAX windows. It may have a negative impact on the
> performance, since some processes may need to wait for DAX windows to be
> recalimed and reused then. To mitigate the performance degradation, the
> overall DAX window need to be expanded larger.
> 
> However, simply expanding the DAX window may not be a good deal in some
> scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
> (512 * 64 bytes) memory footprint will be consumed for page descriptors
> inside guest, which is greater than the memory footprint if it uses
> guest page cache when DAX disabled. Thus it'd better disable DAX for
> those files smaller than 32KB, to reduce the demand for DAX window and
> thus avoid the unworthy memory overhead.
> 
> Per-file DAX feature is introduced to address this issue, by offering a
> finer grained control for dax to users, trying to achieve a balance
> between performance and memory overhead.
> 
> 
> [Note]
> When the per-file DAX hint changes while the file is still *opened*, it
> is quite complicated and maybe fragile to dynamically change the DAX
> state, since dynamic switching needs to switch a_ops atomiclly. Ira
> Weiny had ever implemented a so called i_aops_sem lock [3] but
> eventually gave up since the complexity of the implementation
> [4][5][6][7].
> 
> Hence mark the inode and corresponding dentries as DONE_CACHE once the
> per-file DAX hint changes, so that the inode instance will be evicted
> and freed as soon as possible once the file is closed and the last
> reference to the inode is put. And then when the file gets reopened next
> time, the new instantiated inode will reflect the new DAX state.
> 
> In summary, when the per-file DAX hint changes for an *opened* file, the
> DAX state of the file won't be updated until this file is closed and
> reopened later. This is also how ext4/xfs per-file DAX works.
> 
> [3] https://lore.kernel.org/lkml/20200227052442.22524-7-ira.weiny@intel.com/
> [4] https://patchwork.kernel.org/project/xfs/cover/20200407182958.568475-1-ira.weiny@intel.com/
> [5] https://lore.kernel.org/lkml/20200305155144.GA5598@lst.de/
> [6] https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> [7] https://lore.kernel.org/lkml/20200403182904.GP80283@magnolia/
> 
> 
> Jeffle Xu (7):
>   fuse: add fuse_should_enable_dax() helper
>   fuse: make DAX mount option a tri-state
>   fuse: support per inode DAX in fuse protocol
>   fuse: enable per inode DAX
>   fuse: negotiate per inode DAX in FUSE_INIT
>   fuse: mark inode DONT_CACHE when per inode DAX hint changes
>   Documentation/filesystem/dax: DAX on virtiofs
> 
>  Documentation/filesystems/dax.rst | 20 +++++++++++++++--
>  fs/fuse/dax.c                     | 36 +++++++++++++++++++++++++++++--
>  fs/fuse/file.c                    |  4 ++--
>  fs/fuse/fuse_i.h                  | 28 ++++++++++++++++++++----
>  fs/fuse/inode.c                   | 28 +++++++++++++++++-------
>  fs/fuse/virtio_fs.c               | 18 +++++++++++++---
>  include/uapi/linux/fuse.h         |  5 +++++
>  7 files changed, 118 insertions(+), 21 deletions(-)
> 
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Virtio-fs] [PATCH v8 0/7] fuse,virtiofs: support per-file DAX
@ 2021-12-13 18:12   ` Vivek Goyal
  0 siblings, 0 replies; 36+ messages in thread
From: Vivek Goyal @ 2021-12-13 18:12 UTC (permalink / raw)
  To: Jeffle Xu; +Cc: virtio-fs, linux-fsdevel, joseph.qi, miklos

On Thu, Nov 25, 2021 at 03:05:23PM +0800, Jeffle Xu wrote:
> changes since v7:
> - rebase to v5.16
> - patch 2: rename FUSE_DAX_NONE|FUSE_DAX_INODE to
>   FUSE_DAX_INODE_DEFAULT|FUSE_DAX_INODE_USER
> - patch 5: remove redundant call for fuse_is_inode_dax_mode() in
>   process_init_reply()
> - patch 5: if server's map alignment is non-compliant (fail
>   fuse_dax_check_alignment()), the mounted fs won't work and users are
>   required to remount it explicitly, instead of silently falling back to
>   'never' mode.

Thanks Jeffle for this work. These patches look good to me. I have done
a basic testing with it and it seems to work as expected. Hence I have
provided my Reviewed-by tags for all the patches. 

Now it is up to the Miklos to decide whether he likes the patches or not.

Thanks
Vivek
> 
> Corresponding changes to virtiofsd:
> https://www.mail-archive.com/virtio-fs@redhat.com/msg04349.html
> 
> v7: https://lore.kernel.org/all/c41837f0-a183-d911-885d-cf3bcdd9b7c8@linux.alibaba.com/T/
> v6: https://lore.kernel.org/all/20211011030052.98923-1-jefflexu@linux.alibaba.com/
> v5: https://lore.kernel.org/all/20210923092526.72341-1-jefflexu@linux.alibaba.com/
> v4: https://lore.kernel.org/linux-fsdevel/20210817022220.17574-1-jefflexu@linux.alibaba.com/
> v3: https://www.spinics.net/lists/linux-fsdevel/msg200852.html
> v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
> v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html
> 
> Original Rationale for this Patchset
> ====================================
> 
> This patchset adds support of per-file DAX for virtiofs, which is
> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> 
> Any comment is welcome.
> 
> [1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
> [2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")
> 
> [Purpose]
> DAX may be limited in some specific situation. When the number of usable
> DAX windows is under watermark, the recalim routine will be triggered to
> reclaim some DAX windows. It may have a negative impact on the
> performance, since some processes may need to wait for DAX windows to be
> recalimed and reused then. To mitigate the performance degradation, the
> overall DAX window need to be expanded larger.
> 
> However, simply expanding the DAX window may not be a good deal in some
> scenario. To maintain one DAX window chunk (i.e., 2MB in size), 32KB
> (512 * 64 bytes) memory footprint will be consumed for page descriptors
> inside guest, which is greater than the memory footprint if it uses
> guest page cache when DAX disabled. Thus it'd better disable DAX for
> those files smaller than 32KB, to reduce the demand for DAX window and
> thus avoid the unworthy memory overhead.
> 
> Per-file DAX feature is introduced to address this issue, by offering a
> finer grained control for dax to users, trying to achieve a balance
> between performance and memory overhead.
> 
> 
> [Note]
> When the per-file DAX hint changes while the file is still *opened*, it
> is quite complicated and maybe fragile to dynamically change the DAX
> state, since dynamic switching needs to switch a_ops atomiclly. Ira
> Weiny had ever implemented a so called i_aops_sem lock [3] but
> eventually gave up since the complexity of the implementation
> [4][5][6][7].
> 
> Hence mark the inode and corresponding dentries as DONE_CACHE once the
> per-file DAX hint changes, so that the inode instance will be evicted
> and freed as soon as possible once the file is closed and the last
> reference to the inode is put. And then when the file gets reopened next
> time, the new instantiated inode will reflect the new DAX state.
> 
> In summary, when the per-file DAX hint changes for an *opened* file, the
> DAX state of the file won't be updated until this file is closed and
> reopened later. This is also how ext4/xfs per-file DAX works.
> 
> [3] https://lore.kernel.org/lkml/20200227052442.22524-7-ira.weiny@intel.com/
> [4] https://patchwork.kernel.org/project/xfs/cover/20200407182958.568475-1-ira.weiny@intel.com/
> [5] https://lore.kernel.org/lkml/20200305155144.GA5598@lst.de/
> [6] https://lore.kernel.org/lkml/20200401040021.GC56958@magnolia/
> [7] https://lore.kernel.org/lkml/20200403182904.GP80283@magnolia/
> 
> 
> Jeffle Xu (7):
>   fuse: add fuse_should_enable_dax() helper
>   fuse: make DAX mount option a tri-state
>   fuse: support per inode DAX in fuse protocol
>   fuse: enable per inode DAX
>   fuse: negotiate per inode DAX in FUSE_INIT
>   fuse: mark inode DONT_CACHE when per inode DAX hint changes
>   Documentation/filesystem/dax: DAX on virtiofs
> 
>  Documentation/filesystems/dax.rst | 20 +++++++++++++++--
>  fs/fuse/dax.c                     | 36 +++++++++++++++++++++++++++++--
>  fs/fuse/file.c                    |  4 ++--
>  fs/fuse/fuse_i.h                  | 28 ++++++++++++++++++++----
>  fs/fuse/inode.c                   | 28 +++++++++++++++++-------
>  fs/fuse/virtio_fs.c               | 18 +++++++++++++---
>  include/uapi/linux/fuse.h         |  5 +++++
>  7 files changed, 118 insertions(+), 21 deletions(-)
> 
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2021-12-13 18:13 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-25  7:05 [PATCH v8 0/7] fuse,virtiofs: support per-file DAX Jeffle Xu
2021-11-25  7:05 ` [Virtio-fs] " Jeffle Xu
2021-11-25  7:05 ` [PATCH v8 1/7] fuse: add fuse_should_enable_dax() helper Jeffle Xu
2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
2021-12-13 18:08   ` Vivek Goyal
2021-12-13 18:08     ` [Virtio-fs] " Vivek Goyal
2021-11-25  7:05 ` [PATCH v8 2/7] fuse: make DAX mount option a tri-state Jeffle Xu
2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
2021-12-13 18:09   ` Vivek Goyal
2021-12-13 18:09     ` [Virtio-fs] " Vivek Goyal
2021-11-25  7:05 ` [PATCH v8 3/7] fuse: support per inode DAX in fuse protocol Jeffle Xu
2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
2021-12-13 18:09   ` Vivek Goyal
2021-12-13 18:09     ` [Virtio-fs] " Vivek Goyal
2021-11-25  7:05 ` [PATCH v8 4/7] fuse: enable per inode DAX Jeffle Xu
2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
2021-12-13 18:10   ` Vivek Goyal
2021-12-13 18:10     ` [Virtio-fs] " Vivek Goyal
2021-11-25  7:05 ` [PATCH v8 5/7] fuse: negotiate per inode DAX in FUSE_INIT Jeffle Xu
2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
2021-12-13 18:10   ` Vivek Goyal
2021-12-13 18:10     ` [Virtio-fs] " Vivek Goyal
2021-11-25  7:05 ` [PATCH v8 6/7] fuse: mark inode DONT_CACHE when per inode DAX hint changes Jeffle Xu
2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
2021-12-07 16:00   ` Vivek Goyal
2021-12-07 16:00     ` [Virtio-fs] " Vivek Goyal
2021-12-08  1:36     ` JeffleXu
2021-12-08  1:36       ` [Virtio-fs] " JeffleXu
2021-12-13 18:10   ` Vivek Goyal
2021-12-13 18:10     ` [Virtio-fs] " Vivek Goyal
2021-11-25  7:05 ` [PATCH v8 7/7] Documentation/filesystem/dax: DAX on virtiofs Jeffle Xu
2021-11-25  7:05   ` [Virtio-fs] " Jeffle Xu
2021-12-13 18:11   ` Vivek Goyal
2021-12-13 18:11     ` [Virtio-fs] " Vivek Goyal
2021-12-13 18:12 ` [PATCH v8 0/7] fuse,virtiofs: support per-file DAX Vivek Goyal
2021-12-13 18:12   ` [Virtio-fs] " Vivek Goyal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.