* [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
@ 2021-08-17  2:22 ` Jeffle Xu
  0 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:22 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

This patchset adds support for per-file DAX to virtiofs, inspired by
Ira Weiny's work on ext4[1] and xfs[2].

Comments are welcome.

[1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
[2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")


changes since v3:
- bug fix (patch 6): s/"IS_DAX(inode) != newdax"/"!!IS_DAX(inode) != newdax"
- during FUSE_INIT, advertise capability for per-file DAX only when
  mounted as "-o dax=inode" (patch 4)
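
The "!!" fix in patch 6 can be illustrated with a small userspace sketch.
It assumes IS_DAX() expands to a flag-mask test on i_flags, so it yields
either 0 or the S_DAX bit rather than 0 or 1. DEMO_S_DAX, demo_is_dax()
and the dax_changed_*() helpers are hypothetical names for this sketch,
not code from the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's S_DAX inode flag bit (assumed value). */
#define DEMO_S_DAX 8192u

/* Models IS_DAX(): returns the masked flag (0 or DEMO_S_DAX), not 0 or 1. */
static unsigned int demo_is_dax(unsigned int i_flags)
{
	return i_flags & DEMO_S_DAX;
}

/* Buggy form: compares the raw mask against a bool. */
static bool dax_changed_buggy(unsigned int i_flags, bool newdax)
{
	return demo_is_dax(i_flags) != newdax;
}

/* Fixed form: "!!" collapses any nonzero mask to 1 before comparing. */
static bool dax_changed_fixed(unsigned int i_flags, bool newdax)
{
	return !!demo_is_dax(i_flags) != newdax;
}

/* Exercises both forms for an inode that already has DAX enabled. */
static void demo_self_check(void)
{
	/* 8192 != 1, so the buggy form reports a change that did not happen. */
	assert(dax_changed_buggy(DEMO_S_DAX, true));
	/* The fixed form sees 1 != 1 and reports no change. */
	assert(!dax_changed_fixed(DEMO_S_DAX, true));
}
```

The two forms only disagree when DAX is already set and the new state is
also "enabled", which is exactly the case where the unfixed comparison
would report a change that did not happen.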

changes since v2:
- modify fuse_show_options() accordingly to make it compatible with
  new tri-state mount option (patch 2)
- extract FUSE protocol changes into one separate patch (patch 3)
- FUSE server/client need to negotiate whether they support per-file DAX
  (patch 4)
- extract DONT_CACHE logic into patch 6/7

v3: https://www.spinics.net/lists/linux-fsdevel/msg200852.html
v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html

Jeffle Xu (8):
  fuse: add fuse_should_enable_dax() helper
  fuse: Make DAX mount option a tri-state
  fuse: support per-file DAX
  fuse: negotiate if server/client supports per-file DAX
  fuse: enable per-file DAX
  fuse: mark inode DONT_CACHE when per-file DAX indication changes
  fuse: support changing per-file DAX flag inside guest
  fuse: show '-o dax=inode' option only when FUSE server supports

 fs/fuse/dax.c             | 32 +++++++++++++++++++++++++++++---
 fs/fuse/file.c            |  4 ++--
 fs/fuse/fuse_i.h          | 22 ++++++++++++++++++----
 fs/fuse/inode.c           | 27 +++++++++++++++++++--------
 fs/fuse/ioctl.c           | 15 +++++++++++++--
 fs/fuse/virtio_fs.c       | 16 ++++++++++++++--
 include/uapi/linux/fuse.h |  9 ++++++++-
 7 files changed, 103 insertions(+), 22 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 151+ messages in thread

* [PATCH v4 1/8] fuse: add fuse_should_enable_dax() helper
  2021-08-17  2:22 ` Jeffle Xu
  (?)
@ 2021-08-17  2:22   ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:22 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

This is in preparation for the per-file DAX checks added later in this
series.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 0e5407f48e6a..c6f4e82e65f3 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1336,11 +1336,19 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 	.invalidatepage	= noop_invalidatepage,
 };
 
-void fuse_dax_inode_init(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 
 	if (!fc->dax)
+		return false;
+
+	return true;
+}
+
+void fuse_dax_inode_init(struct inode *inode)
+{
+	if (!fuse_should_enable_dax(inode))
 		return;
 
 	inode->i_flags |= S_DAX;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH v4 2/8] fuse: Make DAX mount option a tri-state
  2021-08-17  2:22 ` Jeffle Xu
  (?)
@ 2021-08-17  2:22   ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:22 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

Add three values for the DAX mount option: 'always', 'never', and 'inode'
(the default). A bare '-o dax' keeps its current behaviour and is
equivalent to 'dax=always'.

At this point in the series, 'inode' mode still behaves the same as
'always' mode; the per-file DAX flag that distinguishes them is
introduced in the following patch.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/dax.c       |  9 +++++++--
 fs/fuse/fuse_i.h    | 14 ++++++++++++--
 fs/fuse/inode.c     | 10 +++++++---
 fs/fuse/virtio_fs.c | 16 ++++++++++++++--
 4 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index c6f4e82e65f3..fe4e9593a590 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1288,11 +1288,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
 	return ret;
 }
 
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
+			struct dax_device *dax_dev)
 {
 	struct fuse_conn_dax *fcd;
 	int err;
 
+	fc->dax_mode = dax_mode;
+
 	if (!dax_dev)
 		return 0;
 
@@ -1339,8 +1342,10 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 static bool fuse_should_enable_dax(struct inode *inode)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	unsigned int dax_mode = fc->dax_mode;
 
-	if (!fc->dax)
+	/* If 'dax=always/inode', fc->dax couldn't be NULL */
+	if (dax_mode == FUSE_DAX_NEVER)
 		return false;
 
 	return true;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 07829ce78695..a23dd8d0c181 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -487,6 +487,12 @@ struct fuse_dev {
 	struct list_head entry;
 };
 
+enum fuse_dax_mode {
+	FUSE_DAX_INODE,
+	FUSE_DAX_ALWAYS,
+	FUSE_DAX_NEVER,
+};
+
 struct fuse_fs_context {
 	int fd;
 	unsigned int rootmode;
@@ -503,7 +509,7 @@ struct fuse_fs_context {
 	bool no_control:1;
 	bool no_force_umount:1;
 	bool legacy_opts_show:1;
-	bool dax:1;
+	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
 	const char *subtype;
@@ -801,6 +807,9 @@ struct fuse_conn {
 	struct list_head devices;
 
 #ifdef CONFIG_FUSE_DAX
+	/* dax mode: FUSE_DAX_* (always, never or per-file) */
+	enum fuse_dax_mode dax_mode;
+
 	/* Dax specific conn data, non-NULL if DAX is enabled */
 	struct fuse_conn_dax *dax;
 #endif
@@ -1242,7 +1251,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
 ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
 int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
 int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
+			struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b9beb39a4a18..0bc0d8af81e1 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -690,8 +690,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 			seq_printf(m, ",blksize=%lu", sb->s_blocksize);
 	}
 #ifdef CONFIG_FUSE_DAX
-	if (fc->dax)
-		seq_puts(m, ",dax");
+	if (fc->dax_mode == FUSE_DAX_ALWAYS)
+		seq_puts(m, ",dax=always");
+	else if (fc->dax_mode == FUSE_DAX_NEVER)
+		seq_puts(m, ",dax=never");
+	else if (fc->dax_mode == FUSE_DAX_INODE)
+		seq_puts(m, ",dax=inode");
 #endif
 
 	return 0;
@@ -1434,7 +1438,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	sb->s_subtype = ctx->subtype;
 	ctx->subtype = NULL;
 	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
-		err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
+		err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
 		if (err)
 			goto err;
 	}
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 6a3a23320edc..7dbf5502c57e 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -88,12 +88,21 @@ struct virtio_fs_req_work {
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 				 struct fuse_req *req, bool in_flight);
 
+static const struct constant_table dax_param_enums[] = {
+	{"inode",	FUSE_DAX_INODE },
+	{"always",	FUSE_DAX_ALWAYS },
+	{"never",	FUSE_DAX_NEVER },
+	{}
+};
+
 enum {
 	OPT_DAX,
+	OPT_DAX_ENUM,
 };
 
 static const struct fs_parameter_spec virtio_fs_parameters[] = {
 	fsparam_flag("dax", OPT_DAX),
+	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),
 	{}
 };
 
@@ -110,7 +119,10 @@ static int virtio_fs_parse_param(struct fs_context *fc,
 
 	switch (opt) {
 	case OPT_DAX:
-		ctx->dax = 1;
+		ctx->dax_mode = FUSE_DAX_ALWAYS;
+		break;
+	case OPT_DAX_ENUM:
+		ctx->dax_mode = result.uint_32;
 		break;
 	default:
 		return -EINVAL;
@@ -1323,7 +1335,7 @@ static int virtio_fs_fill_super(struct super_block *sb, struct fs_context *fsc)
 
 	/* virtiofs allocates and installs its own fuse devices */
 	ctx->fudptr = NULL;
-	if (ctx->dax) {
+	if (ctx->dax_mode != FUSE_DAX_NEVER) {
 		if (!fs->dax_dev) {
 			err = -EINVAL;
 			pr_err("virtio-fs: dax can't be enabled as filesystem"
-- 
2.27.0
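
The tri-state option handling in the diff above can be modelled in
userspace roughly as follows. This is only a sketch: the enum ordering
and the option strings are taken from the patch, while
demo_parse_dax_opt(), demo_show_dax_opt() and demo_should_enable_dax()
are hypothetical helpers that stand in for the fs_context parsing,
fuse_show_options() and fuse_should_enable_dax() respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same ordering as the enum added to fs/fuse/fuse_i.h in this patch. */
enum fuse_dax_mode {
	FUSE_DAX_INODE,
	FUSE_DAX_ALWAYS,
	FUSE_DAX_NEVER,
};

/*
 * Hypothetical parser: maps one mount option token to a mode.
 * Returns 0 on success, -1 for an unrecognised token.
 */
static int demo_parse_dax_opt(const char *opt, enum fuse_dax_mode *mode)
{
	assert(opt != NULL && mode != NULL);

	/* A bare "dax" keeps its old meaning and maps to "always". */
	if (strcmp(opt, "dax") == 0 || strcmp(opt, "dax=always") == 0)
		*mode = FUSE_DAX_ALWAYS;
	else if (strcmp(opt, "dax=never") == 0)
		*mode = FUSE_DAX_NEVER;
	else if (strcmp(opt, "dax=inode") == 0)
		*mode = FUSE_DAX_INODE;
	else
		return -1;
	return 0;
}

/* Returns the same strings that the fuse_show_options() hunk emits. */
static const char *demo_show_dax_opt(enum fuse_dax_mode mode)
{
	switch (mode) {
	case FUSE_DAX_ALWAYS:
		return ",dax=always";
	case FUSE_DAX_NEVER:
		return ",dax=never";
	case FUSE_DAX_INODE:
		return ",dax=inode";
	}
	return "";
}

/* At this point in the series only "never" disables DAX. */
static int demo_should_enable_dax(enum fuse_dax_mode mode)
{
	return mode != FUSE_DAX_NEVER;
}
```

For example, parsing "dax" yields FUSE_DAX_ALWAYS, which is then shown
as ",dax=always", so the legacy spelling is normalised on output.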


^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH v4 3/8] fuse: support per-file DAX
  2021-08-17  2:22 ` Jeffle Xu
  (?)
@ 2021-08-17  2:22   ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:22 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

Extend the FUSE protocol to support per-file DAX.

Add the FUSE_PERFILE_DAX flag, which the FUSE client and server set in
the FUSE_INIT request and reply to indicate that they support per-file
DAX.

Also add the FUSE_ATTR_DAX flag, which the server sets in a FUSE_LOOKUP
reply to indicate that DAX should be enabled for the corresponding file.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 include/uapi/linux/fuse.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 36ed092227fa..15a1f5fc0797 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -184,6 +184,9 @@
  *
  *  7.34
  *  - add FUSE_SYNCFS
+ *
+ *  7.35
+ *  - add FUSE_PERFILE_DAX, FUSE_ATTR_DAX
  */
 
 #ifndef _LINUX_FUSE_H
@@ -219,7 +222,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 34
+#define FUSE_KERNEL_MINOR_VERSION 35
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -336,6 +339,7 @@ struct fuse_file_lock {
  *			write/truncate sgid is killed only if file has group
  *			execute permission. (Same as Linux VFS behavior).
  * FUSE_SETXATTR_EXT:	Server supports extended struct fuse_setxattr_in
+ * FUSE_PERFILE_DAX:	kernel supports per-file DAX
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -367,6 +371,7 @@ struct fuse_file_lock {
 #define FUSE_SUBMOUNTS		(1 << 27)
 #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
 #define FUSE_SETXATTR_EXT	(1 << 29)
+#define FUSE_PERFILE_DAX	(1 << 30)
 
 /**
  * CUSE INIT request/reply flags
@@ -449,8 +454,10 @@ struct fuse_file_lock {
  * fuse_attr flags
  *
  * FUSE_ATTR_SUBMOUNT: Object is a submount root
+ * FUSE_ATTR_DAX: Enable DAX for this file in per-file DAX mode
  */
 #define FUSE_ATTR_SUBMOUNT      (1 << 0)
+#define FUSE_ATTR_DAX		(1 << 1)
 
 /**
  * Open flags
-- 
2.27.0
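
A rough userspace model of how the two new flags are meant to interact,
based only on the commit message above and the cover letter. The bit
positions are copied from the diff (written here as unsigned constants);
the demo_*() helpers are hypothetical, and the real client-side logic
lives in later patches of the series that are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the include/uapi/linux/fuse.h hunk above. */
#define FUSE_PERFILE_DAX (1u << 30)
#define FUSE_ATTR_DAX    (1u << 1)

/*
 * Client side of FUSE_INIT: per the cover letter, the capability is
 * advertised only when the filesystem is mounted with "-o dax=inode".
 */
static uint32_t demo_init_flags(bool mounted_dax_inode)
{
	return mounted_dax_inode ? FUSE_PERFILE_DAX : 0u;
}

/* Per-file DAX is negotiated only if both request and reply set the bit. */
static bool demo_perfile_negotiated(uint32_t sent, uint32_t reply)
{
	return (sent & reply & FUSE_PERFILE_DAX) != 0;
}

/* Per-file decision from the attr flags carried in a lookup reply. */
static bool demo_file_wants_dax(bool negotiated, uint32_t attr_flags)
{
	return negotiated && (attr_flags & FUSE_ATTR_DAX) != 0;
}

/* Walks through one successful negotiation followed by two lookups. */
static void demo_self_check(void)
{
	uint32_t sent = demo_init_flags(true);
	bool ok = demo_perfile_negotiated(sent, FUSE_PERFILE_DAX);

	assert(ok);
	assert(demo_file_wants_dax(ok, FUSE_ATTR_DAX));
	assert(!demo_file_wants_dax(ok, 0u));
}
```

Keeping the connection-level capability (FUSE_PERFILE_DAX) separate from
the per-file hint (FUSE_ATTR_DAX) means a server that never sets the
capability bit can be treated as not supporting the feature, whatever
attr flags it returns.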


^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH v4 4/8] fuse: negotiate if server/client supports per-file DAX
  2021-08-17  2:22 ` Jeffle Xu
  (?)
@ 2021-08-17  2:22   ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:22 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

During the FUSE_INIT phase, the server and client shall negotiate
whether per-file DAX is supported.

Requirements for the server:
- be capable of handling the SETFLAGS/FSSETXATTR ioctls and of storing
  FS_DAX_FL/FS_XFLAG_DAX persistently.
- set FUSE_ATTR_DAX in the FUSE_LOOKUP reply if the file is capable of
  per-file DAX.

Requirements for the client:
- be capable of handling per-file DAX when receiving FUSE_ATTR_DAX.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
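For illustration only, the handshake described above can be modelled in
userspace roughly as follows. This is a sketch under assumptions: the
enum stands in for the tri-state mount option, server_capable stands in
for the server requirements listed above, and all function names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FUSE_PERFILE_DAX	(1u << 30)	/* value taken from patch 3/8 */

/* Stand-in for the tri-state DAX mount option (never/always/inode). */
enum dax_mode { DAX_NEVER, DAX_ALWAYS, DAX_INODE };

/* Client side: advertise the capability only when mounted "-o dax=inode". */
static inline uint32_t client_init_flags(enum dax_mode mode)
{
	return mode == DAX_INODE ? FUSE_PERFILE_DAX : 0;
}

/*
 * Server side (hypothetical): echo the flag only if the client offered it
 * and the server can store the DAX flag persistently.
 */
static inline uint32_t server_reply_flags(uint32_t request_flags,
					  bool server_capable)
{
	if (server_capable && (request_flags & FUSE_PERFILE_DAX))
		return FUSE_PERFILE_DAX;
	return 0;
}

/* Client side: the value that ends up in the perfile_dax bit. */
static inline bool client_perfile_dax(uint32_t reply_flags)
{
	return (reply_flags & FUSE_PERFILE_DAX) != 0;
}
```

Chaining the three calls shows that per-file DAX ends up enabled only
when the mount asked for dax=inode and the server agreed.
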
 fs/fuse/fuse_i.h |  3 +++
 fs/fuse/inode.c  | 12 ++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index a23dd8d0c181..0b21e76a379a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -770,6 +770,9 @@ struct fuse_conn {
 	/* Propagate syncfs() to server */
 	unsigned int sync_fs:1;
 
+	/* Does the filesystem support per-file DAX? */
+	unsigned int perfile_dax:1;
+
 	/** The number of requests waiting for completion */
 	atomic_t num_waiting;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 0bc0d8af81e1..9d302079281c 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1087,10 +1087,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 					min_t(unsigned int, fc->max_pages_limit,
 					max_t(unsigned int, arg->max_pages, 1));
 			}
-			if (IS_ENABLED(CONFIG_FUSE_DAX) &&
-			    arg->flags & FUSE_MAP_ALIGNMENT &&
-			    !fuse_dax_check_alignment(fc, arg->map_alignment)) {
-				ok = false;
+			if (IS_ENABLED(CONFIG_FUSE_DAX)) {
+				if (arg->flags & FUSE_MAP_ALIGNMENT &&
+				    !fuse_dax_check_alignment(fc, arg->map_alignment))
+					ok = false;
+				if (arg->flags & FUSE_PERFILE_DAX)
+					fc->perfile_dax = 1;
 			}
 			if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
 				fc->handle_killpriv_v2 = 1;
@@ -1145,6 +1147,8 @@ void fuse_send_init(struct fuse_mount *fm)
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		ia->in.flags |= FUSE_MAP_ALIGNMENT;
+	if (fm->fc->dax_mode == FUSE_DAX_INODE)
+		ia->in.flags |= FUSE_PERFILE_DAX;
 #endif
 	if (fm->fc->auto_submounts)
 		ia->in.flags |= FUSE_SUBMOUNTS;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH v4 5/8] fuse: enable per-file DAX
  2021-08-17  2:22 ` Jeffle Xu
  (?)
@ 2021-08-17  2:22   ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:22 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

Enable DAX for a file if the fuse server advertises that the file
supports it.

Currently, whether DAX is enabled for a file is decided only when the
inode is instantiated.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
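For illustration only, the decision made by fuse_should_enable_dax()
after this change can be modelled as a pure function. This is a sketch
under assumptions, not kernel code: the enum stands in for fc->dax_mode,
any checks outside the hunk below are omitted, and assert() (which
aborts) only approximates WARN_ON_ONCE() (which warns and continues).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FUSE_ATTR_DAX	(1u << 1)	/* value taken from patch 3/8 */

/* Stand-in for the tri-state DAX mount option (never/always/inode). */
enum dax_mode { DAX_NEVER, DAX_ALWAYS, DAX_INODE };

/*
 * mode:        mount-wide DAX policy
 * perfile_dax: result of the FUSE_INIT negotiation
 * attr_flags:  fuse_attr.flags from the server's reply for this file
 */
static inline bool should_enable_dax(enum dax_mode mode, bool perfile_dax,
				     uint32_t attr_flags)
{
	if (mode == DAX_NEVER)
		return false;

	if (mode == DAX_ALWAYS)
		return true;

	/* Only the per-file mode is left at this point. */
	assert(mode == DAX_INODE);
	return perfile_dax && (attr_flags & FUSE_ATTR_DAX);
}
```

In per-file mode both conditions must hold: the capability was
negotiated and the server flagged this particular file.
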
 fs/fuse/dax.c    | 12 ++++++++----
 fs/fuse/file.c   |  4 ++--
 fs/fuse/fuse_i.h |  4 ++--
 fs/fuse/inode.c  |  2 +-
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index fe4e9593a590..30833f8d37dd 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1339,7 +1339,7 @@ static const struct address_space_operations fuse_dax_file_aops  = {
 	.invalidatepage	= noop_invalidatepage,
 };
 
-static bool fuse_should_enable_dax(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
 	unsigned int dax_mode = fc->dax_mode;
@@ -1348,12 +1348,16 @@ static bool fuse_should_enable_dax(struct inode *inode)
 	if (dax_mode == FUSE_DAX_NEVER)
 		return false;
 
-	return true;
+	if (dax_mode == FUSE_DAX_ALWAYS)
+		return true;
+
+	WARN_ON_ONCE(dax_mode != FUSE_DAX_INODE);
+	return fc->perfile_dax && (flags & FUSE_ATTR_DAX);
 }
 
-void fuse_dax_inode_init(struct inode *inode)
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 {
-	if (!fuse_should_enable_dax(inode))
+	if (!fuse_should_enable_dax(inode, flags))
 		return;
 
 	inode->i_flags |= S_DAX;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index ec48bc7ef0a5..1231128f8dd6 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3148,7 +3148,7 @@ static const struct address_space_operations fuse_file_aops  = {
 	.write_end	= fuse_write_end,
 };
 
-void fuse_init_file_inode(struct inode *inode)
+void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
 
@@ -3162,5 +3162,5 @@ void fuse_init_file_inode(struct inode *inode)
 	fi->writepages = RB_ROOT;
 
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
-		fuse_dax_inode_init(inode);
+		fuse_dax_inode_init(inode, flags);
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 0b21e76a379a..7b7b4c208af2 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1006,7 +1006,7 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 /**
  * Initialize file operations on a regular file
  */
-void fuse_init_file_inode(struct inode *inode);
+void fuse_init_file_inode(struct inode *inode, unsigned int flags);
 
 /**
  * Initialize inode operations on regular files and special files
@@ -1258,7 +1258,7 @@ int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
 			struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
-void fuse_dax_inode_init(struct inode *inode);
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 9d302079281c..8080f78befed 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -281,7 +281,7 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
 	inode->i_ctime.tv_nsec = attr->ctimensec;
 	if (S_ISREG(inode->i_mode)) {
 		fuse_init_common(inode);
-		fuse_init_file_inode(inode);
+		fuse_init_file_inode(inode, attr->flags);
 	} else if (S_ISDIR(inode->i_mode))
 		fuse_init_dir(inode);
 	else if (S_ISLNK(inode->i_mode))
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH v4 6/8] fuse: mark inode DONT_CACHE when per-file DAX indication changes
  2021-08-17  2:22 ` Jeffle Xu
  (?)
@ 2021-08-17  2:22   ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:22 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

When the per-file DAX indication changes while the file is still open,
changing the DAX state dynamically is complicated and potentially
fragile.

Hence mark the inode and the corresponding dentries as DONT_CACHE once
the per-file DAX indication changes, so that the inode instance is
evicted and freed as soon as possible once the file is closed and the
last reference to the inode is put. When the file is next opened, the
new inode will reflect the new DAX state.

In summary, when the per-file DAX indication changes for an open file,
the DAX state of the file is not updated until the file is closed and
reopened.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
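For illustration only, a userspace sketch of why the comparison in
fuse_dax_dontcache() normalizes IS_DAX() with "!!" (the v4 changelog in
the cover letter lists this as a bug fix). A flag test yields the mask
value, not 0/1, so comparing it directly with a bool reports a change
even when both sides mean "DAX on". The S_DAX value and the function
names here are illustrative stand-ins, and IS_DAX() takes the flag word
rather than an inode.

```c
#include <stdbool.h>

#define S_DAX		8192u	/* illustrative stand-in for the inode flag bit */
#define IS_DAX(i_flags)	((i_flags) & S_DAX)

/* v3 behaviour: compares the raw mask (0 or 8192) with a bool (0 or 1). */
static inline bool dax_changed_unnormalized(unsigned int i_flags, bool newdax)
{
	return IS_DAX(i_flags) != newdax;
}

/* v4 behaviour: "!!" collapses the mask to 0 or 1 before comparing. */
static inline bool dax_changed(unsigned int i_flags, bool newdax)
{
	return !!IS_DAX(i_flags) != newdax;
}
```

With DAX already set and newdax true, the unnormalized variant reports a
change (8192 != 1) while the normalized one does not.
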
 fs/fuse/dax.c    | 9 +++++++++
 fs/fuse/fuse_i.h | 1 +
 fs/fuse/inode.c  | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 30833f8d37dd..f7ede0be4e00 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1364,6 +1364,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 	inode->i_data.a_ops = &fuse_dax_file_aops;
 }
 
+void fuse_dax_dontcache(struct inode *inode, bool newdax)
+{
+	struct fuse_conn *fc = get_fuse_conn(inode);
+
+	if (fc->dax_mode == FUSE_DAX_INODE &&
+	    fc->perfile_dax && (!!IS_DAX(inode) != newdax))
+		d_mark_dontcache(inode);
+}
+
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
 {
 	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7b7b4c208af2..56fe1c4d2136 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1260,6 +1260,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
+void fuse_dax_dontcache(struct inode *inode, bool newdax);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 8080f78befed..8c9774c6a210 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -269,6 +269,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 		if (inval)
 			invalidate_inode_pages2(inode->i_mapping);
 	}
+
+	if (IS_ENABLED(CONFIG_FUSE_DAX))
+		fuse_dax_dontcache(inode, attr->flags & FUSE_ATTR_DAX);
 }
 
 static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [PATCH v4 7/8] fuse: support changing per-file DAX flag inside guest
  2021-08-17  2:22 ` Jeffle Xu
  (?)
@ 2021-08-17  2:22   ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:22 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

The fuse client can enable or disable per-file DAX from inside the
kernel/guest with chattr(1). Similarly, the new state does not take
effect until the file is closed and reopened.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
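For illustration only, a userspace sketch of how the requested DAX state
is derived from whichever interface the caller used. The struct is a
hypothetical stand-in for the kernel's struct fileattr, and the two
constants are believed to match the values in <linux/fs.h> but are
defined locally here as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define FS_DAX_FL	0x02000000u	/* FS_IOC_SETFLAGS bit, assumed value */
#define FS_XFLAG_DAX	0x00008000u	/* FS_IOC_FSSETXATTR bit, assumed value */

/* Hypothetical, reduced model of the kernel's struct fileattr. */
struct fileattr_model {
	bool flags_valid;	/* true: SETFLAGS path, false: FSSETXATTR path */
	uint32_t flags;
	uint32_t fsx_xflags;
};

/* Returns the DAX state the caller asked for, normalized to a bool. */
static inline bool requested_dax(const struct fileattr_model *fa)
{
	if (fa->flags_valid)
		return fa->flags & FS_DAX_FL;
	return fa->fsx_xflags & FS_XFLAG_DAX;
}
```

Returning bool converts the masked value to 0 or 1, which is the form
fuse_dax_dontcache() expects for its newdax argument.
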
 fs/fuse/ioctl.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
index 546ea3d58fb4..a9ed53c5dbd1 100644
--- a/fs/fuse/ioctl.c
+++ b/fs/fuse/ioctl.c
@@ -469,8 +469,6 @@ int fuse_fileattr_set(struct user_namespace *mnt_userns,
 	if (fa->flags_valid) {
 		err = fuse_priv_ioctl(inode, ff, FS_IOC_SETFLAGS,
 				      &flags, sizeof(flags));
-		if (err)
-			goto cleanup;
 	} else {
 		memset(&xfa, 0, sizeof(xfa));
 		xfa.fsx_xflags = fa->fsx_xflags;
@@ -483,6 +481,19 @@ int fuse_fileattr_set(struct user_namespace *mnt_userns,
 				      &xfa, sizeof(xfa));
 	}
 
+	if (err)
+		goto cleanup;
+
+	if (IS_ENABLED(CONFIG_FUSE_DAX)) {
+		bool newdax;
+
+		if (fa->flags_valid)
+			newdax = flags & FS_DAX_FL;
+		else
+			newdax = fa->fsx_xflags & FS_XFLAG_DAX;
+		fuse_dax_dontcache(inode, newdax);
+	}
+
 cleanup:
 	fuse_priv_ioctl_cleanup(inode, ff);
 
-- 
2.27.0


* [PATCH v4 8/8] fuse: show '-o dax=inode' option only when FUSE server supports
  2021-08-17  2:22 ` Jeffle Xu
  (?)
@ 2021-08-17  2:22   ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:22 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

Prior to this patch, the mount options still show '-o dax=inode' even
when the FUSE server advertises that it doesn't support per-file DAX.
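
As a rough stand-alone model of the resulting behaviour (not kernel
code): the enum and function names below are hypothetical stand-ins for
fc->dax_mode, fc->perfile_dax and fuse_show_options(), and only the three
modes visible in the hunk are modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the DAX modes tested in fuse_show_options(). */
enum sketch_dax_mode {
    SKETCH_DAX_OTHER,   /* any mode the hunk does not test for */
    SKETCH_DAX_ALWAYS,
    SKETCH_DAX_NEVER,
    SKETCH_DAX_INODE,
};

/*
 * Return the option string that would be shown, or NULL if nothing is
 * shown. 'server_perfile_dax' stands in for fc->perfile_dax, i.e. the
 * result of the FUSE_INIT negotiation.
 */
static const char *sketch_dax_option(enum sketch_dax_mode mode,
                                     bool server_perfile_dax)
{
    if (mode == SKETCH_DAX_ALWAYS)
        return ",dax=always";
    if (mode == SKETCH_DAX_NEVER)
        return ",dax=never";
    if (mode == SKETCH_DAX_INODE && server_perfile_dax)
        return ",dax=inode";
    return NULL;
}
```

With this model, a mount requested with dax=inode against a server
without per-file DAX support shows no dax option at all.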

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 fs/fuse/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 8c9774c6a210..7f09a964823f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -697,7 +697,7 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 		seq_puts(m, ",dax=always");
 	else if (fc->dax_mode == FUSE_DAX_NEVER)
 		seq_puts(m, ",dax=never");
-	else if (fc->dax_mode == FUSE_DAX_INODE)
+	else if ((fc->dax_mode == FUSE_DAX_INODE) && fc->perfile_dax)
 		seq_puts(m, ",dax=inode");
 #endif
 
-- 
2.27.0


* [virtiofsd PATCH v4 0/4] virtiofsd: support per-file DAX
  2021-08-17  2:22 ` Jeffle Xu
  (?)
@ 2021-08-17  2:23   ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:23 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

I mentioned in virtiofsd PATCH v1 that virtiofsd exits once ioctl() is
called. After deeper investigation, I found that this happens because
ioctl() is not in virtiofsd's seccomp whitelist.

To support ioctl, the ioctl syscall must be added to the whitelist (see
patch 1).

And this is the complete working version for virtiofsd:
- virtiofsd now supports FUSE_IOCTL, though currently only
  FS_IOC_G[S]ETFLAGS/FS_IOC_FSG[S]ETXATTR are supported.
- During FUSE_INIT, virtiofsd advertises support for per-file DAX only
  when the backend fs is ext4/xfs.
- FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR FUSE_IOCTL requests are passed to the
  host, so that FS_DAX_FL is stored persistently in the backend fs.
- During FUSE_LOOKUP, virtiofsd decides whether DAX shall be enabled for
  the file according to whether it is marked with FS_DAX_FL in the
  backend fs.
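
The FUSE_LOOKUP step in the last item can be sketched roughly as below.
This is an illustration under assumptions, not the code from patch 4:
the helper names are hypothetical, SKETCH_FUSE_ATTR_DAX mirrors the
FUSE_ATTR_DAX value added in patch 2, and only the FS_IOC_GETFLAGS
variant is shown.

```c
#include <linux/fs.h>
#include <stdint.h>
#include <sys/ioctl.h>

#ifndef FS_DAX_FL
#define FS_DAX_FL 0x02000000 /* fallback for older uapi headers */
#endif

/* Mirrors FUSE_ATTR_DAX (1 << 1) from patch 2; renamed to avoid clashes. */
#define SKETCH_FUSE_ATTR_DAX (1u << 1)

/* Hypothetical helper: map backend inode flags to FUSE attr flags. */
static uint32_t sketch_attr_flags(unsigned int fs_flags)
{
    return (fs_flags & FS_DAX_FL) ? SKETCH_FUSE_ATTR_DAX : 0;
}

/*
 * Hypothetical helper: ask the backend fs for the inode flags of 'fd'
 * and derive the attr flags to return in the lookup reply. A failing
 * ioctl is treated as "no DAX" in this sketch.
 */
static uint32_t sketch_lookup_attr_flags(int fd)
{
    unsigned int fs_flags = 0;

    if (ioctl(fd, FS_IOC_GETFLAGS, &fs_flags) < 0)
        return 0;
    return sketch_attr_flags(fs_flags);
}
```

Treating an ioctl failure as "no DAX" keeps lookups working on backends
that do not implement the flags ioctl; a real server may prefer to log
or propagate the error instead.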


changes since v2/v3:
Patch 4 in v2 was incomplete by mistake and fails to compile. I later
sent a separate patch 4 as v3. Now I send the complete set as v4. Apart
from this, there is no other difference.

Jeffle Xu (4):
  virtiofsd: add .ioctl() support
  virtiofsd: expand fuse protocol to support per-file DAX
  virtiofsd: support per-file DAX negotiation in FUSE_INIT
  virtiofsd: support per-file DAX in FUSE_LOOKUP

 include/standard-headers/linux/fuse.h |   2 +
 tools/virtiofsd/fuse_common.h         |   5 ++
 tools/virtiofsd/fuse_lowlevel.c       |   6 ++
 tools/virtiofsd/passthrough_ll.c      | 125 ++++++++++++++++++++++++++
 tools/virtiofsd/passthrough_seccomp.c |   1 +
 5 files changed, 139 insertions(+)

-- 
2.27.0


* [virtiofsd PATCH v4 1/4] virtiofsd: add .ioctl() support
  2021-08-17  2:23   ` Jeffle Xu
  (?)
@ 2021-08-17  2:23     ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:23 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

Add .ioctl() support for passthrough, in preparation for the upcoming
per-file DAX feature.

Once it advertises support for the per-file DAX feature, virtiofsd should
persistently store the FS_DAX_FL flag passed by the
FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctls, and set FUSE_ATTR_DAX in
FUSE_LOOKUP accordingly if the file is capable of per-file DAX.

In passthrough mode, virtiofsd passes the corresponding ioctls to the
host directly. Currently only the ioctls needed for the per-file DAX
feature, i.e., FS_IOC_GETFLAGS/FS_IOC_SETFLAGS and
FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR, are supported. Later we can restrict
the flags/attributes allowed to be set to tighten security, or extend
the set of allowed ioctls if that is really needed.
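
The allowlist described above boils down to a three-way classification
of the ioctl command. A minimal stand-alone sketch follows; the enum and
function names are hypothetical, and the actual handler in the diff
below uses an if/else chain inside lo_ioctl() instead.

```c
#include <linux/fs.h>

/* Hypothetical classification of an incoming FUSE_IOCTL command. */
enum sketch_ioctl_kind {
    SKETCH_IOCTL_UNSUPPORTED, /* answered with ENOSYS */
    SKETCH_IOCTL_SET,         /* input buffer is forwarded to the host */
    SKETCH_IOCTL_GET,         /* host result is copied into the reply */
};

static enum sketch_ioctl_kind sketch_classify_ioctl(unsigned long cmd)
{
    if (cmd == FS_IOC_SETFLAGS || cmd == FS_IOC_FSSETXATTR)
        return SKETCH_IOCTL_SET;
    if (cmd == FS_IOC_GETFLAGS || cmd == FS_IOC_FSGETXATTR)
        return SKETCH_IOCTL_GET;
    return SKETCH_IOCTL_UNSUPPORTED;
}
```

Keeping the classification separate from the forwarding code makes it
easy to later narrow the accepted flags or widen the set of commands,
as the commit message suggests.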

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 tools/virtiofsd/passthrough_ll.c      | 53 +++++++++++++++++++++++++++
 tools/virtiofsd/passthrough_seccomp.c |  1 +
 2 files changed, 54 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index b76d878509..e170b17adb 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -54,6 +54,7 @@
 #include <sys/wait.h>
 #include <sys/xattr.h>
 #include <syslog.h>
+#include <linux/fs.h>
 
 #include "qemu/cutils.h"
 #include "passthrough_helpers.h"
@@ -2105,6 +2106,57 @@ out:
     fuse_reply_err(req, saverr);
 }
 
+static void lo_ioctl(fuse_req_t req, fuse_ino_t ino, unsigned int cmd, void *arg,
+                  struct fuse_file_info *fi, unsigned flags, const void *in_buf,
+                  size_t in_bufsz, size_t out_bufsz)
+{
+    int fd = lo_fi_fd(req, fi);
+    int res;
+    int saverr = ENOSYS;
+
+    fuse_log(FUSE_LOG_DEBUG, "lo_ioctl(ino=%" PRIu64 ", cmd=0x%x, flags=0x%x, "
+             "in_bufsz = %zu, out_bufsz = %zu)\n",
+             ino, cmd, flags, in_bufsz, out_bufsz);
+
+    /* unrestricted ioctl is not supported yet */
+    if (flags & FUSE_IOCTL_UNRESTRICTED)
+        goto out;
+
+    /*
+     * Currently only those ioctls needed to support per-file DAX feature,
+     * i.e., FS_IOC_GETFLAGS/FS_IOC_SETFLAGS and
+     * FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR are supported.
+     */
+    if (cmd == FS_IOC_SETFLAGS || cmd == FS_IOC_FSSETXATTR) {
+        res = ioctl(fd, cmd, in_buf);
+        if (res < 0)
+            goto out_err;
+
+        fuse_reply_ioctl(req, 0, NULL, 0);
+    }
+    else if (cmd == FS_IOC_GETFLAGS || cmd == FS_IOC_FSGETXATTR) {
+        /* also holds the 'unsigned int' result of FS_IOC_GETFLAGS */
+        struct fsxattr attr;
+
+        res = ioctl(fd, cmd, &attr);
+        if (res < 0)
+            goto out_err;
+
+        fuse_reply_ioctl(req, 0, &attr, out_bufsz);
+    }
+    else {
+        fuse_log(FUSE_LOG_DEBUG, "Unsupported ioctl 0x%x\n", cmd);
+        goto out;
+    }
+
+    return;
+
+out_err:
+    saverr = errno;
+out:
+    fuse_reply_err(req, saverr);
+}
+
 static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
                         struct fuse_file_info *fi)
 {
@@ -3279,6 +3331,7 @@ static struct fuse_lowlevel_ops lo_oper = {
     .create = lo_create,
     .getlk = lo_getlk,
     .setlk = lo_setlk,
+    .ioctl = lo_ioctl,
     .open = lo_open,
     .release = lo_release,
     .flush = lo_flush,
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index 62441cfcdb..2a5f7614fc 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -62,6 +62,7 @@ static const int syscall_allowlist[] = {
     SCMP_SYS(gettid),
     SCMP_SYS(gettimeofday),
     SCMP_SYS(getxattr),
+    SCMP_SYS(ioctl),
     SCMP_SYS(linkat),
     SCMP_SYS(listxattr),
     SCMP_SYS(lseek),
-- 
2.27.0


* [virtiofsd PATCH v4 2/4] virtiofsd: expand fuse protocol to support per-file DAX
  2021-08-17  2:23   ` Jeffle Xu
  (?)
@ 2021-08-17  2:23     ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:23 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 include/standard-headers/linux/fuse.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index 950d7edb7e..7bd006ffcb 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -356,6 +356,7 @@ struct fuse_file_lock {
 #define FUSE_MAP_ALIGNMENT	(1 << 26)
 #define FUSE_SUBMOUNTS		(1 << 27)
 #define FUSE_HANDLE_KILLPRIV_V2	(1 << 28)
+#define FUSE_PERFILE_DAX	(1 << 30)
 
 /**
  * CUSE INIT request/reply flags
@@ -440,6 +441,7 @@ struct fuse_file_lock {
  * FUSE_ATTR_SUBMOUNT: Object is a submount root
  */
 #define FUSE_ATTR_SUBMOUNT      (1 << 0)
+#define FUSE_ATTR_DAX		(1 << 1)
 
 /**
  * Open flags
-- 
2.27.0


* [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT
  2021-08-17  2:23   ` Jeffle Xu
  (?)
@ 2021-08-17  2:23     ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:23 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

In the FUSE_INIT negotiation phase, the server and the client should each
advertise whether they support per-file DAX.

Once it advertises support for the per-file DAX feature, virtiofsd should
persistently store the FS_DAX_FL flag passed by the
FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctls, and set FUSE_ATTR_DAX in
FUSE_LOOKUP accordingly if the file is capable of per-file DAX.

Currently only ext4/xfs, since Linux kernel v5.8, support storing the
FS_DAX_FL flag persistently, so virtiofsd advertises support for the
per-file DAX feature only when the backend fs type is ext4 or xfs.
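
A rough stand-alone sketch of the two checks described above: detecting
an ext4/xfs backend with statfs(2), and combining that with what the
client offers. All sketch_* names are hypothetical; the host kernel
version (v5.8 or later) is not checked here, and the mapping of each fs
type to DAX_CAP_FLAGS or DAX_CAP_XATTR from the diff below is not
modelled.

```c
#include <linux/magic.h>
#include <stdbool.h>
#include <sys/vfs.h>

#ifndef EXT4_SUPER_MAGIC
#define EXT4_SUPER_MAGIC 0xEF53     /* fallback for older uapi headers */
#endif
#ifndef XFS_SUPER_MAGIC
#define XFS_SUPER_MAGIC 0x58465342  /* fallback for older uapi headers */
#endif

/* Hypothetical helper: can this fs type store FS_DAX_FL persistently? */
static bool sketch_fs_type_stores_dax(long f_type)
{
    return f_type == EXT4_SUPER_MAGIC || f_type == XFS_SUPER_MAGIC;
}

/* Hypothetical helper: probe the fs that backs 'path' on the host. */
static bool sketch_backend_stores_dax(const char *path)
{
    struct statfs st;

    if (statfs(path, &st) < 0)
        return false;
    return sketch_fs_type_stores_dax((long)st.f_type);
}

/*
 * Hypothetical helper: the server replies with FUSE_PERFILE_DAX only if
 * the client offered it, an ioctl handler is registered, and the
 * backend fs can persist the flag.
 */
static bool sketch_advertise_perfile_dax(bool client_offers,
                                         bool has_ioctl_handler,
                                         bool backend_ok)
{
    return client_offers && has_ioctl_handler && backend_ok;
}
```

In this model, a shared directory on tmpfs or on any other fs type never
leads to FUSE_PERFILE_DAX being advertised, regardless of what the
client offers.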

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 tools/virtiofsd/fuse_common.h    |  5 +++++
 tools/virtiofsd/fuse_lowlevel.c  |  6 ++++++
 tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index 8a75729be9..ee6fc64c23 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -372,6 +372,11 @@ struct fuse_file_info {
  */
 #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
 
+/**
+ * Indicates support for per-file DAX.
+ */
+#define FUSE_CAP_PERFILE_DAX (1 << 29)
+
 /**
  * Ioctl flags
  *
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 50fc5c8d5a..04a4f17423 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
         se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
     }
+    if (arg->flags & FUSE_PERFILE_DAX) {
+        se->conn.capable |= FUSE_CAP_PERFILE_DAX;
+    }
 #ifdef HAVE_SPLICE
 #ifdef HAVE_VMSPLICE
     se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
@@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     if (se->conn.want & FUSE_CAP_POSIX_ACL) {
         outarg.flags |= FUSE_POSIX_ACL;
     }
+    if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
+        outarg.flags |= FUSE_PERFILE_DAX;
+    }
     outarg.max_readahead = se->conn.max_readahead;
     outarg.max_write = se->conn.max_write;
     if (se->conn.max_background >= (1 << 16)) {
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index e170b17adb..5b6228210f 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -53,8 +53,10 @@
 #include <sys/syscall.h>
 #include <sys/wait.h>
 #include <sys/xattr.h>
+#include <sys/vfs.h>
 #include <syslog.h>
 #include <linux/fs.h>
+#include <linux/magic.h>
 
 #include "qemu/cutils.h"
 #include "passthrough_helpers.h"
@@ -136,6 +138,13 @@ enum {
     SANDBOX_CHROOT,
 };
 
+/* capability of storing DAX flag persistently */
+enum {
+    DAX_CAP_NONE,  /* not supported */
+    DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
+    DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
+};
+
 typedef struct xattr_map_entry {
     char *key;
     char *prepend;
@@ -161,6 +170,7 @@ struct lo_data {
     int readdirplus_clear;
     int allow_direct_io;
     int announce_submounts;
+    int perfile_dax_cap; /* capability of backend fs */
     bool use_statx;
     struct lo_inode root;
     GHashTable *inodes; /* protected by lo->mutex */
@@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
         conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
         lo->killpriv_v2 = 0;
     }
+
+    if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
+        conn->want |= FUSE_CAP_PERFILE_DAX;
+    }
 }
 
 static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
@@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
     int fd, res;
     struct stat stat;
     uint64_t mnt_id;
+    struct statfs statfs;
 
     fd = open("/", O_PATH);
     if (fd == -1) {
@@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
         root->posix_locks = g_hash_table_new_full(
             g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
     }
+
+    /*
+     * Currently only ext4/xfs since linux kernel v5.8 support storing
+     * FS_DAX_FL flag persistently. Ext4 accesses this flag through
+     * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
+     * FS_IOC_FSG[S]ETXATTR ioctl.
+     */
+    res = fstatfs(fd, &statfs);
+    if (!res) {
+	if (statfs.f_type == EXT4_SUPER_MAGIC)
+	    lo->perfile_dax_cap = DAX_CAP_FLAGS;
+	else if (statfs.f_type == XFS_SUPER_MAGIC)
+	    lo->perfile_dax_cap = DAX_CAP_XATTR;
+    }
 }
 
 static guint lo_key_hash(gconstpointer key)
-- 
2.27.0
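
For readers who want to try the backend detection from setup_root()
outside of virtiofsd, a minimal standalone sketch follows. The ex_ names
are hypothetical, and the #ifndef fallbacks are only there on the
assumption that older userspace headers may lack the definitions. Note
that ext2, ext3 and ext4 all report the same f_type magic (0xEF53), so a
check like this cannot by itself tell ext4 apart from its predecessors.

```c
#include <sys/vfs.h>     /* fstatfs(), struct statfs */
#include <linux/magic.h> /* EXT4_SUPER_MAGIC, XFS_SUPER_MAGIC */

/* Fallbacks in case the installed headers predate these definitions. */
#ifndef EXT4_SUPER_MAGIC
#define EXT4_SUPER_MAGIC 0xEF53
#endif
#ifndef XFS_SUPER_MAGIC
#define XFS_SUPER_MAGIC 0x58465342
#endif

/* Hypothetical mirror of the DAX_CAP_* enum from the patch. */
enum ex_dax_cap {
    EX_DAX_CAP_NONE,  /* flag cannot be stored persistently */
    EX_DAX_CAP_FLAGS, /* FS_IOC_GETFLAGS/FS_IOC_SETFLAGS (ext4) */
    EX_DAX_CAP_XATTR, /* FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR (xfs) */
};

/* Map a statfs f_type magic number to the ioctl family used for the flag. */
static enum ex_dax_cap ex_cap_from_magic(long f_type)
{
    switch (f_type) {
    case EXT4_SUPER_MAGIC:
        return EX_DAX_CAP_FLAGS;
    case XFS_SUPER_MAGIC:
        return EX_DAX_CAP_XATTR;
    default:
        return EX_DAX_CAP_NONE;
    }
}

/* Probe the filesystem behind fd; any fstatfs() failure means "none". */
static enum ex_dax_cap ex_detect_dax_cap(int fd)
{
    struct statfs sfs;

    if (fstatfs(fd, &sfs) != 0) {
        return EX_DAX_CAP_NONE;
    }
    return ex_cap_from_magic((long)sfs.f_type);
}
```

Treating a failed fstatfs() as "not supported" matches the patch, which
leaves perfile_dax_cap at its zero default in that case.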


^ permalink raw reply related	[flat|nested] 151+ messages in thread

* [virtiofsd PATCH v4 4/4] virtiofsd: support per-file DAX in FUSE_LOOKUP
  2021-08-17  2:23   ` Jeffle Xu
  (?)
@ 2021-08-17  2:23     ` Jeffle Xu
  -1 siblings, 0 replies; 151+ messages in thread
From: Jeffle Xu @ 2021-08-17  2:23 UTC (permalink / raw)
  To: vgoyal, stefanha, miklos
  Cc: linux-fsdevel, virtualization, virtio-fs, joseph.qi, bo.liu

For passthrough, when the corresponding virtiofs in the guest is mounted
with '-o dax=inode', advertise that a file is capable of per-file DAX
if its inode in the backend fs is marked with the FS_DAX_FL flag.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
---
 tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 5b6228210f..4cbd904248 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -171,6 +171,7 @@ struct lo_data {
     int allow_direct_io;
     int announce_submounts;
     int perfile_dax_cap; /* capability of backend fs */
+    bool perfile_dax; /* enable per-file DAX or not */
     bool use_statx;
     struct lo_inode root;
     GHashTable *inodes; /* protected by lo->mutex */
@@ -716,6 +717,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
 
     if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
         conn->want |= FUSE_CAP_PERFILE_DAX;
+	lo->perfile_dax = 1;
+    }
+    else {
+	lo->perfile_dax = 0;
     }
 }
 
@@ -983,6 +988,41 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
     return 0;
 }
 
+/*
+ * If the file is marked with FS_DAX_FL or FS_XFLAG_DAX, then DAX should be
+ * enabled for this file.
+ */
+static bool lo_should_enable_dax(struct lo_data *lo, struct lo_inode *dir,
+				 const char *name)
+{
+    int res, fd;
+    bool ret = false;
+    unsigned int attr;
+    struct fsxattr xattr;
+
+    if (!lo->perfile_dax)
+	return false;
+
+    /* Open file without O_PATH, so that ioctl can be called. */
+    fd = openat(dir->fd, name, O_NOFOLLOW);
+    if (fd == -1)
+        return false;
+
+    if (lo->perfile_dax_cap == DAX_CAP_FLAGS) {
+        res = ioctl(fd, FS_IOC_GETFLAGS, &attr);
+        if (!res && (attr & FS_DAX_FL))
+	    ret = true;
+    }
+    else if (lo->perfile_dax_cap == DAX_CAP_XATTR) {
+	res = ioctl(fd, FS_IOC_FSGETXATTR, &xattr);
+	if (!res && (xattr.fsx_xflags & FS_XFLAG_DAX))
+	    ret = true;
+    }
+
+    close(fd);
+    return ret;
+}
+
 /*
  * Increments nlookup on the inode on success. unref_inode_lolocked() must be
  * called eventually to decrement nlookup again. If inodep is non-NULL, the
@@ -1038,6 +1078,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
         e->attr_flags |= FUSE_ATTR_SUBMOUNT;
     }
 
+    if (lo_should_enable_dax(lo, dir, name))
+	e->attr_flags |= FUSE_ATTR_DAX;
+
     inode = lo_find(lo, &e->attr, mnt_id);
     if (inode) {
         close(newfd);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17  2:22 ` Jeffle Xu
@ 2021-08-17  8:06   ` Miklos Szeredi
  -1 siblings, 0 replies; 151+ messages in thread
From: Miklos Szeredi @ 2021-08-17  8:06 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: Vivek Goyal, Stefan Hajnoczi, linux-fsdevel, virtualization,
	virtio-fs-list, Joseph Qi, Liu Bo

On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>
> This patchset adds support of per-file DAX for virtiofs, which is
> inspired by Ira Weiny's work on ext4[1] and xfs[2].

Can you please explain the background of this change in detail?

Why would an admin want to enable DAX for a particular virtiofs file
and not for others?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17  8:06   ` [Virtio-fs] " Miklos Szeredi
  (?)
@ 2021-08-17  9:32     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 151+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-17  9:32 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeffle Xu, virtualization, virtio-fs-list, Joseph Qi,
	linux-fsdevel, Vivek Goyal

* Miklos Szeredi (miklos@szeredi.hu) wrote:
> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> >
> > This patchset adds support of per-file DAX for virtiofs, which is
> > inspired by Ira Weiny's work on ext4[1] and xfs[2].
> 
> Can you please explain the background of this change in detail?
> 
> Why would an admin want to enable DAX for a particular virtiofs file
> and not for others?

Where we're contending on virtiofs dax cache size it makes a lot of
sense; it's quite expensive for us to map something into the cache
(especially if we push something else out), so selectively DAXing files
that are expected to be hot could help reduce cache churn.

Dave

> Thanks,
> Miklos
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17  9:32     ` Dr. David Alan Gilbert
@ 2021-08-17 10:09       ` Miklos Szeredi
  -1 siblings, 0 replies; 151+ messages in thread
From: Miklos Szeredi @ 2021-08-17 10:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Jeffle Xu, virtualization, virtio-fs-list, Joseph Qi,
	linux-fsdevel, Vivek Goyal

On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
>
> * Miklos Szeredi (miklos@szeredi.hu) wrote:
> > On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> > >
> > > This patchset adds support of per-file DAX for virtiofs, which is
> > > inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >
> > Can you please explain the background of this change in detail?
> >
> > Why would an admin want to enable DAX for a particular virtiofs file
> > and not for others?
>
> Where we're contending on virtiofs dax cache size it makes a lot of
> sense; it's quite expensive for us to map something into the cache
> (especially if we push something else out), so selectively DAXing files
> that are expected to be hot could help reduce cache churn.

If this is a performance issue, it should be fixed in a way that
doesn't require hand tuning like you suggest, I think.

I'm not sure what the ext4/xfs case for per-file DAX is.  Maybe that
can help us understand the virtiofs case as well.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 6/8] fuse: mark inode DONT_CACHE when per-file DAX indication changes
  2021-08-17  2:22   ` Jeffle Xu
  (?)
@ 2021-08-17 10:26     ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 151+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-17 10:26 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: vgoyal, stefanha, miklos, linux-fsdevel, virtio-fs, joseph.qi,
	virtualization

* Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> When the per-file DAX indication changes while the file is still
> *opened*, it is quite complicated and maybe fragile to dynamically
> change the DAX state.
> 
> Hence mark the inode and corresponding dentries as DONE_CACHE once the

                                                     ^^^^^^^^^^
typo as DONT ?

Dave

> per-file DAX indication changes, so that the inode instance will be
> evicted and freed as soon as possible once the file is closed and the
> last reference to the inode is put. And then when the file gets reopened
> next time, the inode will reflect the new DAX state.
> 
> In summary, when the per-file DAX indication changes for an *opened*
> file, the state of the file won't be updated until this file is closed
> and reopened later.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/fuse/dax.c    | 9 +++++++++
>  fs/fuse/fuse_i.h | 1 +
>  fs/fuse/inode.c  | 3 +++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 30833f8d37dd..f7ede0be4e00 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -1364,6 +1364,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>  	inode->i_data.a_ops = &fuse_dax_file_aops;
>  }
>  
> +void fuse_dax_dontcache(struct inode *inode, bool newdax)
> +{
> +	struct fuse_conn *fc = get_fuse_conn(inode);
> +
> +	if (fc->dax_mode == FUSE_DAX_INODE &&
> +	    fc->perfile_dax && (!!IS_DAX(inode) != newdax))
> +		d_mark_dontcache(inode);
> +}
> +
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
>  {
>  	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 7b7b4c208af2..56fe1c4d2136 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1260,6 +1260,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>  void fuse_dax_inode_cleanup(struct inode *inode);
> +void fuse_dax_dontcache(struct inode *inode, bool newdax);
>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>  void fuse_dax_cancel_work(struct fuse_conn *fc);
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 8080f78befed..8c9774c6a210 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -269,6 +269,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
>  		if (inval)
>  			invalidate_inode_pages2(inode->i_mapping);
>  	}
> +
> +	if (IS_ENABLED(CONFIG_FUSE_DAX))
> +		fuse_dax_dontcache(inode, attr->flags & FUSE_ATTR_DAX);
>  }
>  
>  static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
> -- 
> 2.27.0
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
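
For readers following the diff above: the `!!` in fuse_dax_dontcache()
matters because, as far as I can tell, IS_DAX() yields the masked inode
flag bit rather than 0 or 1, while newdax is a bool. The following is a
minimal, self-contained C sketch of that comparison, not kernel code.
DEMO_S_DAX, demo_is_dax() and the two dax_changed_*() helpers are
hypothetical stand-ins, and the bit position used for the flag is an
assumption made only for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's S_DAX inode flag (bit is assumed). */
#define DEMO_S_DAX (1u << 13)

/* Shaped like IS_DAX(): returns the masked bit, i.e. 0 or DEMO_S_DAX. */
static unsigned int demo_is_dax(unsigned int i_flags)
{
	return i_flags & DEMO_S_DAX;
}

/*
 * Comparison without normalization: the masked bit (0 or 8192 here) is
 * compared against a bool (0 or 1), so a DAX inode never equals "true".
 */
static bool dax_changed_raw(unsigned int i_flags, bool newdax)
{
	return demo_is_dax(i_flags) != newdax;
}

/*
 * Comparison with "!!": the masked bit is first collapsed to 0 or 1,
 * so the test is true only when the DAX state really differs.
 */
static bool dax_changed_normalized(unsigned int i_flags, bool newdax)
{
	return !!demo_is_dax(i_flags) != newdax;
}
```

With the flag set and newdax true, dax_changed_raw() reports a change
even though nothing changed, whereas dax_changed_normalized() does not.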


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 10:09       ` Miklos Szeredi
  (?)
@ 2021-08-17 10:37         ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 151+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-17 10:37 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeffle Xu, virtualization, virtio-fs-list, Joseph Qi,
	linux-fsdevel, Vivek Goyal

* Miklos Szeredi (miklos@szeredi.hu) wrote:
> On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> >
> > * Miklos Szeredi (miklos@szeredi.hu) wrote:
> > > On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> > > >
> > > > This patchset adds support of per-file DAX for virtiofs, which is
> > > > inspired by Ira Weiny's work on ext4[1] and xfs[2].
> > >
> > > Can you please explain the background of this change in detail?
> > >
> > > Why would an admin want to enable DAX for a particular virtiofs file
> > > and not for others?
> >
> > Where we're contending on virtiofs dax cache size it makes a lot of
> > sense; it's quite expensive for us to map something into the cache
> > (especially if we push something else out), so selectively DAXing files
> > that are expected to be hot could help reduce cache churn.
> 
> If this is a performance issue, it should be fixed in a way that
> doesn't require hand tuning like you suggest, I think.

I'd agree that would be nice; however:
  a) It looks like other filesystems already offer something
     admin-selectable.
  b) Trying to write clever heuristics is only going to work in some
     cases; being able to say 'DAX this directory' might work better in
     practice.

> I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
> can help understand the virtiofs case as well.

Yep, I don't understand the case with real nvdimm hardware.

Dave

> Thanks,
> Miklos
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17  8:06   ` [Virtio-fs] " Miklos Szeredi
  (?)
@ 2021-08-17 12:39     ` Vivek Goyal
  -1 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-08-17 12:39 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jeffle Xu, Stefan Hajnoczi, linux-fsdevel, virtualization,
	virtio-fs-list, Joseph Qi, Liu Bo

On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> >
> > This patchset adds support of per-file DAX for virtiofs, which is
> > inspired by Ira Weiny's work on ext4[1] and xfs[2].
> 
> Can you please explain the background of this change in detail?
> 
> Why would an admin want to enable DAX for a particular virtiofs file
> and not for others?

Initially I thought that they needed it because they are downloading
files on the fly from the server, so they don't want to enable DAX on a
file until it is completely downloaded. But later I realized that they
should be able to block in the FUSE_SETUPMAPPING call, make sure the
associated file section has been downloaded before returning, and solve
the problem that way. So that can't be the primary reason.

The other reason mentioned, I think, was that only certain files benefit
from DAX. But not many details were given beyond that. It would be nice
to hear a more concrete use case and more details about this usage.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17  9:32     ` Dr. David Alan Gilbert
  (?)
@ 2021-08-17 12:40       ` Vivek Goyal
  -1 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-08-17 12:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Miklos Szeredi, Jeffle Xu, virtualization, virtio-fs-list,
	Joseph Qi, linux-fsdevel

On Tue, Aug 17, 2021 at 10:32:14AM +0100, Dr. David Alan Gilbert wrote:
> * Miklos Szeredi (miklos@szeredi.hu) wrote:
> > On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> > >
> > > This patchset adds support of per-file DAX for virtiofs, which is
> > > inspired by Ira Weiny's work on ext4[1] and xfs[2].
> > 
> > Can you please explain the background of this change in detail?
> > 
> > Why would an admin want to enable DAX for a particular virtiofs file
> > and not for others?
> 
> Where we're contending on virtiofs dax cache size it makes a lot of
> sense; it's quite expensive for us to map something into the cache
> (especially if we push something else out), so selectively DAXing files
> that are expected to be hot could help reduce cache churn.

In that case we should probably just make the DAX window larger. I assume
that selecting which files to turn DAX on for will itself not be
trivial. Not sure what heuristics are being deployed to determine
that. I would like to know more about it.

Vivek


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 10:09       ` Miklos Szeredi
  (?)
@ 2021-08-17 13:08         ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-17 13:08 UTC (permalink / raw)
  To: Miklos Szeredi, Dr. David Alan Gilbert
  Cc: virtualization, virtio-fs-list, Joseph Qi, linux-fsdevel, Vivek Goyal



On 8/17/21 6:09 PM, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>>
>> * Miklos Szeredi (miklos@szeredi.hu) wrote:
>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>
>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>
>>> Can you please explain the background of this change in detail?
>>>
>>> Why would an admin want to enable DAX for a particular virtiofs file
>>> and not for others?
>>
>> Where we're contending on virtiofs dax cache size it makes a lot of
>> sense; it's quite expensive for us to map something into the cache
>> (especially if we push something else out), so selectively DAXing files
>> that are expected to be hot could help reduce cache churn.
> 
> If this is a performance issue, it should be fixed in a way that
> doesn't require hand tuning like you suggest, I think.
> 
> I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
> can help understand the virtiofs case as well.
> 

Some hints as to why ext4/xfs support per-file DAX can be found in [1] and [2].

"Boaz Harrosh wondered why someone might want to turn DAX off for a
persistent memory device. Hellwig said that the performance "could
suck"; Williams noted that the page cache could be useful for some
applications as well. Jan Kara pointed out that reads from persistent
memory are close to DRAM speed, but that writes are not; the page cache
could be helpful for frequent writes. Applications need to change to
fully take advantage of DAX, Williams said; part of the promise of
adding a flag is that users can do DAX on smaller granularities than a
full filesystem."

In summary, the page cache is preferable in some cases, and thus a more
fine-grained way of controlling DAX is needed.


As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
may compete for the limited DAX window.

Besides, supporting DAX for small files can be expensive. Small files
can consume the DAX window rapidly, and if small files are accessed
only once, the cost of mmap/munmap on the host cannot be ignored.
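
To put rough numbers on the small-file concern, here is a minimal C
sketch. It assumes the DAX window is handed out in fixed-size ranges of
2 MiB (a shift of 21, which is what FUSE_DAX_SHIFT appears to be, but
treat that value as an assumption). DEMO_DAX_SHIFT, DEMO_DAX_SZ and both
helper functions are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Assumed range granularity of the DAX window: 2 MiB per range. */
#define DEMO_DAX_SHIFT 21
#define DEMO_DAX_SZ    (UINT64_C(1) << DEMO_DAX_SHIFT)

/* Number of fixed-size window ranges needed to map a file of this size. */
static uint64_t demo_ranges_for_file(uint64_t file_size)
{
	/* Round up: even a one-byte file occupies a whole range. */
	return (file_size + DEMO_DAX_SZ - 1) >> DEMO_DAX_SHIFT;
}

/* How many files of equal size fit before the window is exhausted. */
static uint64_t demo_files_until_full(uint64_t window_bytes, uint64_t file_size)
{
	uint64_t per_file = demo_ranges_for_file(file_size);
	uint64_t total = window_bytes >> DEMO_DAX_SHIFT;

	/* An empty file needs no mapping, so the question is moot. */
	if (per_file == 0)
		return 0;
	return total / per_file;
}
```

Under these assumptions a 1 GiB window holds mappings for only 512
files of 4 KiB each, since every such file still takes one full range.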


[1]
https://lore.kernel.org/lkml/20200428002142.404144-1-ira.weiny@intel.com/
[2] https://lwn.net/Articles/787973/

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
@ 2021-08-17 13:08         ` JeffleXu
  0 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-17 13:08 UTC (permalink / raw)
  To: Miklos Szeredi, Dr. David Alan Gilbert
  Cc: virtio-fs-list, Joseph Qi, linux-fsdevel, Vivek Goyal, virtualization



On 8/17/21 6:09 PM, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>>
>> * Miklos Szeredi (miklos@szeredi.hu) wrote:
>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>
>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>
>>> Can you please explain the background of this change in detail?
>>>
>>> Why would an admin want to enable DAX for a particular virtiofs file
>>> and not for others?
>>
>> Where we're contending on virtiofs dax cache size it makes a lot of
>> sense; it's quite expensive for us to map something into the cache
>> (especially if we push something else out), so selectively DAXing files
>> that are expected to be hot could help reduce cache churn.
> 
> If this is a performance issue, it should be fixed in a way that
> doesn't require hand tuning like you suggest, I think.
> 
> I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
> can help understand the virtiofs case as well.
> 

Some hints why ext4/xfs support per-file DAX can be found [1] and [2].

"Boaz Harrosh wondered why someone might want to turn DAX off for a
persistent memory device. Hellwig said that the performance "could
suck"; Williams noted that the page cache could be useful for some
applications as well. Jan Kara pointed out that reads from persistent
memory are close to DRAM speed, but that writes are not; the page cache
could be helpful for frequent writes. Applications need to change to
fully take advantage of DAX, Williams said; part of the promise of
adding a flag is that users can do DAX on smaller granularities than a
full filesystem."

In summary, page cache is preferable in some cases, and thus more fine
grained way of DAX control is needed.


As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
may compete for limited DAX window resource.

Besides, supporting DAX for small files can be expensive. Small files
can consume DAX window resource rapidly, and if small files are accessed
only once, the cost of mmap/munmap on host can not be ignored.


[1]
https://lore.kernel.org/lkml/20200428002142.404144-1-ira.weiny@intel.com/
[2] https://lwn.net/Articles/787973/

-- 
Thanks,
Jeffle
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
@ 2021-08-17 13:08         ` JeffleXu
  0 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-17 13:08 UTC (permalink / raw)
  To: Miklos Szeredi, Dr. David Alan Gilbert
  Cc: virtio-fs-list, Joseph Qi, linux-fsdevel, Vivek Goyal, virtualization



On 8/17/21 6:09 PM, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
>>
>> * Miklos Szeredi (miklos@szeredi.hu) wrote:
>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>
>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>
>>> Can you please explain the background of this change in detail?
>>>
>>> Why would an admin want to enable DAX for a particular virtiofs file
>>> and not for others?
>>
>> Where we're contending on virtiofs dax cache size it makes a lot of
>> sense; it's quite expensive for us to map something into the cache
>> (especially if we push something else out), so selectively DAXing files
>> that are expected to be hot could help reduce cache churn.
> 
> If this is a performance issue, it should be fixed in a way that
> doesn't require hand tuning like you suggest, I think.
> 
> I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
> can help understand the virtiofs case as well.
> 

Some hints as to why ext4/xfs support per-file DAX can be found in [1] and [2].

"Boaz Harrosh wondered why someone might want to turn DAX off for a
persistent memory device. Hellwig said that the performance "could
suck"; Williams noted that the page cache could be useful for some
applications as well. Jan Kara pointed out that reads from persistent
memory are close to DRAM speed, but that writes are not; the page cache
could be helpful for frequent writes. Applications need to change to
fully take advantage of DAX, Williams said; part of the promise of
adding a flag is that users can do DAX on smaller granularities than a
full filesystem."

In summary, the page cache is preferable in some cases, and thus a more
fine-grained way of controlling DAX is needed.


As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
may compete for the limited DAX window resource.

Besides, supporting DAX for small files can be expensive. Small files
can consume the DAX window rapidly, and if a small file is accessed
only once, the cost of mmap/munmap on the host cannot be ignored.


[1]
https://lore.kernel.org/lkml/20200428002142.404144-1-ira.weiny@intel.com/
[2] https://lwn.net/Articles/787973/

-- 
Thanks,
Jeffle


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 12:39     ` Vivek Goyal
  (?)
@ 2021-08-17 13:22       ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-17 13:22 UTC (permalink / raw)
  To: Vivek Goyal, Miklos Szeredi
  Cc: Stefan Hajnoczi, linux-fsdevel, virtualization, virtio-fs-list,
	Joseph Qi, Liu Bo



On 8/17/21 8:39 PM, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>
>>> This patchset adds support of per-file DAX for virtiofs, which is
>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>
>> Can you please explain the background of this change in detail?
>>
>> Why would an admin want to enable DAX for a particular virtiofs file
>> and not for others?
> 
> Initially I thought that they needed it because they are downloading
> files on the fly from server. So they don't want to enable dax on the file
> till file is completely downloaded. 

Right, it's our initial requirement.


> But later I realized that they should
> be able to block in FUSE_SETUPMAPPING call and make sure associated
> file section has been downloaded before returning and solve the problem.
> So that can't be the primary reason.

Say we want to access 4KB of a file inside the guest. If the access goes
through the regular FUSE request path, the fuse daemon only needs to
download those 4KB from the remote server. But if it goes through DAX, the
fuse daemon needs to download the whole DAX window (e.g., 2MB) from the
remote server, i.e. read amplification. Maybe we could decrease the DAX
window size, but that's a trade-off.
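
To put rough numbers on the amplification, here is a minimal userspace
sketch. It is not part of the patchset; the helper names are made up for
illustration, and the only figures taken from the discussion are the 4KB
access and the 2MB DAX mapping unit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes the daemon has to fetch to satisfy
 * one access of `len` bytes at offset `off`, when data can only be fetched
 * in whole units of `unit` bytes.
 */
static uint64_t bytes_fetched(uint64_t off, uint64_t len, uint64_t unit)
{
	assert(unit > 0 && len > 0);

	uint64_t first = off / unit;             /* first unit touched */
	uint64_t last = (off + len - 1) / unit;  /* last unit touched */

	return (last - first + 1) * unit;
}

/* Amplification factor: bytes fetched per byte requested (rounded down). */
static uint64_t amplification(uint64_t off, uint64_t len, uint64_t unit)
{
	return bytes_fetched(off, len, unit) / len;
}
```

With a 4KB fetch unit, bytes_fetched(0, 4096, 4096) is 4096, so there is no
amplification. With a 2MB unit, the same access fetches 2097152 bytes, a
factor of 512.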

> 
> Other reason mentioned I think was that only certain files benefit
> from DAX. But not much details are there after that. It will be nice
> to hear a more concrete use case and more details about this usage.
> 

Apart from our internal requirement, finer-grained control over DAX
should be more general and more flexible. We would be glad to hear more
discussion from the community.


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 6/8] fuse: mark inode DONT_CACHE when per-file DAX indication changes
  2021-08-17 10:26     ` Dr. David Alan Gilbert
  (?)
@ 2021-08-17 13:23       ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-17 13:23 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: vgoyal, stefanha, miklos, linux-fsdevel, virtio-fs, joseph.qi,
	virtualization



On 8/17/21 6:26 PM, Dr. David Alan Gilbert wrote:
> * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
>> When the per-file DAX indication changes while the file is still
>> *opened*, it is quite complicated and maybe fragile to dynamically
>> change the DAX state.
>>
>> Hence mark the inode and corresponding dentries as DONE_CACHE once the
> 
>                                                      ^^^^^^^^^^
> typo as DONT ?
> 

Thanks. I will fix it.

> 
>> per-file DAX indication changes, so that the inode instance will be
>> evicted and freed as soon as possible once the file is closed and the
>> last reference to the inode is put. And then when the file gets reopened
>> next time, the inode will reflect the new DAX state.
>>
>> In summary, when the per-file DAX indication changes for an *opened*
>> file, the state of the file won't be updated until this file is closed
>> and reopened later.
>>
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>> ---
>>  fs/fuse/dax.c    | 9 +++++++++
>>  fs/fuse/fuse_i.h | 1 +
>>  fs/fuse/inode.c  | 3 +++
>>  3 files changed, 13 insertions(+)
>>
>> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
>> index 30833f8d37dd..f7ede0be4e00 100644
>> --- a/fs/fuse/dax.c
>> +++ b/fs/fuse/dax.c
>> @@ -1364,6 +1364,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
>>  	inode->i_data.a_ops = &fuse_dax_file_aops;
>>  }
>>  
>> +void fuse_dax_dontcache(struct inode *inode, bool newdax)
>> +{
>> +	struct fuse_conn *fc = get_fuse_conn(inode);
>> +
>> +	if (fc->dax_mode == FUSE_DAX_INODE &&
>> +	    fc->perfile_dax && (!!IS_DAX(inode) != newdax))
>> +		d_mark_dontcache(inode);
>> +}
>> +
>>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
>>  {
>>  	if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>> index 7b7b4c208af2..56fe1c4d2136 100644
>> --- a/fs/fuse/fuse_i.h
>> +++ b/fs/fuse/fuse_i.h
>> @@ -1260,6 +1260,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
>>  bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
>>  void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
>>  void fuse_dax_inode_cleanup(struct inode *inode);
>> +void fuse_dax_dontcache(struct inode *inode, bool newdax);
>>  bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>>  void fuse_dax_cancel_work(struct fuse_conn *fc);
>>  
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index 8080f78befed..8c9774c6a210 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -269,6 +269,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
>>  		if (inval)
>>  			invalidate_inode_pages2(inode->i_mapping);
>>  	}
>> +
>> +	if (IS_ENABLED(CONFIG_FUSE_DAX))
>> +		fuse_dax_dontcache(inode, attr->flags & FUSE_ATTR_DAX);
>>  }
>>  
>>  static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
>> -- 
>> 2.27.0
>>
>> _______________________________________________
>> Virtio-fs mailing list
>> Virtio-fs@redhat.com
>> https://listman.redhat.com/mailman/listinfo/virtio-fs
>>
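
For reference, the v4 changelog replaced "IS_DAX(inode) != newdax" with
"!!IS_DAX(inode) != newdax" in this patch. The reason can be shown with a
small userspace sketch. The DEMO_* names below are stand-ins invented for
this example, assuming (as in the kernel) that IS_DAX() is a bit test
against inode->i_flags and therefore yields either 0 or the flag's bit
value rather than 0 or 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's S_DAX / IS_DAX(); the value is an assumption. */
#define DEMO_S_DAX       (1u << 13)
#define DEMO_IS_DAX(fl)  ((fl) & DEMO_S_DAX)

/* Buggy form: compares a raw bit value (0 or 8192) with a bool (0 or 1). */
static bool dax_changed_buggy(unsigned int i_flags, bool newdax)
{
	return DEMO_IS_DAX(i_flags) != newdax;
}

/* Fixed form: !! collapses the bit test to 0 or 1 before comparing. */
static bool dax_changed_fixed(unsigned int i_flags, bool newdax)
{
	return !!DEMO_IS_DAX(i_flags) != newdax;
}

/* DAX already enabled and the server still reports DAX: nothing changed. */
static void demo_self_check(void)
{
	assert(dax_changed_buggy(DEMO_S_DAX, true));   /* spurious "changed" */
	assert(!dax_changed_fixed(DEMO_S_DAX, true));  /* correctly "unchanged" */
}
```

Without the normalization, an inode that already has DAX enabled would be
reported as changed on every attribute update, so it would be marked
DONT_CACHE even though nothing changed.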

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 13:22       ` JeffleXu
@ 2021-08-17 14:08         ` Miklos Szeredi
  -1 siblings, 0 replies; 151+ messages in thread
From: Miklos Szeredi @ 2021-08-17 14:08 UTC (permalink / raw)
  To: JeffleXu
  Cc: Vivek Goyal, Stefan Hajnoczi, linux-fsdevel, virtualization,
	virtio-fs-list, Joseph Qi, Liu Bo

On Tue, 17 Aug 2021 at 15:22, JeffleXu <jefflexu@linux.alibaba.com> wrote:
>
>
>
> On 8/17/21 8:39 PM, Vivek Goyal wrote:
> > On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
> >> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> >>>
> >>> This patchset adds support of per-file DAX for virtiofs, which is
> >>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >>
> >> Can you please explain the background of this change in detail?
> >>
> >> Why would an admin want to enable DAX for a particular virtiofs file
> >> and not for others?
> >
> > Initially I thought that they needed it because they are downloading
> > files on the fly from server. So they don't want to enable dax on the file
> > till file is completely downloaded.
>
> Right, it's our initial requirement.
>
>
> > But later I realized that they should
> > be able to block in FUSE_SETUPMAPPING call and make sure associated
> > file section has been downloaded before returning and solve the problem.
> > So that can't be the primary reason.
>
> Saying we want to access 4KB of one file inside guest, if it goes
> through FUSE request routine, then the fuse daemon only need to download
> this 4KB from remote server. But if it goes through DAX, then the fuse
> daemon need to download the whole DAX window (e.g., 2MB) from remote
> server, so called amplification. Maybe we could decrease the DAX window
> size, but it's a trade off.

That could be achieved with a plain fuse filesystem on the host (which
will get 4k READ requests for accesses to the mapped area inside the
guest). Since this can be done selectively for files which are not yet
downloaded, the extra layer wouldn't be a performance problem.

Is there a reason why that wouldn't work?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 13:08         ` JeffleXu
@ 2021-08-17 14:11           ` Miklos Szeredi
  -1 siblings, 0 replies; 151+ messages in thread
From: Miklos Szeredi @ 2021-08-17 14:11 UTC (permalink / raw)
  To: JeffleXu
  Cc: Dr. David Alan Gilbert, virtualization, virtio-fs-list,
	Joseph Qi, linux-fsdevel, Vivek Goyal

On Tue, 17 Aug 2021 at 15:08, JeffleXu <jefflexu@linux.alibaba.com> wrote:
>
>
>
> On 8/17/21 6:09 PM, Miklos Szeredi wrote:
> > On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
> > <dgilbert@redhat.com> wrote:
> >>
> >> * Miklos Szeredi (miklos@szeredi.hu) wrote:
> >>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> >>>>
> >>>> This patchset adds support of per-file DAX for virtiofs, which is
> >>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >>>
> >>> Can you please explain the background of this change in detail?
> >>>
> >>> Why would an admin want to enable DAX for a particular virtiofs file
> >>> and not for others?
> >>
> >> Where we're contending on virtiofs dax cache size it makes a lot of
> >> sense; it's quite expensive for us to map something into the cache
> >> (especially if we push something else out), so selectively DAXing files
> >> that are expected to be hot could help reduce cache churn.
> >
> > If this is a performance issue, it should be fixed in a way that
> > doesn't require hand tuning like you suggest, I think.
> >
> > I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
> > can help understand the virtiofs case as well.
> >
>
> Some hints why ext4/xfs support per-file DAX can be found [1] and [2].
>
> "Boaz Harrosh wondered why someone might want to turn DAX off for a
> persistent memory device. Hellwig said that the performance "could
> suck"; Williams noted that the page cache could be useful for some
> applications as well. Jan Kara pointed out that reads from persistent
> memory are close to DRAM speed, but that writes are not; the page cache
> could be helpful for frequent writes. Applications need to change to
> fully take advantage of DAX, Williams said; part of the promise of
> adding a flag is that users can do DAX on smaller granularities than a
> full filesystem."
>
> In summary, page cache is preferable in some cases, and thus more fine
> grained way of DAX control is needed.

Hmm, okay, very frequent overwrites could be problematic for directly
mapped nvram.

>
> As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
> may compete for limited DAX window resource.
>
> Besides, supporting DAX for small files can be expensive. Small files
> can consume DAX window resource rapidly, and if small files are accessed
> only once, the cost of mmap/munmap on host can not be ignored.

That's a good point. Maybe we should disable DAX for file sizes much
smaller than the chunk size?
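
A size-based policy of that kind could look roughly like the sketch below.
This is only an illustration, not code from the patchset: the function name
and the "1/8 of a chunk" cut-off are arbitrary assumptions, since the
thread does not define "much smaller". The 2MB chunk size is the DAX
mapping granularity mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_DAX_CHUNK_SIZE  (2u * 1024 * 1024)  /* 2MB mapping granularity */
#define DEMO_MIN_FRACTION    8                   /* assumed cut-off: 1/8 chunk */

/*
 * Hypothetical policy: allow DAX only for files that are at least
 * chunk_size / min_fraction bytes long.
 */
static bool demo_size_allows_dax(uint64_t file_size, uint64_t chunk_size,
				 uint64_t min_fraction)
{
	assert(chunk_size > 0 && min_fraction > 0);

	return file_size >= chunk_size / min_fraction;
}
```

With these assumed values the cut-off is 256KB, so a 4KB file would stay on
the regular FUSE path while a 2MB file would be eligible for DAX.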

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 13:08         ` JeffleXu
  (?)
@ 2021-08-17 14:54           ` Vivek Goyal
  -1 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-08-17 14:54 UTC (permalink / raw)
  To: JeffleXu
  Cc: Miklos Szeredi, Dr. David Alan Gilbert, virtualization,
	virtio-fs-list, Joseph Qi, linux-fsdevel

On Tue, Aug 17, 2021 at 09:08:35PM +0800, JeffleXu wrote:
> 
> 
> On 8/17/21 6:09 PM, Miklos Szeredi wrote:
> > On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
> > <dgilbert@redhat.com> wrote:
> >>
> >> * Miklos Szeredi (miklos@szeredi.hu) wrote:
> >>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> >>>>
> >>>> This patchset adds support of per-file DAX for virtiofs, which is
> >>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >>>
> >>> Can you please explain the background of this change in detail?
> >>>
> >>> Why would an admin want to enable DAX for a particular virtiofs file
> >>> and not for others?
> >>
> >> Where we're contending on virtiofs dax cache size it makes a lot of
> >> sense; it's quite expensive for us to map something into the cache
> >> (especially if we push something else out), so selectively DAXing files
> >> that are expected to be hot could help reduce cache churn.
> > 
> > If this is a performance issue, it should be fixed in a way that
> > doesn't require hand tuning like you suggest, I think.
> > 
> > I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
> > can help understand the virtiofs case as well.
> > 
> 
> Some hints why ext4/xfs support per-file DAX can be found [1] and [2].
> 
> "Boaz Harrosh wondered why someone might want to turn DAX off for a
> persistent memory device. Hellwig said that the performance "could
> suck"; Williams noted that the page cache could be useful for some
> applications as well. Jan Kara pointed out that reads from persistent
> memory are close to DRAM speed, but that writes are not; the page cache
> could be helpful for frequent writes. Applications need to change to
> fully take advantage of DAX, Williams said; part of the promise of
> adding a flag is that users can do DAX on smaller granularities than a
> full filesystem."
> 
> In summary, page cache is preferable in some cases, and thus more fine
> grained way of DAX control is needed.

In the case of virtiofs, we are using the page cache on the host, so this
is probably not a factor for us. Writes will go into the host's page cache.

> 
> 
> As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
> may compete for limited DAX window resource.
> 
> Besides, supporting DAX for small files can be expensive. Small files
> can consume DAX window resource rapidly, and if small files are accessed
> only once, the cost of mmap/munmap on host can not be ignored.

W.r.t. access pattern, the same applies to large files as well. If a section
of a large file is accessed only once, it will consume DAX window space too
and will have to be reclaimed.

DAX in virtiofs provides a speed gain only if we map a file once and access
it multiple times. If that pattern does not hold, then DAX does not seem to
provide speed gains and in fact might be slower than non-DAX.

So if there is a pattern where we know some files are accessed repeatedly
while others are not, then enabling/disabling DAX selectively will make
sense. The question is how many workloads really know that, and how you will
make that decision. Do you have any data to back that up?

W.r.t. small files, is that a real concern? If the file is being accessed
multiple times, then we will still see the speed gain. The only downside
is that there is a little wastage of resources because our minimum DAX
mapping granularity is 2MB. I am wondering whether we can handle that by
supporting other DAX mapping granularities as well, say 256K, and letting
users choose.

Thanks
Vivek
> 
> 
> [1]
> https://lore.kernel.org/lkml/20200428002142.404144-1-ira.weiny@intel.com/
> [2] https://lwn.net/Articles/787973/
> 
> -- 
> Thanks,
> Jeffle
> 


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 13:22       ` JeffleXu
  (?)
@ 2021-08-17 14:57         ` Vivek Goyal
  -1 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-08-17 14:57 UTC (permalink / raw)
  To: JeffleXu
  Cc: Miklos Szeredi, Stefan Hajnoczi, linux-fsdevel, virtualization,
	virtio-fs-list, Joseph Qi, Liu Bo

On Tue, Aug 17, 2021 at 09:22:53PM +0800, JeffleXu wrote:
> 
> 
> On 8/17/21 8:39 PM, Vivek Goyal wrote:
> > On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
> >> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> >>>
> >>> This patchset adds support of per-file DAX for virtiofs, which is
> >>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >>
> >> Can you please explain the background of this change in detail?
> >>
> >> Why would an admin want to enable DAX for a particular virtiofs file
> >> and not for others?
> > 
> > Initially I thought that they needed it because they are downloading
> > files on the fly from server. So they don't want to enable dax on the file
> > till file is completely downloaded. 
> 
> Right, it's our initial requirement.
> 
> 
> > But later I realized that they should
> > be able to block in FUSE_SETUPMAPPING call and make sure associated
> > file section has been downloaded before returning and solve the problem.
> > So that can't be the primary reason.
> 
> Saying we want to access 4KB of one file inside guest, if it goes
> through FUSE request routine, then the fuse daemon only need to download
> this 4KB from remote server. But if it goes through DAX, then the fuse
> daemon need to download the whole DAX window (e.g., 2MB) from remote
> server, so called amplification. Maybe we could decrease the DAX window
> size, but it's a trade off.

Downloading a 2MB chunk should not be a big issue (IMHO). And if this
turns out to be a real concern, we could experiment with a smaller
mapping granularity.
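
To put numbers on the amplification, here is a rough sketch that assumes
the daemon has to populate whole mapping chunks covering the accessed
range. The helper name is hypothetical and the model ignores any caching
on the daemon side:

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes the daemon would have to fetch if every
 * access is rounded out to whole mapping chunks of size `gran`.
 * Assumes len > 0 and gran > 0.
 */
static uint64_t bytes_fetched(uint64_t off, uint64_t len, uint64_t gran)
{
    uint64_t start = off - off % gran;                      /* round down */
    uint64_t end = ((off + len + gran - 1) / gran) * gran;  /* round up */

    return end - start;
}
```

Under these assumptions a 4K read at offset 0 costs 2097152 bytes with a
2MB granularity (512x amplification) and 262144 bytes with 256K (64x).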

> 
> > 
> > Other reason mentioned I think was that only certain files benefit
> > from DAX. But not much details are there after that. It will be nice
> > to hear a more concrete use case and more details about this usage.
> > 
> 
> Apart from our internal requirement, more fine grained control for DAX
> shall be general and more flexible. Glad to hear more discussion from
> community.

Sure it will be more general and flexible. But there needs to be 1-2
good concrete use cases to justify additional complexity. And I don't
think that so far a good use case has come forward.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 14:11           ` Miklos Szeredi
  (?)
@ 2021-08-17 15:19             ` Vivek Goyal
  -1 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-08-17 15:19 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: JeffleXu, Dr. David Alan Gilbert, virtualization, virtio-fs-list,
	Joseph Qi, linux-fsdevel

On Tue, Aug 17, 2021 at 04:11:14PM +0200, Miklos Szeredi wrote:

[..]
> > As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
> > may compete for limited DAX window resource.
> >
> > Besides, supporting DAX for small files can be expensive. Small files
> > can consume DAX window resource rapidly, and if small files are accessed
> > only once, the cost of mmap/munmap on host can not be ignored.
> 
> That's a good point.   Maybe we should disable DAX for file sizes much
> smaller than the chunk size?

This indeed seems like a valid concern. 2MB chunk size will consume
512 struct page entries. If an entry is 64 bytes in size, then that's
32K RAM used to access 4K bytes of file. Does not sound like good usage
of resources.
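
A minimal sketch of that arithmetic, assuming 4K pages (which is what
gives 512 entries per 2MB chunk) and the 64-byte entry size guessed
above. The helper name is made up for illustration and is not part of
any patch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: bytes of struct page metadata needed to back one
 * DAX mapping chunk. struct_page_size is an assumption (64 here), not a
 * value taken from any particular kernel config.
 */
static size_t dax_chunk_metadata_bytes(size_t chunk_size, size_t page_size,
                                       size_t struct_page_size)
{
    /* A chunk is expected to be a whole number of pages. */
    assert(page_size > 0 && chunk_size % page_size == 0);

    return (chunk_size / page_size) * struct_page_size;
}
```

dax_chunk_metadata_bytes(2 << 20, 4096, 64) gives 32768, i.e. 32K of
metadata even when only 4K of the file is ever touched.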

If we end up selectively disabling dax based on file size, two things
come to my mind.

- It would be good if users can opt in to this behavior. There
  might be a class of users who always want to enable dax on all
  files.

- Secondly, we will have to figure out how to do it safely in the
  case of a shared filesystem where file size can change suddenly.
  We will need to make sure the change from dax to no-dax and vice versa
  is safe w.r.t. page cache and other paths.
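
Purely as an illustration of the opt-in idea, a size-based policy could
look roughly like the sketch below. The enum, the function name, and the
threshold parameter are all hypothetical; nothing like this exists in
the posted patches, and it does not address the second point about file
size changing underneath us.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy knob: users opt in to skipping dax on small files. */
enum dax_size_policy {
    DAX_ALWAYS,      /* default: dax on every file, regardless of size */
    DAX_SKIP_SMALL,  /* opt-in: no dax below the given threshold */
};

/* Decide whether a file of file_size bytes should use dax. */
static bool want_dax(uint64_t file_size, uint64_t threshold,
                     enum dax_size_policy policy)
{
    if (policy == DAX_ALWAYS) {
        return true;
    }
    return file_size >= threshold;
}
```

Keeping DAX_ALWAYS as the default preserves today's behavior for users
who want dax on all files; where exactly the threshold should sit
relative to the 2MB chunk size is left open here.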

Thanks
Vivek


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT
  2021-08-17  2:23     ` Jeffle Xu
  (?)
@ 2021-08-17 17:15       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 151+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-17 17:15 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: vgoyal, stefanha, miklos, linux-fsdevel, virtio-fs, joseph.qi,
	virtualization

* Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> In FUSE_INIT negotiating phase, server/client should advertise if it
> supports per-file DAX.
> 
> Once advertising support for per-file DAX feature, virtiofsd should
> support storing FS_DAX_FL flag persistently passed by
> FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
> FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
> 
> Currently only ext4/xfs since linux kernel v5.8 support storing
> FS_DAX_FL flag persistently, and thus advertise support for per-file
> DAX feature only when the backend fs type is ext4 and xfs.

I'm a little worried about the meaning of the flags we're storing and
the fact we're storing them in the normal host DAX flags.

Doesn't this mean that we're using a single host flag to mean:
  a) It can be mapped as DAX on the host if it was a real DAX device
  b) We can map it as DAX inside the guest with virtiofs?

What happens when we're using user namespaces for the guest?

Dave


> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  tools/virtiofsd/fuse_common.h    |  5 +++++
>  tools/virtiofsd/fuse_lowlevel.c  |  6 ++++++
>  tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++++++++++
>  3 files changed, 40 insertions(+)
> 
> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> index 8a75729be9..ee6fc64c23 100644
> --- a/tools/virtiofsd/fuse_common.h
> +++ b/tools/virtiofsd/fuse_common.h
> @@ -372,6 +372,11 @@ struct fuse_file_info {
>   */
>  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
>  
> +/**
> + * Indicates support for per-file DAX.
> + */
> +#define FUSE_CAP_PERFILE_DAX (1 << 29)
> +
>  /**
>   * Ioctl flags
>   *
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 50fc5c8d5a..04a4f17423 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>      if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
>          se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
>      }
> +    if (arg->flags & FUSE_PERFILE_DAX) {
> +        se->conn.capable |= FUSE_CAP_PERFILE_DAX;
> +    }
>  #ifdef HAVE_SPLICE
>  #ifdef HAVE_VMSPLICE
>      se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
> @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>      if (se->conn.want & FUSE_CAP_POSIX_ACL) {
>          outarg.flags |= FUSE_POSIX_ACL;
>      }
> +    if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
> +        outarg.flags |= FUSE_PERFILE_DAX;
> +    }
>      outarg.max_readahead = se->conn.max_readahead;
>      outarg.max_write = se->conn.max_write;
>      if (se->conn.max_background >= (1 << 16)) {
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index e170b17adb..5b6228210f 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -53,8 +53,10 @@
>  #include <sys/syscall.h>
>  #include <sys/wait.h>
>  #include <sys/xattr.h>
> +#include <sys/vfs.h>
>  #include <syslog.h>
>  #include <linux/fs.h>
> +#include <linux/magic.h>
>  
>  #include "qemu/cutils.h"
>  #include "passthrough_helpers.h"
> @@ -136,6 +138,13 @@ enum {
>      SANDBOX_CHROOT,
>  };
>  
> +/* capability of storing DAX flag persistently */
> +enum {
> +    DAX_CAP_NONE,  /* not supported */
> +    DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
> +    DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
> +};
> +
>  typedef struct xattr_map_entry {
>      char *key;
>      char *prepend;
> @@ -161,6 +170,7 @@ struct lo_data {
>      int readdirplus_clear;
>      int allow_direct_io;
>      int announce_submounts;
> +    int perfile_dax_cap; /* capability of backend fs */
>      bool use_statx;
>      struct lo_inode root;
>      GHashTable *inodes; /* protected by lo->mutex */
> @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
>          conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
>          lo->killpriv_v2 = 0;
>      }
> +
> +    if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
> +        conn->want |= FUSE_CAP_PERFILE_DAX;
> +    }
>  }
>  
>  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
>      int fd, res;
>      struct stat stat;
>      uint64_t mnt_id;
> +    struct statfs statfs;
>  
>      fd = open("/", O_PATH);
>      if (fd == -1) {
> @@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
>          root->posix_locks = g_hash_table_new_full(
>              g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
>      }
> +
> +    /*
> +     * Currently only ext4/xfs since linux kernel v5.8 support storing
> +     * FS_DAX_FL flag persistently. Ext4 accesses this flag through
> +     * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
> +     * FS_IOC_FSG[S]ETXATTR ioctl.
> +     */
> +    res = fstatfs(fd, &statfs);
> +    if (!res) {
> +	if (statfs.f_type == EXT4_SUPER_MAGIC)
> +	    lo->perfile_dax_cap = DAX_CAP_FLAGS;
> +	else if (statfs.f_type == XFS_SUPER_MAGIC)
> +	    lo->perfile_dax_cap = DAX_CAP_XATTR;
> +    }
>  }
>  
>  static guint lo_key_hash(gconstpointer key)
> -- 
> 2.27.0
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT
@ 2021-08-17 17:15       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 151+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-17 17:15 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: miklos, virtualization, virtio-fs, joseph.qi, stefanha,
	linux-fsdevel, vgoyal

* Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> In FUSE_INIT negotiating phase, server/client should advertise if it
> supports per-file DAX.
> 
> Once advertising support for per-file DAX feature, virtiofsd should
> support storing FS_DAX_FL flag persistently passed by
> FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
> FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
> 
> Currently only ext4/xfs since linux kernel v5.8 support storing
> FS_DAX_FL flag persistently, and thus advertise support for per-file
> DAX feature only when the backend fs type is ext4 and xfs.

I'm a little worried about the meaning of the flags we're storing and
the fact we're storing them in the normal host DAX flags.

Doesn't this mean that we're using a single host flag to mean:
  a) It can be mapped as DAX on the host if it was a real DAX device
  b) We can map it as DAX inside the guest with virtiofs?

What happens when we're using user namespaces for the guest?

Dave


> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  tools/virtiofsd/fuse_common.h    |  5 +++++
>  tools/virtiofsd/fuse_lowlevel.c  |  6 ++++++
>  tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++++++++++
>  3 files changed, 40 insertions(+)
> 
> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> index 8a75729be9..ee6fc64c23 100644
> --- a/tools/virtiofsd/fuse_common.h
> +++ b/tools/virtiofsd/fuse_common.h
> @@ -372,6 +372,11 @@ struct fuse_file_info {
>   */
>  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
>  
> +/**
> + * Indicates support for per-file DAX.
> + */
> +#define FUSE_CAP_PERFILE_DAX (1 << 29)
> +
>  /**
>   * Ioctl flags
>   *
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index 50fc5c8d5a..04a4f17423 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>      if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
>          se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
>      }
> +    if (arg->flags & FUSE_PERFILE_DAX) {
> +        se->conn.capable |= FUSE_CAP_PERFILE_DAX;
> +    }
>  #ifdef HAVE_SPLICE
>  #ifdef HAVE_VMSPLICE
>      se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
> @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>      if (se->conn.want & FUSE_CAP_POSIX_ACL) {
>          outarg.flags |= FUSE_POSIX_ACL;
>      }
> +    if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
> +        outarg.flags |= FUSE_PERFILE_DAX;
> +    }
>      outarg.max_readahead = se->conn.max_readahead;
>      outarg.max_write = se->conn.max_write;
>      if (se->conn.max_background >= (1 << 16)) {
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index e170b17adb..5b6228210f 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -53,8 +53,10 @@
>  #include <sys/syscall.h>
>  #include <sys/wait.h>
>  #include <sys/xattr.h>
> +#include <sys/vfs.h>
>  #include <syslog.h>
>  #include <linux/fs.h>
> +#include <linux/magic.h>
>  
>  #include "qemu/cutils.h"
>  #include "passthrough_helpers.h"
> @@ -136,6 +138,13 @@ enum {
>      SANDBOX_CHROOT,
>  };
>  
> +/* capability of storing DAX flag persistently */
> +enum {
> +    DAX_CAP_NONE,  /* not supported */
> +    DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
> +    DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
> +};
> +
>  typedef struct xattr_map_entry {
>      char *key;
>      char *prepend;
> @@ -161,6 +170,7 @@ struct lo_data {
>      int readdirplus_clear;
>      int allow_direct_io;
>      int announce_submounts;
> +    int perfile_dax_cap; /* capability of backend fs */
>      bool use_statx;
>      struct lo_inode root;
>      GHashTable *inodes; /* protected by lo->mutex */
> @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
>          conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
>          lo->killpriv_v2 = 0;
>      }
> +
> +    if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
> +        conn->want |= FUSE_CAP_PERFILE_DAX;
> +    }
>  }
>  
>  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
>      int fd, res;
>      struct stat stat;
>      uint64_t mnt_id;
> +    struct statfs statfs;
>  
>      fd = open("/", O_PATH);
>      if (fd == -1) {
> @@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
>          root->posix_locks = g_hash_table_new_full(
>              g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
>      }
> +
> +    /*
> +     * Currently only ext4/xfs since linux kernel v5.8 support storing
> +     * FS_DAX_FL flag persistently. Ext4 accesses this flag through
> +     * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
> +     * FS_IOC_FSG[S]ETXATTR ioctl.
> +     */
> +    res = fstatfs(fd, &statfs);
> +    if (!res) {
> +        if (statfs.f_type == EXT4_SUPER_MAGIC)
> +            lo->perfile_dax_cap = DAX_CAP_FLAGS;
> +        else if (statfs.f_type == XFS_SUPER_MAGIC)
> +            lo->perfile_dax_cap = DAX_CAP_XATTR;
> +    }
>  }
>  
>  static guint lo_key_hash(gconstpointer key)
> -- 
> 2.27.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 4/4] virtiofsd: support per-file DAX in FUSE_LOOKUP
  2021-08-17  2:23     ` Jeffle Xu
  (?)
@ 2021-08-17 19:00       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 151+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-17 19:00 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: vgoyal, stefanha, miklos, linux-fsdevel, virtio-fs, joseph.qi,
	virtualization

* Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> For passthrough, when the corresponding virtiofs in guest is mounted
> with '-o dax=inode', advertise that the file is capable of per-file
> DAX if the inode in the backend fs is marked with FS_DAX_FL flag.
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index 5b6228210f..4cbd904248 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -171,6 +171,7 @@ struct lo_data {
>      int allow_direct_io;
>      int announce_submounts;
>      int perfile_dax_cap; /* capability of backend fs */
> +    bool perfile_dax; /* enable per-file DAX or not */
>      bool use_statx;
>      struct lo_inode root;
>      GHashTable *inodes; /* protected by lo->mutex */
> @@ -716,6 +717,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
>  
>      if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
>          conn->want |= FUSE_CAP_PERFILE_DAX;
> +        lo->perfile_dax = 1;
> +    }
> +    else {
> +        lo->perfile_dax = 0;
>      }
>  }
>  
> @@ -983,6 +988,41 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>      return 0;
>  }
>  
> +/*
> + * If the file is marked with FS_DAX_FL or FS_XFLAG_DAX, then DAX should be
> + * enabled for this file.
> + */
> +static bool lo_should_enable_dax(struct lo_data *lo, struct lo_inode *dir,
> +                                 const char *name)
> +{
> +    int res, fd;
> +    bool ret = false;
> +    unsigned int attr;
> +    struct fsxattr xattr;
> +
> +    if (!lo->perfile_dax)
> +        return false;
> +
> +    /* Open file without O_PATH, so that ioctl can be called. */
> +    fd = openat(dir->fd, name, O_NOFOLLOW);
> +    if (fd == -1)
> +        return false;

Doesn't that defeat the whole benefit of using O_PATH - i.e. that we
might stumble into a /dev node or something else we're not allowed to
open?

> +    if (lo->perfile_dax_cap == DAX_CAP_FLAGS) {
> +        res = ioctl(fd, FS_IOC_GETFLAGS, &attr);
> +        if (!res && (attr & FS_DAX_FL))
> +            ret = true;
> +    }
> +    else if (lo->perfile_dax_cap == DAX_CAP_XATTR) {
> +        res = ioctl(fd, FS_IOC_FSGETXATTR, &xattr);
> +        if (!res && (xattr.fsx_xflags & FS_XFLAG_DAX))
> +            ret = true;
> +    }

This all looks pretty expensive for each lookup.

Dave


> +    close(fd);
> +    return ret;
> +}
> +
>  /*
>   * Increments nlookup on the inode on success. unref_inode_lolocked() must be
>   * called eventually to decrement nlookup again. If inodep is non-NULL, the
> @@ -1038,6 +1078,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
>          e->attr_flags |= FUSE_ATTR_SUBMOUNT;
>      }
>  
> +    if (lo_should_enable_dax(lo, dir, name))
> +        e->attr_flags |= FUSE_ATTR_DAX;
> +
>      inode = lo_find(lo, &e->attr, mnt_id);
>      if (inode) {
>          close(newfd);
> -- 
> 2.27.0
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 14:08         ` [Virtio-fs] " Miklos Szeredi
  (?)
@ 2021-08-18  3:39           ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-18  3:39 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Stefan Hajnoczi, linux-fsdevel, virtualization,
	virtio-fs-list, Joseph Qi, Liu Bo



On 8/17/21 10:08 PM, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 15:22, JeffleXu <jefflexu@linux.alibaba.com> wrote:
>>
>>
>>
>> On 8/17/21 8:39 PM, Vivek Goyal wrote:
>>> On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
>>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>>
>>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>>
>>>> Can you please explain the background of this change in detail?
>>>>
>>>> Why would an admin want to enable DAX for a particular virtiofs file
>>>> and not for others?
>>>
>>> Initially I thought that they needed it because they are downloading
>>> files on the fly from server. So they don't want to enable dax on the file
>>> till file is completely downloaded.
>>
>> Right, it's our initial requirement.
>>
>>
>>> But later I realized that they should
>>> be able to block in FUSE_SETUPMAPPING call and make sure associated
>>> file section has been downloaded before returning and solve the problem.
>>> So that can't be the primary reason.
>>
>> Say we want to access 4KB of one file inside the guest. If it goes
>> through the FUSE request routine, the fuse daemon only needs to download
>> this 4KB from the remote server. But if it goes through DAX, the fuse
>> daemon needs to download the whole DAX window (e.g., 2MB) from the remote
>> server, so-called amplification. Maybe we could decrease the DAX window
>> size, but it's a trade-off.
> 
> That could be achieved with a plain fuse filesystem on the host (which
> will get 4k READ requests for accesses to mapped area inside guest).
> Since this can be done selectively for files which are not yet
> downloaded, the extra layer wouldn't be a performance problem.

I'm not sure I fully understand your idea. In this case, the host
daemon prepares only 4KB while the guest thinks that the whole DAX window
(e.g., 2MB) has been fully mapped. Then, when the guest actually accesses
the remaining part (2MB - 4KB), a page fault is triggered, and the host
daemon is then responsible for downloading the remaining part?

> 
> Is there a reason why that wouldn't work?
> 
> Thanks,
> Miklos
> 

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-18  3:39           ` JeffleXu
@ 2021-08-18  5:08             ` Miklos Szeredi
  -1 siblings, 0 replies; 151+ messages in thread
From: Miklos Szeredi @ 2021-08-18  5:08 UTC (permalink / raw)
  To: JeffleXu
  Cc: Vivek Goyal, Stefan Hajnoczi, linux-fsdevel, virtualization,
	virtio-fs-list, Joseph Qi, Liu Bo

On Wed, 18 Aug 2021 at 05:40, JeffleXu <jefflexu@linux.alibaba.com> wrote:

> I'm not sure I fully understand your idea. In this case, the host
> daemon prepares only 4KB while the guest thinks that the whole DAX window
> (e.g., 2MB) has been fully mapped. Then, when the guest actually accesses
> the remaining part (2MB - 4KB), a page fault is triggered, and the host
> daemon is then responsible for downloading the remaining part?

Yes.  Mapping an area just means setting up the page tables; it does
not result in actual data transfer.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 14:54           ` Vivek Goyal
  (?)
@ 2021-08-18  5:10             ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-18  5:10 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Dr. David Alan Gilbert, virtualization,
	virtio-fs-list, Joseph Qi, linux-fsdevel



On 8/17/21 10:54 PM, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 09:08:35PM +0800, JeffleXu wrote:
>>
>>
>> On 8/17/21 6:09 PM, Miklos Szeredi wrote:
>>> On Tue, 17 Aug 2021 at 11:32, Dr. David Alan Gilbert
>>> <dgilbert@redhat.com> wrote:
>>>>
>>>> * Miklos Szeredi (miklos@szeredi.hu) wrote:
>>>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>>>
>>>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>>>
>>>>> Can you please explain the background of this change in detail?
>>>>>
>>>>> Why would an admin want to enable DAX for a particular virtiofs file
>>>>> and not for others?
>>>>
>>>> Where we're contending on virtiofs dax cache size it makes a lot of
>>>> sense; it's quite expensive for us to map something into the cache
>>>> (especially if we push something else out), so selectively DAXing files
>>>> that are expected to be hot could help reduce cache churn.
>>>
>>> If this is a performance issue, it should be fixed in a way that
>>> doesn't require hand tuning like you suggest, I think.
>>>
>>> I'm not sure what the  ext4/xfs case for per-file DAX is.  Maybe that
>>> can help understand the virtiofs case as well.
>>>
>>
>> Some hints why ext4/xfs support per-file DAX can be found [1] and [2].
>>
>> "Boaz Harrosh wondered why someone might want to turn DAX off for a
>> persistent memory device. Hellwig said that the performance "could
>> suck"; Williams noted that the page cache could be useful for some
>> applications as well. Jan Kara pointed out that reads from persistent
>> memory are close to DRAM speed, but that writes are not; the page cache
>> could be helpful for frequent writes. Applications need to change to
>> fully take advantage of DAX, Williams said; part of the promise of
>> adding a flag is that users can do DAX on smaller granularities than a
>> full filesystem."
>>
>> In summary, the page cache is preferable in some cases, and thus a more
>> fine-grained way of controlling DAX is needed.
> 
> In the case of virtiofs, we are using the page cache on the host. So this
> probably is not a factor for us. Writes will go into the host's page cache.
> 
>>
>>
>> As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
>> may compete for the limited DAX window resource.
>>
>> Besides, supporting DAX for small files can be expensive. Small files
>> can consume the DAX window rapidly, and if small files are accessed
>> only once, the cost of mmap/munmap on the host cannot be ignored.
> 
> W.r.t. access pattern, the same applies to large files. If a section
> of a large file is accessed only once, it will consume dax window as well
> and will have to be reclaimed.
> 
> Dax in virtiofs provides a speed gain only if we map a file once and access
> it multiple times. If that pattern does not hold, then dax does
> not seem to provide speed gains and in fact might be slower than
> non-dax.
> 
> So if there is a pattern where we know some files are accessed repeatedly
> while others are not, then enabling/disabling dax selectively will make
> sense. The question is how many workloads really know that, and how will
> you make that decision? Do you have any data to back that up?

There's no precise performance data yet. Empirically, small files tend
to perform worse with dax, while frequently accessed files
(such as .so libraries) perform better with dax.

> 
> W.r.t. small files, is that a real concern? If the file is being accessed
> multiple times, then we will still see the speed gain. The only downside
> is that there is a little wastage of resources because our minimum dax
> mapping granularity is 2MB. I am wondering whether we can handle that by
> supporting other dax mapping granularities as well, say 256K, and letting
> users choose.
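
To put rough numbers on the granularity point above, here is a minimal
sketch in C. Only the 2MB and 256K granularities come from this thread;
the 1GB DAX window used in the note below is an assumption, and
dax_window_slots() and dax_slot_waste() are illustrative names, not
functions from the kernel or virtiofsd.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size mapping slots that fit in a DAX window. */
uint64_t dax_window_slots(uint64_t window_bytes, uint64_t granularity)
{
    assert(granularity > 0);
    return window_bytes / granularity;
}

/*
 * Bytes of the window left unused when a file of file_bytes is mapped
 * in whole slots of the given granularity (the last slot is only
 * partially filled).
 */
uint64_t dax_slot_waste(uint64_t file_bytes, uint64_t granularity)
{
    uint64_t slots;

    assert(granularity > 0);
    slots = (file_bytes + granularity - 1) / granularity; /* round up */
    return slots * granularity - file_bytes;
}
```

Under the 1GB assumption, 2MB granularity gives 512 slots and 256K gives
4096. A 4KB file leaves 2093056 bytes of a 2MB slot unused, versus
258048 bytes of a 256K slot.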


-- 
Thanks,
Jeffle


* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 14:57         ` Vivek Goyal
  (?)
@ 2021-08-18  5:20           ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-18  5:20 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Stefan Hajnoczi, linux-fsdevel, virtualization,
	virtio-fs-list, Joseph Qi, Liu Bo



On 8/17/21 10:57 PM, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 09:22:53PM +0800, JeffleXu wrote:
>>
>>
>> On 8/17/21 8:39 PM, Vivek Goyal wrote:
>>> On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
>>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>>
>>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>>
>>>> Can you please explain the background of this change in detail?
>>>>
>>>> Why would an admin want to enable DAX for a particular virtiofs file
>>>> and not for others?
>>>
>>> Initially I thought that they needed it because they are downloading
>>> files on the fly from the server. So they don't want to enable dax on the
>>> file until the file is completely downloaded.
>>
>> Right, it's our initial requirement.
>>
>>
>>> But later I realized that they should
>>> be able to block in the FUSE_SETUPMAPPING call, make sure the associated
>>> file section has been downloaded before returning, and so solve the problem.
>>> So that can't be the primary reason.
>>
>> Say we want to access 4KB of a file inside the guest. If it goes
>> through the FUSE request path, the fuse daemon only needs to download
>> this 4KB from the remote server. But if it goes through DAX, the fuse
>> daemon needs to download the whole DAX mapping (e.g., 2MB) from the remote
>> server, the so-called amplification. Maybe we could decrease the DAX
>> mapping size, but it's a trade-off.
> 
> Downloading a 2MB chunk should not be a big issue (IMHO).

Then the latency increases. Latency really matters in our use case.
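
A minimal sketch of the amplification arithmetic, assuming the worst
case described above, where the daemon has to fetch every mapping-sized
chunk that the access touches before the data can be used.
bytes_fetched() is a hypothetical helper, not part of FUSE or virtiofsd;
the 4KB and 2MB sizes are the ones used in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes the daemon would have to fetch for an access of len bytes at
 * offset, if data can only be fetched in whole chunks of granularity
 * bytes. Requires granularity > 0 and len > 0.
 */
uint64_t bytes_fetched(uint64_t offset, uint64_t len, uint64_t granularity)
{
    uint64_t first, last;

    assert(granularity > 0 && len > 0);
    first = offset / granularity;              /* first chunk touched */
    last = (offset + len - 1) / granularity;   /* last chunk touched */
    return (last - first + 1) * granularity;
}
```

For a 4KB read at offset 0 this gives 4096 bytes with 4KB chunks and
2097152 bytes with 2MB chunks, i.e. a factor of 512. A 4KB read that
straddles a 2MB boundary touches two chunks and doubles that.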


> And if this
> turns out to be a real concern, we could experiment with a smaller
> mapping granularity.
> 


-- 
Thanks,
Jeffle


* Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT
  2021-08-17 17:15       ` Dr. David Alan Gilbert
  (?)
@ 2021-08-18  5:28         ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-18  5:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: vgoyal, stefanha, miklos, linux-fsdevel, virtio-fs, joseph.qi,
	virtualization



On 8/18/21 1:15 AM, Dr. David Alan Gilbert wrote:
> * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
>> In the FUSE_INIT negotiation phase, server and client should each advertise
>> whether they support per-file DAX.
>>
>> Once it advertises support for the per-file DAX feature, virtiofsd should
>> persistently store the FS_DAX_FL flag passed by the
>> FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
>> FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
>>
>> Currently only ext4/xfs (since Linux kernel v5.8) support storing the
>> FS_DAX_FL flag persistently, so support for the per-file DAX feature is
>> advertised only when the backend fs type is ext4 or xfs.
> 
> I'm a little worried about the meaning of the flags we're storing and
> the fact we're storing them in the normal host DAX flags.
> 
> Doesn't this mean that we're using a single host flag to mean:
>   a) It can be mapped as DAX on the host if it was a real DAX device
>   b) We can map it as DAX inside the guest with virtiofs?

Yes, the side effect is that the host file is also dax enabled if the
backend fs is built upon a real nvdimm device.

The rationale here is that the fuse daemon must be able to *mark*
the file as dax capable *persistently*, so that it can later tell that
this file is capable of dax.

I'm not sure whether an xattr (extended attribute) would be a better
option for this.
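
For reference, a minimal sketch of how a daemon could read the
persistent marker back from the backend file, using the two ioctl styles
named in the patch (FS_IOC_GETFLAGS for ext4, FS_IOC_FSGETXATTR for
xfs). The DAX_CAP_* names are taken from the patch; dax_marked() and
query_dax_marker() are hypothetical helpers and not part of virtiofsd.
The fallback #defines mirror the values in linux/fs.h and are only an
assumption for builds against older headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

#ifndef FS_DAX_FL
#define FS_DAX_FL 0x02000000      /* inode flag, added to linux/fs.h in v5.8 */
#endif
#ifndef FS_XFLAG_DAX
#define FS_XFLAG_DAX 0x00008000   /* fsxattr.fsx_xflags bit */
#endif

/* Same values as the enum introduced by the patch. */
enum dax_cap { DAX_CAP_NONE, DAX_CAP_FLAGS, DAX_CAP_XATTR };

/* Pure helper: is the DAX bit set for the given storage style? */
bool dax_marked(enum dax_cap cap, unsigned int flags, unsigned int xflags)
{
    switch (cap) {
    case DAX_CAP_FLAGS:
        return (flags & FS_DAX_FL) != 0;
    case DAX_CAP_XATTR:
        return (xflags & FS_XFLAG_DAX) != 0;
    default:
        return false;
    }
}

/* Returns 1 if fd is marked for DAX, 0 if not, -1 if the ioctl fails. */
int query_dax_marker(int fd, enum dax_cap cap)
{
    if (cap == DAX_CAP_FLAGS) {
        int flags = 0; /* FS_IOC_GETFLAGS takes a pointer to int */

        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
            return -1;
        }
        return dax_marked(cap, (unsigned int)flags, 0);
    }
    if (cap == DAX_CAP_XATTR) {
        struct fsxattr fsx;

        if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) {
            return -1;
        }
        return dax_marked(cap, 0, fsx.fsx_xflags);
    }
    return 0; /* DAX_CAP_NONE: backend cannot store the marker */
}
```

Either way the marker lives in the host inode's own flags, which is why
it also takes effect on the host when the backend sits on a real nvdimm
device.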


> 
> What happens when we're using user namespaces for the guest?
> 
> Dave
> 
> 
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>> ---
>>  tools/virtiofsd/fuse_common.h    |  5 +++++
>>  tools/virtiofsd/fuse_lowlevel.c  |  6 ++++++
>>  tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++++++++++
>>  3 files changed, 40 insertions(+)
>>
>> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
>> index 8a75729be9..ee6fc64c23 100644
>> --- a/tools/virtiofsd/fuse_common.h
>> +++ b/tools/virtiofsd/fuse_common.h
>> @@ -372,6 +372,11 @@ struct fuse_file_info {
>>   */
>>  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
>>  
>> +/**
>> + * Indicates support for per-file DAX.
>> + */
>> +#define FUSE_CAP_PERFILE_DAX (1 << 29)
>> +
>>  /**
>>   * Ioctl flags
>>   *
>> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
>> index 50fc5c8d5a..04a4f17423 100644
>> --- a/tools/virtiofsd/fuse_lowlevel.c
>> +++ b/tools/virtiofsd/fuse_lowlevel.c
>> @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>>      if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
>>          se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
>>      }
>> +    if (arg->flags & FUSE_PERFILE_DAX) {
>> +        se->conn.capable |= FUSE_CAP_PERFILE_DAX;
>> +    }
>>  #ifdef HAVE_SPLICE
>>  #ifdef HAVE_VMSPLICE
>>      se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
>> @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>>      if (se->conn.want & FUSE_CAP_POSIX_ACL) {
>>          outarg.flags |= FUSE_POSIX_ACL;
>>      }
>> +    if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
>> +        outarg.flags |= FUSE_PERFILE_DAX;
>> +    }
>>      outarg.max_readahead = se->conn.max_readahead;
>>      outarg.max_write = se->conn.max_write;
>>      if (se->conn.max_background >= (1 << 16)) {
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index e170b17adb..5b6228210f 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -53,8 +53,10 @@
>>  #include <sys/syscall.h>
>>  #include <sys/wait.h>
>>  #include <sys/xattr.h>
>> +#include <sys/vfs.h>
>>  #include <syslog.h>
>>  #include <linux/fs.h>
>> +#include <linux/magic.h>
>>  
>>  #include "qemu/cutils.h"
>>  #include "passthrough_helpers.h"
>> @@ -136,6 +138,13 @@ enum {
>>      SANDBOX_CHROOT,
>>  };
>>  
>> +/* capability of storing DAX flag persistently */
>> +enum {
>> +    DAX_CAP_NONE,  /* not supported */
>> +    DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
>> +    DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
>> +};
>> +
>>  typedef struct xattr_map_entry {
>>      char *key;
>>      char *prepend;
>> @@ -161,6 +170,7 @@ struct lo_data {
>>      int readdirplus_clear;
>>      int allow_direct_io;
>>      int announce_submounts;
>> +    int perfile_dax_cap; /* capability of backend fs */
>>      bool use_statx;
>>      struct lo_inode root;
>>      GHashTable *inodes; /* protected by lo->mutex */
>> @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
>>          conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
>>          lo->killpriv_v2 = 0;
>>      }
>> +
>> +    if ((conn->capable & FUSE_CAP_PERFILE_DAX) && lo->perfile_dax_cap) {
>> +        conn->want |= FUSE_CAP_PERFILE_DAX;
>> +    }
>>  }
>>  
>>  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
>> @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
>>      int fd, res;
>>      struct stat stat;
>>      uint64_t mnt_id;
>> +    struct statfs statfs;
>>  
>>      fd = open("/", O_PATH);
>>      if (fd == -1) {
>> @@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
>>          root->posix_locks = g_hash_table_new_full(
>>              g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
>>      }
>> +
>> +    /*
>> +     * Currently only ext4/xfs since linux kernel v5.8 support storing
>> +     * FS_DAX_FL flag persistently. Ext4 accesses this flag through
>> +     * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
>> +     * FS_IOC_FSG[S]ETXATTR ioctl.
>> +     */
>> +    res = fstatfs(fd, &statfs);
>> +    if (!res) {
>> +        if (statfs.f_type == EXT4_SUPER_MAGIC)
>> +            lo->perfile_dax_cap = DAX_CAP_FLAGS;
>> +        else if (statfs.f_type == XFS_SUPER_MAGIC)
>> +            lo->perfile_dax_cap = DAX_CAP_XATTR;
>> +    }
>>  }
>>  
>>  static guint lo_key_hash(gconstpointer key)
>> -- 
>> 2.27.0
>>

-- 
Thanks,
Jeffle


* Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT
@ 2021-08-18  5:28         ` JeffleXu
  0 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-18  5:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: miklos, virtualization, virtio-fs, joseph.qi, linux-fsdevel, vgoyal



On 8/18/21 1:15 AM, Dr. David Alan Gilbert wrote:
> * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
>> In the FUSE_INIT negotiation phase, server and client should each advertise
>> whether they support per-file DAX.
>>
>> Once it advertises support for the per-file DAX feature, virtiofsd should
>> persistently store the FS_DAX_FL flag passed by the
>> FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
>> FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
>>
>> Currently only ext4/xfs (since Linux kernel v5.8) support storing the
>> FS_DAX_FL flag persistently, so support for the per-file DAX feature is
>> advertised only when the backend fs type is ext4 or xfs.
> 
> I'm a little worried about the meaning of the flags we're storing and
> the fact we're storing them in the normal host DAX flags.
> 
> Doesn't this mean that we're using a single host flag to mean:
>   a) It can be mapped as DAX on the host if it was a real DAX device
>   b) We can map it as DAX inside the guest with virtiofs?

Yes, the side effect is that the host file is also DAX-enabled if the
backend fs is built upon a real NVDIMM device.

The rationale here is that the FUSE daemon must be able to *mark* a file
as DAX-capable *persistently*, so that it can later be told that the
file is capable of DAX.

I'm not sure whether an xattr (extended attribute) is a better option
for this.
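
For reference, the backend check and the FUSE_INIT decision in the patch
quoted below boil down to two small steps. The following is only a
minimal standalone sketch, not the virtiofsd code: detect_dax_cap(),
negotiate_perfile_dax() and SKETCH_CAP_PERFILE_DAX are hypothetical
names, and the bit value merely mirrors FUSE_CAP_PERFILE_DAX from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/vfs.h>
#include <linux/magic.h>

/* Fallback in case the installed headers lack the XFS magic number. */
#ifndef XFS_SUPER_MAGIC
#define XFS_SUPER_MAGIC 0x58465342
#endif

/* Hypothetical stand-in for FUSE_CAP_PERFILE_DAX from the patch. */
#define SKETCH_CAP_PERFILE_DAX (1u << 29)

/* How the backend fs can store the DAX flag persistently. */
enum dax_cap {
    DAX_CAP_NONE,  /* not supported */
    DAX_CAP_FLAGS, /* FS_IOC_GETFLAGS/FS_IOC_SETFLAGS (ext4) */
    DAX_CAP_XATTR, /* FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR (xfs) */
};

/* Classify the filesystem behind fd; any error means "not supported". */
static enum dax_cap detect_dax_cap(int fd)
{
    struct statfs st;

    if (fstatfs(fd, &st) != 0) {
        return DAX_CAP_NONE;
    }
    if (st.f_type == EXT4_SUPER_MAGIC) {
        return DAX_CAP_FLAGS;
    }
    if (st.f_type == XFS_SUPER_MAGIC) {
        return DAX_CAP_XATTR;
    }
    return DAX_CAP_NONE;
}

/*
 * Request per-file DAX only when the client advertised it *and* the
 * backend can store the flag. Returns true if the feature was requested.
 */
static bool negotiate_perfile_dax(uint32_t capable, uint32_t *want,
                                  enum dax_cap cap)
{
    assert(want != NULL);

    if ((capable & SKETCH_CAP_PERFILE_DAX) && cap != DAX_CAP_NONE) {
        *want |= SKETCH_CAP_PERFILE_DAX;
        return true;
    }
    return false;
}
```

Note that EXT4_SUPER_MAGIC is shared with ext2 and ext3, so a check on
f_type alone cannot tell those filesystems apart.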


> 
> what happens when we're using usernamespaces for the guest?
> 
> Dave
> 
> 
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>> ---
>>  tools/virtiofsd/fuse_common.h    |  5 +++++
>>  tools/virtiofsd/fuse_lowlevel.c  |  6 ++++++
>>  tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++++++++++
>>  3 files changed, 40 insertions(+)
>>
>> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
>> index 8a75729be9..ee6fc64c23 100644
>> --- a/tools/virtiofsd/fuse_common.h
>> +++ b/tools/virtiofsd/fuse_common.h
>> @@ -372,6 +372,11 @@ struct fuse_file_info {
>>   */
>>  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
>>  
>> +/**
>> + * Indicates support for per-file DAX.
>> + */
>> +#define FUSE_CAP_PERFILE_DAX (1 << 29)
>> +
>>  /**
>>   * Ioctl flags
>>   *
>> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
>> index 50fc5c8d5a..04a4f17423 100644
>> --- a/tools/virtiofsd/fuse_lowlevel.c
>> +++ b/tools/virtiofsd/fuse_lowlevel.c
>> @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>>      if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
>>          se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
>>      }
>> +    if (arg->flags & FUSE_PERFILE_DAX) {
>> +        se->conn.capable |= FUSE_CAP_PERFILE_DAX;
>> +    }
>>  #ifdef HAVE_SPLICE
>>  #ifdef HAVE_VMSPLICE
>>      se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
>> @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
>>      if (se->conn.want & FUSE_CAP_POSIX_ACL) {
>>          outarg.flags |= FUSE_POSIX_ACL;
>>      }
>> +    if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
>> +        outarg.flags |= FUSE_PERFILE_DAX;
>> +    }
>>      outarg.max_readahead = se->conn.max_readahead;
>>      outarg.max_write = se->conn.max_write;
>>      if (se->conn.max_background >= (1 << 16)) {
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index e170b17adb..5b6228210f 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -53,8 +53,10 @@
>>  #include <sys/syscall.h>
>>  #include <sys/wait.h>
>>  #include <sys/xattr.h>
>> +#include <sys/vfs.h>
>>  #include <syslog.h>
>>  #include <linux/fs.h>
>> +#include <linux/magic.h>
>>  
>>  #include "qemu/cutils.h"
>>  #include "passthrough_helpers.h"
>> @@ -136,6 +138,13 @@ enum {
>>      SANDBOX_CHROOT,
>>  };
>>  
>> +/* capability of storing DAX flag persistently */
>> +enum {
>> +    DAX_CAP_NONE,  /* not supported */
>> +    DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
>> +    DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
>> +};
>> +
>>  typedef struct xattr_map_entry {
>>      char *key;
>>      char *prepend;
>> @@ -161,6 +170,7 @@ struct lo_data {
>>      int readdirplus_clear;
>>      int allow_direct_io;
>>      int announce_submounts;
>> +    int perfile_dax_cap; /* capability of backend fs */
>>      bool use_statx;
>>      struct lo_inode root;
>>      GHashTable *inodes; /* protected by lo->mutex */
>> @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
>>          conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
>>          lo->killpriv_v2 = 0;
>>      }
>> +
>> +    if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
>> +        conn->want |= FUSE_CAP_PERFILE_DAX;
>> +    }
>>  }
>>  
>>  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
>> @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
>>      int fd, res;
>>      struct stat stat;
>>      uint64_t mnt_id;
>> +    struct statfs statfs;
>>  
>>      fd = open("/", O_PATH);
>>      if (fd == -1) {
>> @@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
>>          root->posix_locks = g_hash_table_new_full(
>>              g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
>>      }
>> +
>> +    /*
>> +     * Currently only ext4/xfs since linux kernel v5.8 support storing
>> +     * FS_DAX_FL flag persistently. Ext4 accesses this flag through
>> +     * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
>> +     * FS_IOC_FSG[S]ETXATTR ioctl.
>> +     */
>> +    res = fstatfs(fd, &statfs);
>> +    if (!res) {
>> +	if (statfs.f_type == EXT4_SUPER_MAGIC)
>> +	    lo->perfile_dax_cap = DAX_CAP_FLAGS;
>> +	else if (statfs.f_type == XFS_SUPER_MAGIC)
>> +	    lo->perfile_dax_cap = DAX_CAP_XATTR;
>> +    }
>>  }
>>  
>>  static guint lo_key_hash(gconstpointer key)
>> -- 
>> 2.27.0
>>

-- 
Thanks,
Jeffle


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 4/4] virtiofsd: support per-file DAX in FUSE_LOOKUP
  2021-08-17 19:00       ` Dr. David Alan Gilbert
  (?)
@ 2021-08-18  5:46         ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-18  5:46 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: vgoyal, stefanha, miklos, linux-fsdevel, virtio-fs, joseph.qi,
	virtualization



On 8/18/21 3:00 AM, Dr. David Alan Gilbert wrote:
> * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
>> For passthrough, when the corresponding virtiofs in guest is mounted
>> with '-o dax=inode', advertise that the file is capable of per-file
>> DAX if the inode in the backend fs is marked with FS_DAX_FL flag.
>>
>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>> ---
>>  tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++++++++++++++
>>  1 file changed, 43 insertions(+)
>>
>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>> index 5b6228210f..4cbd904248 100644
>> --- a/tools/virtiofsd/passthrough_ll.c
>> +++ b/tools/virtiofsd/passthrough_ll.c
>> @@ -171,6 +171,7 @@ struct lo_data {
>>      int allow_direct_io;
>>      int announce_submounts;
>>      int perfile_dax_cap; /* capability of backend fs */
>> +    bool perfile_dax; /* enable per-file DAX or not */
>>      bool use_statx;
>>      struct lo_inode root;
>>      GHashTable *inodes; /* protected by lo->mutex */
>> @@ -716,6 +717,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
>>  
>>      if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
>>          conn->want |= FUSE_CAP_PERFILE_DAX;
>> +	lo->perfile_dax = 1;
>> +    }
>> +    else {
>> +	lo->perfile_dax = 0;
>>      }
>>  }
>>  
>> @@ -983,6 +988,41 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>>      return 0;
>>  }
>>  
>> +/*
>> + * If the file is marked with FS_DAX_FL or FS_XFLAG_DAX, then DAX should be
>> + * enabled for this file.
>> + */
>> +static bool lo_should_enable_dax(struct lo_data *lo, struct lo_inode *dir,
>> +				 const char *name)
>> +{
>> +    int res, fd;
>> +    int ret = false;
>> +    unsigned int attr;
>> +    struct fsxattr xattr;
>> +
>> +    if (!lo->perfile_dax)
>> +	return false;
>> +
>> +    /* Open file without O_PATH, so that ioctl can be called. */
>> +    fd = openat(dir->fd, name, O_NOFOLLOW);
>> +    if (fd == -1)
>> +        return false;
> 
> Doesn't that defeat the whole benefit of using O_PATH - i.e. that we
> might stumble into a /dev node or something else we're not allowed to
> open?

As far as I know, virtiofsd will pivot_root/chroot to the source
directory, and can only access files inside the source directory
specified by "-o source=". So where would these unexpected files come
from? Besides, the fd opened without O_PATH here is temporary and is
used only for the FS_IOC_GETFLAGS/FS_IOC_FSGETXATTR ioctl. It's closed
when the function returns.
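
One way to keep the O_PATH protection while still getting an fd that
ioctl() accepts is to open with O_PATH first, check the file type, and
only then reopen through /proc/self/fd. This is a minimal sketch under
that assumption, not what the patch does: query_dax_xflag() is a
hypothetical helper, it only covers the FS_IOC_FSGETXATTR case, and it
relies on /proc being reachable by the process.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for O_PATH */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>

/*
 * Hypothetical helper: returns 1 if FS_XFLAG_DAX is set on dirfd/name,
 * 0 if it is clear, and -1 (with errno set) on error or when the file
 * is neither a regular file nor a directory.
 */
static int query_dax_xflag(int dirfd, const char *name)
{
    struct stat st;
    struct fsxattr fsx;
    char path[64];
    int pfd, fd, res, saved;

    assert(name != NULL);

    /* O_PATH never triggers device or FIFO open side effects. */
    pfd = openat(dirfd, name, O_PATH | O_NOFOLLOW | O_CLOEXEC);
    if (pfd == -1) {
        return -1;
    }
    if (fstat(pfd, &st) == -1) {
        saved = errno;
        close(pfd);
        errno = saved;
        return -1;
    }
    /* Refuse device nodes, FIFOs, sockets and symlinks. */
    if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
        close(pfd);
        errno = EINVAL;
        return -1;
    }

    /* Reopen the already-checked inode to get a real file descriptor. */
    snprintf(path, sizeof(path), "/proc/self/fd/%d", pfd);
    fd = open(path, O_RDONLY | O_CLOEXEC);
    saved = errno;
    close(pfd);
    if (fd == -1) {
        errno = saved;
        return -1;
    }

    res = ioctl(fd, FS_IOC_FSGETXATTR, &fsx);
    saved = errno;
    close(fd);
    if (res == -1) {
        errno = saved;
        return -1;
    }
    return (fsx.fsx_xflags & FS_XFLAG_DAX) ? 1 : 0;
}
```

The extra open makes each lookup even more expensive, so this only
addresses the safety concern, not the cost.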

> 
>> +    if (lo->perfile_dax_cap == DAX_CAP_FLAGS) {
>> +        res = ioctl(fd, FS_IOC_GETFLAGS, &attr);
>> +        if (!res && (attr & FS_DAX_FL))
>> +	    ret = true;
>> +    }
>> +    else if (lo->perfile_dax_cap == DAX_CAP_XATTR) {
>> +	res = ioctl(fd, FS_IOC_FSGETXATTR, &xattr);
>> +	if (!res && (xattr.fsx_xflags & FS_XFLAG_DAX))
>> +	    ret = true;
>> +    }
> 
> This all looks pretty expensive for each lookup.

Yes. It could be optimized once we agree on how to store the DAX flag
persistently.

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-18  5:08             ` [Virtio-fs] " Miklos Szeredi
  (?)
@ 2021-08-18 16:58               ` Vivek Goyal
  -1 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-08-18 16:58 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: JeffleXu, Stefan Hajnoczi, linux-fsdevel, virtualization,
	virtio-fs-list, Joseph Qi, Liu Bo

On Wed, Aug 18, 2021 at 07:08:24AM +0200, Miklos Szeredi wrote:
> On Wed, 18 Aug 2021 at 05:40, JeffleXu <jefflexu@linux.alibaba.com> wrote:
> 
> > I'm not sure if I fully understand your idea. Then in this case, host
> > daemon only prepares 4KB while guest thinks that the whole DAX window
> > (e.g., 2MB) has been fully mapped. Then when guest really accesses the
> > remained part (2MB - 4KB), page fault is triggered, and now host daemon
> > is responsible for downloading the remained part?
> 
> Yes.  Mapping an area just means setting up the page tables, it does
> not result in actual data transfer.

But the daemon will not get the page fault (it's the host kernel which
will handle it), and the host kernel does not know that the file chunk
needs to be downloaded.

- Either we somehow figure out user fault handling, so that
  qemu/virtiofsd get to handle the page fault and can then download
  the file.

- Or we download the 2MB chunk at FUSE_SETUPMAPPING time, so that the
  kernel can handle the later fault.

Am I missing something?

Vivek
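
The point that mapping an area does not by itself transfer any data can
be observed from userspace. This is a minimal sketch using an anonymous
mapping and mincore(); resident_pages() is a hypothetical helper, and
the anonymous mapping only stands in for the DAX window, so it shows
nothing about who handles the fault in the virtiofs case.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for mincore() and MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count how many pages of [addr, addr + len) are resident in memory.
 * addr must be page-aligned. Returns -1 on error.
 */
static long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t npages, i;
    unsigned char *vec;
    long count = 0;

    assert(addr != NULL);

    if (page <= 0) {
        return -1;
    }
    npages = (len + (size_t)page - 1) / (size_t)page;
    vec = malloc(npages ? npages : 1);
    if (!vec) {
        return -1;
    }
    if (mincore(addr, len, vec) == -1) {
        free(vec);
        return -1;
    }
    for (i = 0; i < npages; i++) {
        /* Only the least significant bit of each entry is defined. */
        if (vec[i] & 1) {
            count++;
        }
    }
    free(vec);
    return count;
}
```

Calling resident_pages() right after mmap() of a 2MB region and again
after touching its first byte should, on a typical Linux host, show the
count going from zero to at least one: pages appear only when a fault
brings them in.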


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT
  2021-08-17 17:15       ` Dr. David Alan Gilbert
  (?)
@ 2021-08-18 17:30         ` Vivek Goyal
  -1 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-08-18 17:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Jeffle Xu, stefanha, miklos, linux-fsdevel, virtio-fs, joseph.qi,
	virtualization

On Tue, Aug 17, 2021 at 06:15:58PM +0100, Dr. David Alan Gilbert wrote:
> * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> > In FUSE_INIT negotiating phase, server/client should advertise if it
> > supports per-file DAX.
> > 
> > Once advertising support for per-file DAX feature, virtiofsd should
> > support storing FS_DAX_FL flag persistently passed by
> > FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
> > FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
> > 
> > Currently only ext4/xfs since linux kernel v5.8 support storing
> > FS_DAX_FL flag persistently, and thus advertise support for per-file
> > DAX feature only when the backend fs type is ext4 and xfs.
> 
> I'm a little worried about the meaning of the flags we're storing and
> the fact we're storing them in the normal host DAX flags.
> 
> Doesn't this mean that we're using a single host flag to mean:
>   a) It can be mapped as DAX on the host if it was a real DAX device
>   b) We can map it as DAX inside the guest with virtiofs?

That's how a passthrough filesystem works. Every attribute is passed
through, so if the guest sets something, the host sees it the same way
(file uid/gid, file mode bits, xattrs etc.). The only exception now
seems to be remapping of xattrs, if the user chooses to do so.

> 
> what happens when we're using usernamespaces for the guest?

I don't think file attrs are namespaced. So if virtiofsd has permission
to do so, it will simply be able to set attrs on the file.

Vivek

> 
> Dave
> 
> 
> > Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> > ---
> >  tools/virtiofsd/fuse_common.h    |  5 +++++
> >  tools/virtiofsd/fuse_lowlevel.c  |  6 ++++++
> >  tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++++++++++
> >  3 files changed, 40 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> > index 8a75729be9..ee6fc64c23 100644
> > --- a/tools/virtiofsd/fuse_common.h
> > +++ b/tools/virtiofsd/fuse_common.h
> > @@ -372,6 +372,11 @@ struct fuse_file_info {
> >   */
> >  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
> >  
> > +/**
> > + * Indicates support for per-file DAX.
> > + */
> > +#define FUSE_CAP_PERFILE_DAX (1 << 29)
> > +
> >  /**
> >   * Ioctl flags
> >   *
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index 50fc5c8d5a..04a4f17423 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >      if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
> >          se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> >      }
> > +    if (arg->flags & FUSE_PERFILE_DAX) {
> > +        se->conn.capable |= FUSE_CAP_PERFILE_DAX;
> > +    }
> >  #ifdef HAVE_SPLICE
> >  #ifdef HAVE_VMSPLICE
> >      se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
> > @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >      if (se->conn.want & FUSE_CAP_POSIX_ACL) {
> >          outarg.flags |= FUSE_POSIX_ACL;
> >      }
> > +    if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
> > +        outarg.flags |= FUSE_PERFILE_DAX;
> > +    }
> >      outarg.max_readahead = se->conn.max_readahead;
> >      outarg.max_write = se->conn.max_write;
> >      if (se->conn.max_background >= (1 << 16)) {
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index e170b17adb..5b6228210f 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -53,8 +53,10 @@
> >  #include <sys/syscall.h>
> >  #include <sys/wait.h>
> >  #include <sys/xattr.h>
> > +#include <sys/vfs.h>
> >  #include <syslog.h>
> >  #include <linux/fs.h>
> > +#include <linux/magic.h>
> >  
> >  #include "qemu/cutils.h"
> >  #include "passthrough_helpers.h"
> > @@ -136,6 +138,13 @@ enum {
> >      SANDBOX_CHROOT,
> >  };
> >  
> > +/* capability of storing DAX flag persistently */
> > +enum {
> > +    DAX_CAP_NONE,  /* not supported */
> > +    DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
> > +    DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
> > +};
> > +
> >  typedef struct xattr_map_entry {
> >      char *key;
> >      char *prepend;
> > @@ -161,6 +170,7 @@ struct lo_data {
> >      int readdirplus_clear;
> >      int allow_direct_io;
> >      int announce_submounts;
> > +    int perfile_dax_cap; /* capability of backend fs */
> >      bool use_statx;
> >      struct lo_inode root;
> >      GHashTable *inodes; /* protected by lo->mutex */
> > @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
> >          conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
> >          lo->killpriv_v2 = 0;
> >      }
> > +
> > +    if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
> > +        conn->want |= FUSE_CAP_PERFILE_DAX;
> > +    }
> >  }
> >  
> >  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> > @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
> >      int fd, res;
> >      struct stat stat;
> >      uint64_t mnt_id;
> > +    struct statfs statfs;
> >  
> >      fd = open("/", O_PATH);
> >      if (fd == -1) {
> > @@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
> >          root->posix_locks = g_hash_table_new_full(
> >              g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
> >      }
> > +
> > +    /*
> > +     * Currently only ext4/xfs since linux kernel v5.8 support storing
> > +     * FS_DAX_FL flag persistently. Ext4 accesses this flag through
> > +     * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
> > +     * FS_IOC_FSG[S]ETXATTR ioctl.
> > +     */
> > +    res = fstatfs(fd, &statfs);
> > +    if (!res) {
> > +	if (statfs.f_type == EXT4_SUPER_MAGIC)
> > +	    lo->perfile_dax_cap = DAX_CAP_FLAGS;
> > +	else if (statfs.f_type == XFS_SUPER_MAGIC)
> > +	    lo->perfile_dax_cap = DAX_CAP_XATTR;
> > +    }
> >  }
> >  
> >  static guint lo_key_hash(gconstpointer key)
> > -- 
> > 2.27.0
> > 
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT
@ 2021-08-18 17:30         ` Vivek Goyal
  0 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-08-18 17:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: joseph.qi, miklos, virtualization, virtio-fs, linux-fsdevel, stefanha

On Tue, Aug 17, 2021 at 06:15:58PM +0100, Dr. David Alan Gilbert wrote:
> * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> > In FUSE_INIT negotiating phase, server/client should advertise if it
> > supports per-file DAX.
> > 
> > Once advertising support for per-file DAX feature, virtiofsd should
> > support storing FS_DAX_FL flag persistently passed by
> > FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
> > FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
> > 
> > Currently only ext4/xfs since linux kernel v5.8 support storing
> > FS_DAX_FL flag persistently, and thus advertise support for per-file
> > DAX feature only when the backend fs type is ext4 and xfs.
> 
> I'm a little worried about the meaning of the flags we're storing and
> the fact we're storing them in the normal host DAX flags.
> 
> Doesn't this mean that we're using a single host flag to mean:
>   a) It can be mapped as DAX on the host if it was a real DAX device
>   b) We can map it as DAX inside the guest with virtiofs?

That's how a passthrough filesystem works. Every attribute is passed
through, so if the guest sets something, the host sees it the same way
(file uid/gid, file mode bits, xattrs etc.). The only exception now
seems to be remapping of xattrs, if the user chooses to do so.

> 
> what happens when we're using usernamespaces for the guest?

I don't think file attrs are namespaced. So if virtiofsd has permission
to do so, it will simply be able to set attrs on the file.

Vivek

> 
> Dave
> 
> 
> > Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> > ---
> >  tools/virtiofsd/fuse_common.h    |  5 +++++
> >  tools/virtiofsd/fuse_lowlevel.c  |  6 ++++++
> >  tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++++++++++
> >  3 files changed, 40 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> > index 8a75729be9..ee6fc64c23 100644
> > --- a/tools/virtiofsd/fuse_common.h
> > +++ b/tools/virtiofsd/fuse_common.h
> > @@ -372,6 +372,11 @@ struct fuse_file_info {
> >   */
> >  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
> >  
> > +/**
> > + * Indicates support for per-file DAX.
> > + */
> > +#define FUSE_CAP_PERFILE_DAX (1 << 29)
> > +
> >  /**
> >   * Ioctl flags
> >   *
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index 50fc5c8d5a..04a4f17423 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >      if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
> >          se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> >      }
> > +    if (arg->flags & FUSE_PERFILE_DAX) {
> > +        se->conn.capable |= FUSE_CAP_PERFILE_DAX;
> > +    }
> >  #ifdef HAVE_SPLICE
> >  #ifdef HAVE_VMSPLICE
> >      se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
> > @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >      if (se->conn.want & FUSE_CAP_POSIX_ACL) {
> >          outarg.flags |= FUSE_POSIX_ACL;
> >      }
> > +    if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
> > +        outarg.flags |= FUSE_PERFILE_DAX;
> > +    }
> >      outarg.max_readahead = se->conn.max_readahead;
> >      outarg.max_write = se->conn.max_write;
> >      if (se->conn.max_background >= (1 << 16)) {
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index e170b17adb..5b6228210f 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -53,8 +53,10 @@
> >  #include <sys/syscall.h>
> >  #include <sys/wait.h>
> >  #include <sys/xattr.h>
> > +#include <sys/vfs.h>
> >  #include <syslog.h>
> >  #include <linux/fs.h>
> > +#include <linux/magic.h>
> >  
> >  #include "qemu/cutils.h"
> >  #include "passthrough_helpers.h"
> > @@ -136,6 +138,13 @@ enum {
> >      SANDBOX_CHROOT,
> >  };
> >  
> > +/* capability of storing DAX flag persistently */
> > +enum {
> > +    DAX_CAP_NONE,  /* not supported */
> > +    DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
> > +    DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
> > +};
> > +
> >  typedef struct xattr_map_entry {
> >      char *key;
> >      char *prepend;
> > @@ -161,6 +170,7 @@ struct lo_data {
> >      int readdirplus_clear;
> >      int allow_direct_io;
> >      int announce_submounts;
> > +    int perfile_dax_cap; /* capability of backend fs */
> >      bool use_statx;
> >      struct lo_inode root;
> >      GHashTable *inodes; /* protected by lo->mutex */
> > @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
> >          conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
> >          lo->killpriv_v2 = 0;
> >      }
> > +
> > +    if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
> > +        conn->want |= FUSE_CAP_PERFILE_DAX;
> > +    }
> >  }
> >  
> >  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> > @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
> >      int fd, res;
> >      struct stat stat;
> >      uint64_t mnt_id;
> > +    struct statfs statfs;
> >  
> >      fd = open("/", O_PATH);
> >      if (fd == -1) {
> > @@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
> >          root->posix_locks = g_hash_table_new_full(
> >              g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
> >      }
> > +
> > +    /*
> > +     * Currently only ext4/xfs since linux kernel v5.8 support storing
> > +     * FS_DAX_FL flag persistently. Ext4 accesses this flag through
> > +     * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
> > +     * FS_IOC_FSG[S]ETXATTR ioctl.
> > +     */
> > +    res = fstatfs(fd, &statfs);
> > +    if (!res) {
> > +	if (statfs.f_type == EXT4_SUPER_MAGIC)
> > +	    lo->perfile_dax_cap = DAX_CAP_FLAGS;
> > +	else if (statfs.f_type == XFS_SUPER_MAGIC)
> > +	    lo->perfile_dax_cap = DAX_CAP_XATTR;
> > +    }
> >  }
> >  
> >  static guint lo_key_hash(gconstpointer key)
> > -- 
> > 2.27.0
> > 
> > _______________________________________________
> > Virtio-fs mailing list
> > Virtio-fs@redhat.com
> > https://listman.redhat.com/mailman/listinfo/virtio-fs
> > 
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
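
The backend probe in the quoted setup_root() hunk can be tried outside virtiofsd. Below is a minimal sketch, assuming a Linux host; the names dax_cap_from_magic() and probe_dax_cap() are hypothetical and not part of the patch, and the fallback magic values are only there for older headers. One caveat worth keeping in mind: 0xEF53 is shared by ext2, ext3 and ext4, so the magic number alone does not prove the backend is ext4, nor that the host kernel is v5.8 or later.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>
#include <linux/magic.h>

/* Fallbacks in case the installed headers predate these definitions. */
#ifndef EXT4_SUPER_MAGIC
#define EXT4_SUPER_MAGIC 0xEF53
#endif
#ifndef XFS_SUPER_MAGIC
#define XFS_SUPER_MAGIC 0x58465342
#endif

/* Same three states as the enum added by the patch. */
enum dax_cap {
    DAX_CAP_NONE,  /* backend cannot store the DAX flag */
    DAX_CAP_FLAGS, /* FS_IOC_GETFLAGS/FS_IOC_SETFLAGS (ext4) */
    DAX_CAP_XATTR, /* FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR (xfs) */
};

/* Pure classification step, split out so it can be checked without a real fs. */
static enum dax_cap dax_cap_from_magic(long f_type)
{
    if (f_type == EXT4_SUPER_MAGIC) {
        return DAX_CAP_FLAGS;
    }
    if (f_type == XFS_SUPER_MAGIC) {
        return DAX_CAP_XATTR;
    }
    return DAX_CAP_NONE;
}

/* Hypothetical helper: probe the filesystem that holds 'path'. */
static enum dax_cap probe_dax_cap(const char *path)
{
    struct statfs sfs;

    assert(path != NULL);
    if (statfs(path, &sfs) != 0) {
        return DAX_CAP_NONE; /* treat any failure as "not supported" */
    }
    return dax_cap_from_magic((long)sfs.f_type);
}
```

Calling probe_dax_cap() on the shared directory gives the same three-way answer that the patch stores in lo->perfile_dax_cap; treating a statfs() failure as DAX_CAP_NONE mirrors the patch, which leaves the field at its zero default in that case.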

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT
@ 2021-08-18 17:30         ` Vivek Goyal
  0 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-08-18 17:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: joseph.qi, miklos, virtualization, virtio-fs, linux-fsdevel, Jeffle Xu

On Tue, Aug 17, 2021 at 06:15:58PM +0100, Dr. David Alan Gilbert wrote:
> * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> > In FUSE_INIT negotiating phase, server/client should advertise if it
> > supports per-file DAX.
> > 
> > Once advertising support for per-file DAX feature, virtiofsd should
> > support storing FS_DAX_FL flag persistently passed by
> > FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
> > FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
> > 
> > Currently only ext4/xfs since linux kernel v5.8 support storing
> > FS_DAX_FL flag persistently, and thus advertise support for per-file
> > DAX feature only when the backend fs type is ext4 and xfs.
> 
> I'm a little worried about the meaning of the flags we're storing and
> the fact we're storing them in the normal host DAX flags.
> 
> Doesn't this mean that we're using a single host flag to mean:
>   a) It can be mapped as DAX on the host if it was a real DAX device
>   b) We can map it as DAX inside the guest with virtiofs?

That's how a passthrough filesystem works. Every attribute is passed
through, so if the guest sets something, the host sees it the same way
(file uid/gid, file mode bits, xattrs, etc.). The only exception now seems
to be remapping of xattrs, if the user chooses to do so.

> 
> what happens when we're using usernamespaces for the guest?

I don't think file attrs are namespaced. So if virtiofsd has permission
to do so, it will simply be able to set attrs on the file.

Vivek

> 
> Dave
> 
> 
> > Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> > ---
> >  tools/virtiofsd/fuse_common.h    |  5 +++++
> >  tools/virtiofsd/fuse_lowlevel.c  |  6 ++++++
> >  tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++++++++++
> >  3 files changed, 40 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> > index 8a75729be9..ee6fc64c23 100644
> > --- a/tools/virtiofsd/fuse_common.h
> > +++ b/tools/virtiofsd/fuse_common.h
> > @@ -372,6 +372,11 @@ struct fuse_file_info {
> >   */
> >  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
> >  
> > +/**
> > + * Indicates support for per-file DAX.
> > + */
> > +#define FUSE_CAP_PERFILE_DAX (1 << 29)
> > +
> >  /**
> >   * Ioctl flags
> >   *
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index 50fc5c8d5a..04a4f17423 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >      if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
> >          se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> >      }
> > +    if (arg->flags & FUSE_PERFILE_DAX) {
> > +        se->conn.capable |= FUSE_CAP_PERFILE_DAX;
> > +    }
> >  #ifdef HAVE_SPLICE
> >  #ifdef HAVE_VMSPLICE
> >      se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
> > @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >      if (se->conn.want & FUSE_CAP_POSIX_ACL) {
> >          outarg.flags |= FUSE_POSIX_ACL;
> >      }
> > +    if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
> > +        outarg.flags |= FUSE_PERFILE_DAX;
> > +    }
> >      outarg.max_readahead = se->conn.max_readahead;
> >      outarg.max_write = se->conn.max_write;
> >      if (se->conn.max_background >= (1 << 16)) {
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index e170b17adb..5b6228210f 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -53,8 +53,10 @@
> >  #include <sys/syscall.h>
> >  #include <sys/wait.h>
> >  #include <sys/xattr.h>
> > +#include <sys/vfs.h>
> >  #include <syslog.h>
> >  #include <linux/fs.h>
> > +#include <linux/magic.h>
> >  
> >  #include "qemu/cutils.h"
> >  #include "passthrough_helpers.h"
> > @@ -136,6 +138,13 @@ enum {
> >      SANDBOX_CHROOT,
> >  };
> >  
> > +/* capability of storing DAX flag persistently */
> > +enum {
> > +    DAX_CAP_NONE,  /* not supported */
> > +    DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
> > +    DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
> > +};
> > +
> >  typedef struct xattr_map_entry {
> >      char *key;
> >      char *prepend;
> > @@ -161,6 +170,7 @@ struct lo_data {
> >      int readdirplus_clear;
> >      int allow_direct_io;
> >      int announce_submounts;
> > +    int perfile_dax_cap; /* capability of backend fs */
> >      bool use_statx;
> >      struct lo_inode root;
> >      GHashTable *inodes; /* protected by lo->mutex */
> > @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
> >          conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
> >          lo->killpriv_v2 = 0;
> >      }
> > +
> > +    if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
> > +        conn->want |= FUSE_CAP_PERFILE_DAX;
> > +    }
> >  }
> >  
> >  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> > @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
> >      int fd, res;
> >      struct stat stat;
> >      uint64_t mnt_id;
> > +    struct statfs statfs;
> >  
> >      fd = open("/", O_PATH);
> >      if (fd == -1) {
> > @@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
> >          root->posix_locks = g_hash_table_new_full(
> >              g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
> >      }
> > +
> > +    /*
> > +     * Currently only ext4/xfs since linux kernel v5.8 support storing
> > +     * FS_DAX_FL flag persistently. Ext4 accesses this flag through
> > +     * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
> > +     * FS_IOC_FSG[S]ETXATTR ioctl.
> > +     */
> > +    res = fstatfs(fd, &statfs);
> > +    if (!res) {
> > +	if (statfs.f_type == EXT4_SUPER_MAGIC)
> > +	    lo->perfile_dax_cap = DAX_CAP_FLAGS;
> > +	else if (statfs.f_type == XFS_SUPER_MAGIC)
> > +	    lo->perfile_dax_cap = DAX_CAP_XATTR;
> > +    }
> >  }
> >  
> >  static guint lo_key_hash(gconstpointer key)
> > -- 
> > 2.27.0
> > 
> > _______________________________________________
> > Virtio-fs mailing list
> > Virtio-fs@redhat.com
> > https://listman.redhat.com/mailman/listinfo/virtio-fs
> > 
> -- 
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
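
For readers following the negotiation in the quoted do_init()/lo_init() hunks, the three checks can be modelled in isolation. This is only a sketch under stated assumptions: the SKETCH_* names, the struct and the wire bit value are made up for illustration, and only the (1 << 29) capability bit is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the FUSE_PERFILE_DAX bit in the INIT flags. */
#define SKETCH_INIT_PERFILE_DAX (1u << 0)
/* FUSE_CAP_PERFILE_DAX as defined in the quoted fuse_common.h hunk. */
#define SKETCH_CAP_PERFILE_DAX (1u << 29)

/* Hypothetical reduction of struct fuse_conn_info to the two fields used. */
struct sketch_conn {
    uint32_t capable; /* what the client offered */
    uint32_t want;    /* what the server asks for */
};

/*
 * Returns the INIT reply flags for the per-file DAX feature:
 *  1. do_init() records the client's offer in 'capable';
 *  2. lo_init() sets 'want' only if the backend can store the flag;
 *  3. do_init() advertises the feature only if .ioctl is implemented.
 */
static uint32_t negotiate_perfile_dax(uint32_t init_in_flags,
                                      bool backend_can_store_flag,
                                      bool have_ioctl_op)
{
    struct sketch_conn conn = { 0, 0 };
    uint32_t init_out_flags = 0;

    if (init_in_flags & SKETCH_INIT_PERFILE_DAX) {
        conn.capable |= SKETCH_CAP_PERFILE_DAX;
    }
    if ((conn.capable & SKETCH_CAP_PERFILE_DAX) && backend_can_store_flag) {
        conn.want |= SKETCH_CAP_PERFILE_DAX;
    }
    /* The server never wants more than the client offered. */
    assert((conn.want & ~conn.capable) == 0);

    if (have_ioctl_op && (conn.want & SKETCH_CAP_PERFILE_DAX)) {
        init_out_flags |= SKETCH_INIT_PERFILE_DAX;
    }
    return init_out_flags;
}
```

In this model the feature ends up in the reply only when all three conditions hold, which is how the quoted hunks read; if any one of them is false the reply flag stays clear.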


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [virtiofsd PATCH v4 1/4] virtiofsd: add .ioctl() support
  2021-08-17  2:23     ` Jeffle Xu
  (?)
@ 2021-08-18 17:33       ` Vivek Goyal
  -1 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-08-18 17:33 UTC (permalink / raw)
  To: Jeffle Xu
  Cc: stefanha, miklos, linux-fsdevel, virtualization, virtio-fs,
	joseph.qi, bo.liu

On Tue, Aug 17, 2021 at 10:23:44AM +0800, Jeffle Xu wrote:
> Add .ioctl() support for passthrough, in preparation for the following
> per-file DAX feature.
> 
> Once advertising support for per-file DAX feature, virtiofsd should
> support storing FS_DAX_FL flag persistently passed by
> FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
> FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
> 
> When it comes to passthrough, it passes corresponding ioctls to host
> directly. Currently only these ioctls that are needed for per-file DAX
> feature, i.e., FS_IOC_GETFLAGS/FS_IOC_SETFLAGS and
> FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR are supported. Later we can restrict
> the flags/attributes allowed to be set to reinforce the security, or
> extend the scope of allowed ioctls if it is really needed later.

Dave had concerns about which attrs the guest should be allowed to set.
We were also wondering why virtiofs does not support ioctl yet.

I think it probably makes sense for ioctl support to be a separate
patch series for virtiofs. We will probably need to add it anyway.

Vivek
> 
> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> ---
>  tools/virtiofsd/passthrough_ll.c      | 53 +++++++++++++++++++++++++++
>  tools/virtiofsd/passthrough_seccomp.c |  1 +
>  2 files changed, 54 insertions(+)
> 
> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> index b76d878509..e170b17adb 100644
> --- a/tools/virtiofsd/passthrough_ll.c
> +++ b/tools/virtiofsd/passthrough_ll.c
> @@ -54,6 +54,7 @@
>  #include <sys/wait.h>
>  #include <sys/xattr.h>
>  #include <syslog.h>
> +#include <linux/fs.h>
>  
>  #include "qemu/cutils.h"
>  #include "passthrough_helpers.h"
> @@ -2105,6 +2106,57 @@ out:
>      fuse_reply_err(req, saverr);
>  }
>  
> +static void lo_ioctl(fuse_req_t req, fuse_ino_t ino, unsigned int cmd, void *arg,
> +                  struct fuse_file_info *fi, unsigned flags, const void *in_buf,
> +                  size_t in_bufsz, size_t out_bufsz)
> +{
> +    int fd = lo_fi_fd(req, fi);
> +    int res;
> +    int saverr = ENOSYS;
> +
> +    fuse_log(FUSE_LOG_DEBUG, "lo_ioctl(ino=%" PRIu64 ", cmd=0x%x, flags=0x%x, "
> +	     "in_bufsz = %lu, out_bufsz = %lu)\n",
> +	     ino, cmd, flags, in_bufsz, out_bufsz);
> +
> +    /* unrestricted ioctl is not supported yet */
> +    if (flags & FUSE_IOCTL_UNRESTRICTED)
> +        goto out;
> +
> +    /*
> +     * Currently only those ioctls needed to support per-file DAX feature,
> +     * i.e., FS_IOC_GETFLAGS/FS_IOC_SETFLAGS and
> +     * FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR are supported.
> +     */
> +    if (cmd == FS_IOC_SETFLAGS || cmd == FS_IOC_FSSETXATTR) {
> +        res = ioctl(fd, cmd, in_buf);
> +        if (res < 0)
> +            goto out_err;
> +
> +	fuse_reply_ioctl(req, 0, NULL, 0);
> +    }
> +    else if (cmd == FS_IOC_GETFLAGS || cmd == FS_IOC_FSGETXATTR) {
> +	/* reused for 'unsigned int' for FS_IOC_GETFLAGS */
> +	struct fsxattr attr;
> +
> +        res = ioctl(fd, cmd, &attr);
> +        if (res < 0)
> +            goto out_err;
> +
> +        fuse_reply_ioctl(req, 0, &attr, out_bufsz);
> +    }
> +    else {
> +	fuse_log(FUSE_LOG_DEBUG, "Unsupported ioctl 0x%x\n", cmd);
> +	goto out;
> +    }
> +
> +    return;
> +
> +out_err:
> +	saverr = errno;
> +out:
> +	fuse_reply_err(req, saverr);
> +}
> +
>  static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
>                          struct fuse_file_info *fi)
>  {
> @@ -3279,6 +3331,7 @@ static struct fuse_lowlevel_ops lo_oper = {
>      .create = lo_create,
>      .getlk = lo_getlk,
>      .setlk = lo_setlk,
> +    .ioctl = lo_ioctl,
>      .open = lo_open,
>      .release = lo_release,
>      .flush = lo_flush,
> diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
> index 62441cfcdb..2a5f7614fc 100644
> --- a/tools/virtiofsd/passthrough_seccomp.c
> +++ b/tools/virtiofsd/passthrough_seccomp.c
> @@ -62,6 +62,7 @@ static const int syscall_allowlist[] = {
>      SCMP_SYS(gettid),
>      SCMP_SYS(gettimeofday),
>      SCMP_SYS(getxattr),
> +    SCMP_SYS(ioctl),
>      SCMP_SYS(linkat),
>      SCMP_SYS(listxattr),
>      SCMP_SYS(lseek),
> -- 
> 2.27.0
> 
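
On the question of which attrs the guest may set: one possible shape for the restriction mentioned in the commit message is a read-modify-write with an allow mask. The sketch below is an assumption, not what the patch does (the patch forwards the guest value unchanged); filter_setflags() and apply_guest_setflags() are hypothetical names, and the policy of allowing only FS_DAX_FL is chosen purely for illustration.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/* Fallback for headers older than v5.8. */
#ifndef FS_DAX_FL
#define FS_DAX_FL 0x02000000
#endif

/*
 * Keep every host flag as it is, except for the bits in the allow mask,
 * which are taken from the guest's request.
 */
static unsigned int filter_setflags(unsigned int host_flags,
                                    unsigned int guest_flags)
{
    const unsigned int allowed = FS_DAX_FL; /* assumed policy */
    unsigned int merged = (host_flags & ~allowed) | (guest_flags & allowed);

    /* Bits outside the mask must be untouched. */
    assert((merged & ~allowed) == (host_flags & ~allowed));
    return merged;
}

/* Hypothetical wrapper: returns 0 on success, -1 on failure (errno set). */
static int apply_guest_setflags(int fd, unsigned int guest_flags)
{
    unsigned int host_flags = 0;
    unsigned int merged;

    if (ioctl(fd, FS_IOC_GETFLAGS, &host_flags) < 0) {
        return -1;
    }
    merged = filter_setflags(host_flags, guest_flags);
    if (ioctl(fd, FS_IOC_SETFLAGS, &merged) < 0) {
        return -1;
    }
    return 0;
}
```

With a mask like this, a guest request that also carries, say, FS_IMMUTABLE_FL would change only the DAX bit on the host. Whether silently dropping the other bits or failing the request is the better behaviour is a policy question this sketch does not settle.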


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 14:54           ` Vivek Goyal
  (?)
@ 2021-08-19  6:14             ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-19  6:14 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Miklos Szeredi, Dr. David Alan Gilbert, virtualization,
	virtio-fs-list, Joseph Qi, linux-fsdevel



On 8/17/21 10:54 PM, Vivek Goyal wrote:
[...]
>>
>> As for virtiofs, Dr. David Alan Gilbert has mentioned that various files
>> may compete for limited DAX window resource.
>>
>> Besides, supporting DAX for small files can be expensive. Small files
>> can consume DAX window resource rapidly, and if small files are accessed
>> only once, the cost of mmap/munmap on host can not be ignored.
> 
> W.r.t. access pattern, the same applies to large files. If a section
> of a large file is accessed only once, it will consume dax window as well
> and will have to be reclaimed.
> 
> Dax in virtiofs provides a speed gain only if we map a file once and
> access it multiple times. If that pattern does not hold, then dax does
> not seem to provide speed gains and might in fact be slower than
> non-dax.
> 
> So if there is a pattern where we know some files are accessed repeatedly
> while others are not, then enabling/disabling dax selectively will make
> sense. The question is how many workloads really know that, and how will
> you make that decision? Do you have any data to back that up?

Empirically, some files are naturally accessed only once, such as
configuration files under /etc/, .py and .js files, etc. This is a case
we have actually met in the real world. Others are most likely accessed
multiple times, such as .so libraries. With the per-file DAX feature,
administrators can decide for themselves which files should be dax
enabled, and thus gain the most benefit from dax, and which should not.

As for how to distinguish file access patterns, besides the intuition
described above, we can develop more advanced methods, e.g., scanning
the DAX window map to find the hot files. With the mechanism offered by
the kernel, more advanced strategies can be developed later.

> 
> W.r.t. small files, is that a real concern? If the file is being accessed
> multiple times, then we will still see the speed gain. The only downside
> is that there is a little wastage of resources because our minimum dax
> mapping granularity is 2MB. I am wondering whether we can handle that by
> supporting other dax mapping granularities as well, say 256K, and letting
> users choose.
> 


-- 
Thanks,
Jeffle
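
To put rough numbers on the granularity point quoted above, the arithmetic can be sketched as follows. It assumes a file is mapped in whole fixed-size ranges, with 2MB taken as 2 MiB and 256K as 256 KiB; the function names are hypothetical and this is not code from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size DAX ranges needed to map 'file_size' bytes. */
static uint64_t dax_ranges_needed(uint64_t file_size, uint64_t granule)
{
    assert(granule > 0);
    return (file_size + granule - 1) / granule; /* round up */
}

/* Window bytes reserved for the file but not backed by file data. */
static uint64_t dax_window_waste(uint64_t file_size, uint64_t granule)
{
    return dax_ranges_needed(file_size, granule) * granule - file_size;
}
```

Under these assumptions a 4 KiB file occupies one 2 MiB range and leaves 2093056 bytes of the window unused, while the same file at a 256 KiB granularity leaves 258048 bytes unused. The smaller granule would cost more mappings for large files, so the trade-off depends on the file size mix.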

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 4/4] virtiofsd: support per-file DAX in FUSE_LOOKUP
  2021-08-18  5:46         ` JeffleXu
  (?)
@ 2021-08-19 13:08           ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 151+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-19 13:08 UTC (permalink / raw)
  To: JeffleXu
  Cc: vgoyal, stefanha, miklos, linux-fsdevel, virtio-fs, joseph.qi,
	virtualization

* JeffleXu (jefflexu@linux.alibaba.com) wrote:
> 
> 
> On 8/18/21 3:00 AM, Dr. David Alan Gilbert wrote:
> > * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> >> For passthrough, when the corresponding virtiofs in guest is mounted
> >> with '-o dax=inode', advertise that the file is capable of per-file
> >> DAX if the inode in the backend fs is marked with FS_DAX_FL flag.
> >>
> >> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >> ---
> >>  tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++++++++++++++
> >>  1 file changed, 43 insertions(+)
> >>
> >> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> >> index 5b6228210f..4cbd904248 100644
> >> --- a/tools/virtiofsd/passthrough_ll.c
> >> +++ b/tools/virtiofsd/passthrough_ll.c
> >> @@ -171,6 +171,7 @@ struct lo_data {
> >>      int allow_direct_io;
> >>      int announce_submounts;
> >>      int perfile_dax_cap; /* capability of backend fs */
> >> +    bool perfile_dax; /* enable per-file DAX or not */
> >>      bool use_statx;
> >>      struct lo_inode root;
> >>      GHashTable *inodes; /* protected by lo->mutex */
> >> @@ -716,6 +717,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
> >>  
> >>      if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
> >>          conn->want |= FUSE_CAP_PERFILE_DAX;
> >> +	lo->perfile_dax = 1;
> >> +    }
> >> +    else {
> >> +	lo->perfile_dax = 0;
> >>      }
> >>  }
> >>  
> >> @@ -983,6 +988,41 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
> >>      return 0;
> >>  }
> >>  
> >> +/*
> >> + * If the file is marked with FS_DAX_FL or FS_XFLAG_DAX, then DAX should be
> >> + * enabled for this file.
> >> + */
> >> +static bool lo_should_enable_dax(struct lo_data *lo, struct lo_inode *dir,
> >> +				 const char *name)
> >> +{
> >> +    int res, fd;
> >> +    bool ret = false;
> >> +    unsigned int attr;
> >> +    struct fsxattr xattr;
> >> +
> >> +    if (!lo->perfile_dax)
> >> +	return false;
> >> +
> >> +    /* Open file without O_PATH, so that ioctl can be called. */
> >> +    fd = openat(dir->fd, name, O_NOFOLLOW);
> >> +    if (fd == -1)
> >> +        return false;
> > 
> > Doesn't that defeat the whole benefit of using O_PATH - i.e. that we
> > might stumble into a /dev node or something else we're not allowed to
> > open?
> 
> As far as I know, virtiofsd will pivot_root/chroot to the source
> directory, and can only access files inside the source directory
> specified by "-o source=". Then where do these unexpected files come
> from? Besides, the fd opened without O_PATH here is temporary and used
> only for the FS_IOC_GETFLAGS/FS_IOC_FSGETXATTR ioctl. It's closed when
> the function returns.

The guest is still allowed to mknod.
See:
   https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg05461.html

Also, it's legal to expose a root filesystem to a guest; virtiofsd
should *never* open a device other than with O_PATH, and it's really
tricky to check whether something is a device in a race-free way.
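
One way to sketch a race-free check is to open with O_PATH first, fstat
that fd, and only then reopen the same inode through /proc/self/fd.
The following is a minimal illustration, not the patch's code: the
function name is hypothetical, and it assumes /proc is reachable from
the calling process (a sandboxed daemon would need its own handle on
/proc/self/fd).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a readable fd for `name` only if it is a
 * regular file or directory; never open device nodes, FIFOs or symlinks.
 */
int open_regular_for_ioctl(int dirfd, const char *name)
{
    struct stat st;
    char procname[64];
    int pathfd, fd, saved_errno;

    /* O_PATH does not trigger the device's open method. */
    pathfd = openat(dirfd, name, O_PATH | O_NOFOLLOW | O_CLOEXEC);
    if (pathfd == -1) {
        return -1;
    }

    /* fstat on the O_PATH fd describes exactly the inode we hold. */
    if (fstat(pathfd, &st) == -1) {
        saved_errno = errno;
        close(pathfd);
        errno = saved_errno;
        return -1;
    }
    if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
        close(pathfd);
        errno = EINVAL;
        return -1;
    }

    /* Reopen the same inode via the magic link, not via the name again. */
    snprintf(procname, sizeof(procname), "/proc/self/fd/%d", pathfd);
    fd = open(procname, O_RDONLY | O_CLOEXEC);
    saved_errno = errno;
    close(pathfd);
    errno = saved_errno;
    return fd;
}
```

Because the second open goes through the already-held fd rather than
the path, a guest swapping the name for a device node between the two
steps cannot change what gets opened.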


> > 
> >> +    if (lo->perfile_dax_cap == DAX_CAP_FLAGS) {
> >> +        res = ioctl(fd, FS_IOC_GETFLAGS, &attr);
> >> +        if (!res && (attr & FS_DAX_FL))
> >> +	    ret = true;
> >> +    }
> >> +    else if (lo->perfile_dax_cap == DAX_CAP_XATTR) {
> >> +	res = ioctl(fd, FS_IOC_FSGETXATTR, &xattr);
> >> +	if (!res && (xattr.fsx_xflags & FS_XFLAG_DAX))
> >> +	    ret = true;
> >> +    }
> > 
> > This all looks pretty expensive for each lookup.
> 
> Yes, it can be optimized somewhat if we can agree on how to store
> the dax flag persistently.

Dave

> -- 
> Thanks,
> Jeffle
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT
  2021-08-18  5:28         ` JeffleXu
  (?)
@ 2021-08-19 13:57           ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 151+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-19 13:57 UTC (permalink / raw)
  To: JeffleXu
  Cc: vgoyal, stefanha, miklos, linux-fsdevel, virtio-fs, joseph.qi,
	virtualization

* JeffleXu (jefflexu@linux.alibaba.com) wrote:
> 
> 
> On 8/18/21 1:15 AM, Dr. David Alan Gilbert wrote:
> > * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> >> In FUSE_INIT negotiating phase, server/client should advertise if it
> >> supports per-file DAX.
> >>
> >> Once advertising support for per-file DAX feature, virtiofsd should
> >> support storing FS_DAX_FL flag persistently passed by
> >> FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
> >> FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
> >>
> >> Currently only ext4/xfs since Linux kernel v5.8 support storing the
> >> FS_DAX_FL flag persistently, and thus advertise support for the
> >> per-file DAX feature only when the backend fs type is ext4 or xfs.
> > 
> > I'm a little worried about the meaning of the flags we're storing and
> > the fact we're storing them in the normal host DAX flags.
> > 
> > Doesn't this mean that we're using a single host flag to mean:
> >   a) It can be mapped as DAX on the host if it was a real DAX device
> >   b) We can map it as DAX inside the guest with virtiofs?
> 
> Yes, the side effect is that the host file is also DAX enabled if the
> backend fs is built upon a real nvdimm device.
> 
> The rationale here is that the fuse daemon shall be capable of *marking*
> the file as DAX capable *persistently*, so that it can later be informed
> that this file is capable of DAX.

Right, so my worry here is that the untrusted guest changes both its
own behaviour (fine) and also the behaviour of the host (less fine).

> I'm not sure if xattr (extended attribute) is a better option for this?

Well, if you used an xattr for it, it wouldn't clash with whatever the
host did (especially if it used the xattr mapping).
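
As a rough sketch of what an xattr-based marker could look like (an
assumption for illustration only: the attribute name
"user.virtiofs.dax" and both helpers are made up here, and nothing in
this thread has agreed on a name or value format):

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical attribute name; not defined by any patch in this series. */
#define DAX_MARK_XATTR "user.virtiofs.dax"

/* True if the file carries the marker with value "1". */
bool dax_marked(const char *path)
{
    char val;
    ssize_t n = lgetxattr(path, DAX_MARK_XATTR, &val, sizeof(val));

    return n == 1 && val == '1';
}

/* Set or clear the marker; returns 0 on success, -1 with errno set. */
int dax_set_mark(const char *path, bool enable)
{
    if (enable) {
        return lsetxattr(path, DAX_MARK_XATTR, "1", 1, 0);
    }
    if (lremovexattr(path, DAX_MARK_XATTR) == -1 && errno != ENODATA) {
        return -1;
    }
    return 0;
}
```

A marker like this lives apart from FS_DAX_FL/FS_XFLAG_DAX, so setting
it from the guest would not change how the host itself maps the file.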

Dave

> 
> > 
> > What happens when we're using user namespaces for the guest?
> > 
> > Dave
> > 
> > 
> >> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >> ---
> >>  tools/virtiofsd/fuse_common.h    |  5 +++++
> >>  tools/virtiofsd/fuse_lowlevel.c  |  6 ++++++
> >>  tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++++++++++
> >>  3 files changed, 40 insertions(+)
> >>
> >> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> >> index 8a75729be9..ee6fc64c23 100644
> >> --- a/tools/virtiofsd/fuse_common.h
> >> +++ b/tools/virtiofsd/fuse_common.h
> >> @@ -372,6 +372,11 @@ struct fuse_file_info {
> >>   */
> >>  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
> >>  
> >> +/**
> >> + * Indicates support for per-file DAX.
> >> + */
> >> +#define FUSE_CAP_PERFILE_DAX (1 << 29)
> >> +
> >>  /**
> >>   * Ioctl flags
> >>   *
> >> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> >> index 50fc5c8d5a..04a4f17423 100644
> >> --- a/tools/virtiofsd/fuse_lowlevel.c
> >> +++ b/tools/virtiofsd/fuse_lowlevel.c
> >> @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >>      if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
> >>          se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> >>      }
> >> +    if (arg->flags & FUSE_PERFILE_DAX) {
> >> +        se->conn.capable |= FUSE_CAP_PERFILE_DAX;
> >> +    }
> >>  #ifdef HAVE_SPLICE
> >>  #ifdef HAVE_VMSPLICE
> >>      se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
> >> @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >>      if (se->conn.want & FUSE_CAP_POSIX_ACL) {
> >>          outarg.flags |= FUSE_POSIX_ACL;
> >>      }
> >> +    if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
> >> +        outarg.flags |= FUSE_PERFILE_DAX;
> >> +    }
> >>      outarg.max_readahead = se->conn.max_readahead;
> >>      outarg.max_write = se->conn.max_write;
> >>      if (se->conn.max_background >= (1 << 16)) {
> >> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> >> index e170b17adb..5b6228210f 100644
> >> --- a/tools/virtiofsd/passthrough_ll.c
> >> +++ b/tools/virtiofsd/passthrough_ll.c
> >> @@ -53,8 +53,10 @@
> >>  #include <sys/syscall.h>
> >>  #include <sys/wait.h>
> >>  #include <sys/xattr.h>
> >> +#include <sys/vfs.h>
> >>  #include <syslog.h>
> >>  #include <linux/fs.h>
> >> +#include <linux/magic.h>
> >>  
> >>  #include "qemu/cutils.h"
> >>  #include "passthrough_helpers.h"
> >> @@ -136,6 +138,13 @@ enum {
> >>      SANDBOX_CHROOT,
> >>  };
> >>  
> >> +/* capability of storing DAX flag persistently */
> >> +enum {
> >> +    DAX_CAP_NONE,  /* not supported */
> >> +    DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
> >> +    DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
> >> +};
> >> +
> >>  typedef struct xattr_map_entry {
> >>      char *key;
> >>      char *prepend;
> >> @@ -161,6 +170,7 @@ struct lo_data {
> >>      int readdirplus_clear;
> >>      int allow_direct_io;
> >>      int announce_submounts;
> >> +    int perfile_dax_cap; /* capability of backend fs */
> >>      bool use_statx;
> >>      struct lo_inode root;
> >>      GHashTable *inodes; /* protected by lo->mutex */
> >> @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
> >>          conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
> >>          lo->killpriv_v2 = 0;
> >>      }
> >> +
> >> +    if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
> >> +        conn->want |= FUSE_CAP_PERFILE_DAX;
> >> +    }
> >>  }
> >>  
> >>  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> >> @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
> >>      int fd, res;
> >>      struct stat stat;
> >>      uint64_t mnt_id;
> >> +    struct statfs statfs;
> >>  
> >>      fd = open("/", O_PATH);
> >>      if (fd == -1) {
> >> @@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
> >>          root->posix_locks = g_hash_table_new_full(
> >>              g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
> >>      }
> >> +
> >> +    /*
> >> +     * Currently only ext4/xfs since linux kernel v5.8 support storing
> >> +     * FS_DAX_FL flag persistently. Ext4 accesses this flag through
> >> +     * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
> >> +     * FS_IOC_FSG[S]ETXATTR ioctl.
> >> +     */
> >> +    res = fstatfs(fd, &statfs);
> >> +    if (!res) {
> >> +	if (statfs.f_type == EXT4_SUPER_MAGIC)
> >> +	    lo->perfile_dax_cap = DAX_CAP_FLAGS;
> >> +	else if (statfs.f_type == XFS_SUPER_MAGIC)
> >> +	    lo->perfile_dax_cap = DAX_CAP_XATTR;
> >> +    }
> >>  }
> >>  
> >>  static guint lo_key_hash(gconstpointer key)
> >> -- 
> >> 2.27.0
> >>
> >> _______________________________________________
> >> Virtio-fs mailing list
> >> Virtio-fs@redhat.com
> >> https://listman.redhat.com/mailman/listinfo/virtio-fs
> >>
> 
> -- 
> Thanks,
> Jeffle
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT
@ 2021-08-19 13:57           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 151+ messages in thread
From: Dr. David Alan Gilbert @ 2021-08-19 13:57 UTC (permalink / raw)
  To: JeffleXu
  Cc: miklos, virtualization, virtio-fs, joseph.qi, linux-fsdevel, vgoyal

* JeffleXu (jefflexu@linux.alibaba.com) wrote:
> 
> 
> On 8/18/21 1:15 AM, Dr. David Alan Gilbert wrote:
> > * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> >> In FUSE_INIT negotiating phase, server/client should advertise if it
> >> supports per-file DAX.
> >>
> >> Once advertising support for per-file DAX feature, virtiofsd should
> >> support storing FS_DAX_FL flag persistently passed by
> >> FS_IOC_SETFLAGS/FS_IOC_FSSETXATTR ioctl, and set FUSE_ATTR_DAX in
> >> FUSE_LOOKUP accordingly if the file is capable of per-file DAX.
> >>
> >> Currently only ext4/xfs since Linux kernel v5.8 support storing the
> >> FS_DAX_FL flag persistently, and thus advertise support for the
> >> per-file DAX feature only when the backend fs type is ext4 or xfs.
> > 
> > I'm a little worried about the meaning of the flags we're storing and
> > the fact we're storing them in the normal host DAX flags.
> > 
> > Doesn't this mean that we're using a single host flag to mean:
> >   a) It can be mapped as DAX on the host if it was a real DAX device
> >   b) We can map it as DAX inside the guest with virtiofs?
> 
> Yes, the side effect is that the host file is also DAX enabled if the
> backend fs is built upon a real nvdimm device.
> 
> The rationale here is that the fuse daemon shall be capable of *marking*
> the file as DAX capable *persistently*, so that it can later be informed
> that this file is capable of DAX.

Right, so my worry here is that the untrusted guest changes both its
own behaviour (fine) and also the behaviour of the host (less fine).

> I'm not sure if xattr (extended attribute) is a better option for this?

Well, if you used an xattr for it, it wouldn't clash with whatever the
host did (especially if it used the xattr mapping).

Dave

> 
> > 
> > What happens when we're using user namespaces for the guest?
> > 
> > Dave
> > 
> > 
> >> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >> ---
> >>  tools/virtiofsd/fuse_common.h    |  5 +++++
> >>  tools/virtiofsd/fuse_lowlevel.c  |  6 ++++++
> >>  tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++++++++++
> >>  3 files changed, 40 insertions(+)
> >>
> >> diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
> >> index 8a75729be9..ee6fc64c23 100644
> >> --- a/tools/virtiofsd/fuse_common.h
> >> +++ b/tools/virtiofsd/fuse_common.h
> >> @@ -372,6 +372,11 @@ struct fuse_file_info {
> >>   */
> >>  #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
> >>  
> >> +/**
> >> + * Indicates support for per-file DAX.
> >> + */
> >> +#define FUSE_CAP_PERFILE_DAX (1 << 29)
> >> +
> >>  /**
> >>   * Ioctl flags
> >>   *
> >> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> >> index 50fc5c8d5a..04a4f17423 100644
> >> --- a/tools/virtiofsd/fuse_lowlevel.c
> >> +++ b/tools/virtiofsd/fuse_lowlevel.c
> >> @@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >>      if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
> >>          se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
> >>      }
> >> +    if (arg->flags & FUSE_PERFILE_DAX) {
> >> +        se->conn.capable |= FUSE_CAP_PERFILE_DAX;
> >> +    }
> >>  #ifdef HAVE_SPLICE
> >>  #ifdef HAVE_VMSPLICE
> >>      se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
> >> @@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
> >>      if (se->conn.want & FUSE_CAP_POSIX_ACL) {
> >>          outarg.flags |= FUSE_POSIX_ACL;
> >>      }
> >> +    if (se->op.ioctl && (se->conn.want & FUSE_CAP_PERFILE_DAX)) {
> >> +        outarg.flags |= FUSE_PERFILE_DAX;
> >> +    }
> >>      outarg.max_readahead = se->conn.max_readahead;
> >>      outarg.max_write = se->conn.max_write;
> >>      if (se->conn.max_background >= (1 << 16)) {
> >> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> >> index e170b17adb..5b6228210f 100644
> >> --- a/tools/virtiofsd/passthrough_ll.c
> >> +++ b/tools/virtiofsd/passthrough_ll.c
> >> @@ -53,8 +53,10 @@
> >>  #include <sys/syscall.h>
> >>  #include <sys/wait.h>
> >>  #include <sys/xattr.h>
> >> +#include <sys/vfs.h>
> >>  #include <syslog.h>
> >>  #include <linux/fs.h>
> >> +#include <linux/magic.h>
> >>  
> >>  #include "qemu/cutils.h"
> >>  #include "passthrough_helpers.h"
> >> @@ -136,6 +138,13 @@ enum {
> >>      SANDBOX_CHROOT,
> >>  };
> >>  
> >> +/* capability of storing DAX flag persistently */
> >> +enum {
> >> +    DAX_CAP_NONE,  /* not supported */
> >> +    DAX_CAP_FLAGS, /* stored in flags (FS_IOC_GETFLAGS/FS_IOC_SETFLAGS) */
> >> +    DAX_CAP_XATTR, /* stored in xflags (FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR) */
> >> +};
> >> +
> >>  typedef struct xattr_map_entry {
> >>      char *key;
> >>      char *prepend;
> >> @@ -161,6 +170,7 @@ struct lo_data {
> >>      int readdirplus_clear;
> >>      int allow_direct_io;
> >>      int announce_submounts;
> >> +    int perfile_dax_cap; /* capability of backend fs */
> >>      bool use_statx;
> >>      struct lo_inode root;
> >>      GHashTable *inodes; /* protected by lo->mutex */
> >> @@ -703,6 +713,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
> >>          conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
> >>          lo->killpriv_v2 = 0;
> >>      }
> >> +
> >> +    if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
> >> +        conn->want |= FUSE_CAP_PERFILE_DAX;
> >> +    }
> >>  }
> >>  
> >>  static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
> >> @@ -3800,6 +3814,7 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
> >>      int fd, res;
> >>      struct stat stat;
> >>      uint64_t mnt_id;
> >> +    struct statfs statfs;
> >>  
> >>      fd = open("/", O_PATH);
> >>      if (fd == -1) {
> >> @@ -3826,6 +3841,20 @@ static void setup_root(struct lo_data *lo, struct lo_inode *root)
> >>          root->posix_locks = g_hash_table_new_full(
> >>              g_direct_hash, g_direct_equal, NULL, posix_locks_value_destroy);
> >>      }
> >> +
> >> +    /*
> >> +     * Currently only ext4/xfs since linux kernel v5.8 support storing
> >> +     * FS_DAX_FL flag persistently. Ext4 accesses this flag through
> >> +     * FS_IOC_G[S]ETFLAGS ioctl, while xfs accesses this flag through
> >> +     * FS_IOC_FSG[S]ETXATTR ioctl.
> >> +     */
> >> +    res = fstatfs(fd, &statfs);
> >> +    if (!res) {
> >> +	if (statfs.f_type == EXT4_SUPER_MAGIC)
> >> +	    lo->perfile_dax_cap = DAX_CAP_FLAGS;
> >> +	else if (statfs.f_type == XFS_SUPER_MAGIC)
> >> +	    lo->perfile_dax_cap = DAX_CAP_XATTR;
> >> +    }
> >>  }
> >>  
> >>  static guint lo_key_hash(gconstpointer key)
> >> -- 
> >> 2.27.0
> >>
> >> _______________________________________________
> >> Virtio-fs mailing list
> >> Virtio-fs@redhat.com
> >> https://listman.redhat.com/mailman/listinfo/virtio-fs
> >>
> 
> -- 
> Thanks,
> Jeffle
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 4/4] virtiofsd: support per-file DAX in FUSE_LOOKUP
  2021-08-19 13:08           ` Dr. David Alan Gilbert
  (?)
@ 2021-08-20  5:03             ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-08-20  5:03 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: vgoyal, stefanha, miklos, linux-fsdevel, virtio-fs, joseph.qi,
	virtualization



On 8/19/21 9:08 PM, Dr. David Alan Gilbert wrote:
> * JeffleXu (jefflexu@linux.alibaba.com) wrote:
>>
>>
>> On 8/18/21 3:00 AM, Dr. David Alan Gilbert wrote:
>>> * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
>>>> For passthrough, when the corresponding virtiofs in guest is mounted
>>>> with '-o dax=inode', advertise that the file is capable of per-file
>>>> DAX if the inode in the backend fs is marked with FS_DAX_FL flag.
>>>>
>>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
>>>> ---
>>>>  tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++++++++++++++
>>>>  1 file changed, 43 insertions(+)
>>>>
>>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
>>>> index 5b6228210f..4cbd904248 100644
>>>> --- a/tools/virtiofsd/passthrough_ll.c
>>>> +++ b/tools/virtiofsd/passthrough_ll.c
>>>> @@ -171,6 +171,7 @@ struct lo_data {
>>>>      int allow_direct_io;
>>>>      int announce_submounts;
>>>>      int perfile_dax_cap; /* capability of backend fs */
>>>> +    bool perfile_dax; /* enable per-file DAX or not */
>>>>      bool use_statx;
>>>>      struct lo_inode root;
>>>>      GHashTable *inodes; /* protected by lo->mutex */
>>>> @@ -716,6 +717,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
>>>>  
>>>>      if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
>>>>          conn->want |= FUSE_CAP_PERFILE_DAX;
>>>> +	lo->perfile_dax = 1;
>>>> +    }
>>>> +    else {
>>>> +	lo->perfile_dax = 0;
>>>>      }
>>>>  }
>>>>  
>>>> @@ -983,6 +988,41 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
>>>>      return 0;
>>>>  }
>>>>  
>>>> +/*
>>>> + * If the file is marked with FS_DAX_FL or FS_XFLAG_DAX, then DAX should be
>>>> + * enabled for this file.
>>>> + */
>>>> +static bool lo_should_enable_dax(struct lo_data *lo, struct lo_inode *dir,
>>>> +				 const char *name)
>>>> +{
>>>> +    int res, fd;
>>>> +    int ret = false;;
>>>> +    unsigned int attr;
>>>> +    struct fsxattr xattr;
>>>> +
>>>> +    if (!lo->perfile_dax)
>>>> +	return false;
>>>> +
>>>> +    /* Open file without O_PATH, so that ioctl can be called. */
>>>> +    fd = openat(dir->fd, name, O_NOFOLLOW);
>>>> +    if (fd == -1)
>>>> +        return false;
>>>
>>> Doesn't that defeat the whole benefit of using O_PATH - i.e. that we
>>> might stumble into a /dev node or something else we're not allowed to
>>> open?
>>
>> As far as I know, virtiofsd will pivot_root/chroot to the source
>> directory, and can only access files inside the source directory
>> specified by "-o source=". Then where do these unexpected files come
>> from? Besides, fd opened without O_PATH here is temporary and used for
>> FS_IOC_GETFLAGS/FS_IOC_FSGETXATTR ioctl only. It's closed when the
>> function returns.
> 
> The guest is still allowed to mknod.
> See:
>    https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg05461.html
> 
> also it's legal to expose a root filesystem for a guest; the virtiofsd
> should *never* open a device other than O_PATH - and it's really tricky
> to do a check to see if it is a device in a race-free way.
> 

Fine. Got it. However, the returned fd (opened without O_PATH) is only
used for the FS_IOC_GETFLAGS/FS_IOC_FSGETXATTR ioctls, and for special
device files these two ioctls should in most cases return -ENOTTY.

If it's really a security issue, then lo_inode_open() could be used to
get a temporary fd, i.e., check if it's a special file before opening.
After all, FUSE_OPEN is also handled this way. Besides, I don't
understand what "race-free way" means.


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 4/4] virtiofsd: support per-file DAX in FUSE_LOOKUP
  2021-08-20  5:03             ` JeffleXu
  (?)
@ 2021-08-24 10:15               ` Greg Kurz
  -1 siblings, 0 replies; 151+ messages in thread
From: Greg Kurz @ 2021-08-24 10:15 UTC (permalink / raw)
  To: JeffleXu
  Cc: Dr. David Alan Gilbert, miklos, virtualization, virtio-fs,
	joseph.qi, stefanha, linux-fsdevel, vgoyal

On Fri, 20 Aug 2021 13:03:23 +0800
JeffleXu <jefflexu@linux.alibaba.com> wrote:

> 
> 
> On 8/19/21 9:08 PM, Dr. David Alan Gilbert wrote:
> > * JeffleXu (jefflexu@linux.alibaba.com) wrote:
> >>
> >>
> >> On 8/18/21 3:00 AM, Dr. David Alan Gilbert wrote:
> >>> * Jeffle Xu (jefflexu@linux.alibaba.com) wrote:
> >>>> For passthrough, when the corresponding virtiofs in guest is mounted
> >>>> with '-o dax=inode', advertise that the file is capable of per-file
> >>>> DAX if the inode in the backend fs is marked with FS_DAX_FL flag.
> >>>>
> >>>> Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
> >>>> ---
> >>>>  tools/virtiofsd/passthrough_ll.c | 43 ++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 43 insertions(+)
> >>>>
> >>>> diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> >>>> index 5b6228210f..4cbd904248 100644
> >>>> --- a/tools/virtiofsd/passthrough_ll.c
> >>>> +++ b/tools/virtiofsd/passthrough_ll.c
> >>>> @@ -171,6 +171,7 @@ struct lo_data {
> >>>>      int allow_direct_io;
> >>>>      int announce_submounts;
> >>>>      int perfile_dax_cap; /* capability of backend fs */
> >>>> +    bool perfile_dax; /* enable per-file DAX or not */
> >>>>      bool use_statx;
> >>>>      struct lo_inode root;
> >>>>      GHashTable *inodes; /* protected by lo->mutex */
> >>>> @@ -716,6 +717,10 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
> >>>>  
> >>>>      if (conn->capable & FUSE_CAP_PERFILE_DAX && lo->perfile_dax_cap ) {
> >>>>          conn->want |= FUSE_CAP_PERFILE_DAX;
> >>>> +	lo->perfile_dax = 1;
> >>>> +    }
> >>>> +    else {
> >>>> +	lo->perfile_dax = 0;
> >>>>      }
> >>>>  }
> >>>>  
> >>>> @@ -983,6 +988,41 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
> >>>>      return 0;
> >>>>  }
> >>>>  
> >>>> +/*
> >>>> + * If the file is marked with FS_DAX_FL or FS_XFLAG_DAX, then DAX should be
> >>>> + * enabled for this file.
> >>>> + */
> >>>> +static bool lo_should_enable_dax(struct lo_data *lo, struct lo_inode *dir,
> >>>> +				 const char *name)
> >>>> +{
> >>>> +    int res, fd;
> >>>> +    int ret = false;;
> >>>> +    unsigned int attr;
> >>>> +    struct fsxattr xattr;
> >>>> +
> >>>> +    if (!lo->perfile_dax)
> >>>> +	return false;
> >>>> +
> >>>> +    /* Open file without O_PATH, so that ioctl can be called. */
> >>>> +    fd = openat(dir->fd, name, O_NOFOLLOW);
> >>>> +    if (fd == -1)
> >>>> +        return false;
> >>>
> >>> Doesn't that defeat the whole benefit of using O_PATH - i.e. that we
> >>> might stumble into a /dev node or something else we're not allowed to
> >>> open?
> >>
> >> As far as I know, virtiofsd will pivot_root/chroot to the source
> >> directory, and can only access files inside the source directory
> >> specified by "-o source=". Then where do these unexpected files come
> >> from? Besides, fd opened without O_PATH here is temporary and used for
> >> FS_IOC_GETFLAGS/FS_IOC_FSGETXATTR ioctl only. It's closed when the
> >> function returns.
> > 
> > The guest is still allowed to mknod.
> > See:
> >    https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg05461.html
> > 
> > also it's legal to expose a root filesystem for a guest; the virtiofsd
> > should *never* open a device other than O_PATH - and it's really tricky
> > to do a check to see if it is a device in a race-free way.
> > 
> 
> Fine. Got it. However the returned fd (opened without O_PATH) is only
> used for FS_IOC_GETFLAGS/FS_IOC_FSGETXATTR ioctl, while in most cases
> for special device files, these two ioctls should return -ENOTTY.
> 

The actual problem is that a FIFO will cause openat() to block until
the other end of the FIFO is open for writing...

> If it's really a security issue, then lo_inode_open() could be used to

... and cause a DoS on virtiofsd. So yes, this is a security issue and
lo_inode_open() was introduced specifically to handle this.

> get a temporary fd, i.e., check if it's a special file before opening.
> After all, FUSE_OPEN also handles in this way. Besides, I can't
> understand what "race-free way" means.
> 

"race-free way" means a way that guarantees that the file type
cannot change between the time you check it and the time
you open it (TOCTOU error). For example, doing a plain stat(),
checking st_mode and proceeding to open() is wrong: nothing
prevents the file from being unlinked and replaced by something
else between stat() and open().

We avoid that by keeping O_PATH fds around and using
lo_inode_open() instead of openat().

In your case, it seems that you should do the checking after
you have an actual lo_inode for the target file, and pass
that to lo_should_enable_dax() instead of the parent lo_inode
and target name.
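
To make the check-then-open pattern concrete, here is a minimal sketch
of the idea, assuming Linux with /proc mounted. It is not the actual
lo_inode_open() code from virtiofsd; the name reopen_checked() and its
error convention are made up for illustration. The type check and the
real open both go through the same O_PATH fd, so no name is looked up
again in between:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: reopen the object behind an O_PATH fd with real
 * access flags, but only if it is a regular file or a directory.
 * Returns a new fd on success or a negative errno value on failure.
 */
static int reopen_checked(int path_fd, int open_flags)
{
    struct stat st;
    char proc_path[64];
    int n, fd;

    /* fstat() on an O_PATH fd describes the pinned inode, not a name. */
    if (fstat(path_fd, &st) == -1) {
        return -errno;
    }

    /* Refuse FIFOs, device nodes, sockets and symlinks. */
    if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode)) {
        return -EBADF;
    }

    n = snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", path_fd);
    assert(n > 0 && (size_t)n < sizeof(proc_path));

    /*
     * The /proc entry is a magic link to the inode that was just checked,
     * so it has to be followed: O_NOFOLLOW is cleared on purpose.
     */
    fd = open(proc_path, open_flags & ~O_NOFOLLOW);
    return fd == -1 ? -errno : fd;
}
```

A caller would get path_fd once, with something like
openat(dirfd, name, O_PATH | O_NOFOLLOW), which does not block on a
FIFO because O_PATH never really opens the file. The
FS_IOC_GETFLAGS/FS_IOC_FSGETXATTR ioctl would then be issued on the fd
returned by the helper, and that temporary fd closed afterwards.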

Cheers,

--
Greg


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 14:57         ` Vivek Goyal
                           ` (2 preceding siblings ...)
  (?)
@ 2021-08-30 23:31         ` Liu Bo
  -1 siblings, 0 replies; 151+ messages in thread
From: Liu Bo @ 2021-08-30 23:31 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Miklos Szeredi, virtio-fs-list, Joseph Qi, JeffleXu

On Tue, Aug 17, 2021 at 10:57:40AM -0400, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 09:22:53PM +0800, JeffleXu wrote:
> > 
> > 
> > On 8/17/21 8:39 PM, Vivek Goyal wrote:
> > > On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
> > >> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> > >>>
> > >>> This patchset adds support of per-file DAX for virtiofs, which is
> > >>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> > >>
> > >> Can you please explain the background of this change in detail?
> > >>
> > >> Why would an admin want to enable DAX for a particular virtiofs file
> > >> and not for others?
> > > 
> > > Initially I thought that they needed it because they are downloading
> > > files on the fly from server. So they don't want to enable dax on the file
> > > till file is completely downloaded. 
> > 
> > Right, it's our initial requirement.
> > 
> > 
> > > But later I realized that they should
> > > be able to block in FUSE_SETUPMAPPING call and make sure associated
> > > file section has been downloaded before returning and solve the problem.
> > > So that can't be the primary reason.
> > 
> > Saying we want to access 4KB of one file inside guest, if it goes
> > through FUSE request routine, then the fuse daemon only need to download
> > this 4KB from remote server. But if it goes through DAX, then the fuse
> > daemon need to download the whole DAX window (e.g., 2MB) from remote
> > server, so called amplification. Maybe we could decrease the DAX window
> > size, but it's a trade off.
> 
> Downloading 2MB chunk should not be a big issue (IMHO). And if this
> turns out to be real concern, we could experiment with a smaller
> mapping granularity.
> 
> > 
> > > 
> > > Other reason mentioned I think was that only certain files benefit
> > > from DAX. But not much details are there after that. It will be nice
> > > to hear a more concrete use case and more details about this usage.
> > > 
> > 
> > Apart from our internal requirement, more fine grained control for DAX
> > shall be general and more flexible. Glad to hear more discussion from
> > community.
> 
> Sure it will be more general and flexible. But there needs to be 1-2
> good concrete use cases to justify additional complexity. And I don't
> think that so far a good use case has come forward.
>

Hi Vivek,

Our use case can be summarized like this: we need to share a read-only
fs image with _multiple_ guests while keeping the memory overhead as
low as possible.  More specifically, we have a userspace fs and a
tool[1] that splits an ext4 directory into a metadata part and a data
part, stored in two separate files.  These two files are expected to
be accessed many times and are usually larger than 2MB, so they are
a good match for what dax is best at.

However, since these two files share the same virtiofs mount point
with non-fs-image files, we'd like to have more fine-grained control
over the dax flag.

Please also note that how guests mount and use dax is under our
control, so it is the server that will decide the dax flag.

AFAICS this patch set can let either server or guest make the
decision, but if most of us have concerns about the complexity brought
by mixing the two cases, I think we can go with only letting server
decide it by setting attr.flags.


[1]: https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md


> Thanks
> Vivek


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 14:08         ` [Virtio-fs] " Miklos Szeredi
  (?)
@ 2021-09-03  5:30           ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-09-03  5:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Vivek Goyal, Stefan Hajnoczi, linux-fsdevel, virtualization,
	virtio-fs-list, Joseph Qi, Liu Bo



On 8/17/21 10:08 PM, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 15:22, JeffleXu <jefflexu@linux.alibaba.com> wrote:
>>
>>
>>
>> On 8/17/21 8:39 PM, Vivek Goyal wrote:
>>> On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
>>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>>
>>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>>
>>>> Can you please explain the background of this change in detail?
>>>>
>>>> Why would an admin want to enable DAX for a particular virtiofs file
>>>> and not for others?
>>>
>>> Initially I thought that they needed it because they are downloading
>>> files on the fly from server. So they don't want to enable dax on the file
>>> till file is completely downloaded.
>>
>> Right, it's our initial requirement.
>>
>>
>>> But later I realized that they should
>>> be able to block in FUSE_SETUPMAPPING call and make sure associated
>>> file section has been downloaded before returning and solve the problem.
>>> So that can't be the primary reason.
>>
>> Saying we want to access 4KB of one file inside guest, if it goes
>> through FUSE request routine, then the fuse daemon only need to download
>> this 4KB from remote server. But if it goes through DAX, then the fuse
>> daemon need to download the whole DAX window (e.g., 2MB) from remote
>> server, so called amplification. Maybe we could decrease the DAX window
>> size, but it's a trade off.
> 
> That could be achieved with a plain fuse filesystem on the host (which
> will get 4k READ requests for accesses to mapped area inside guest).
> Since this can be done selectively for files which are not yet
> downloaded, the extra layer wouldn't be a performance problem.
> 
> Is there a reason why that wouldn't work?

I wasn't aware of this mechanism (working around it from user space)
before sending this patch set.

After learning more about virtualization and KVM, I find that, as Vivek
Goyal replied in [1], virtiofsd/qemu would need to somehow hook the user
page fault and then download the remaining part.

IMHO, this mechanism (implementing a plain fuse filesystem on the host,
as you proposed) seems a little bit complicated for now.


[1] https://lore.kernel.org/linux-fsdevel/YR08KnP8cO8LjKY7@redhat.com/


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-09-03  5:30           ` JeffleXu
@ 2021-09-07 14:51             ` Miklos Szeredi
  -1 siblings, 0 replies; 151+ messages in thread
From: Miklos Szeredi @ 2021-09-07 14:51 UTC (permalink / raw)
  To: JeffleXu
  Cc: Vivek Goyal, Stefan Hajnoczi, linux-fsdevel, virtualization,
	virtio-fs-list, Joseph Qi, Liu Bo

On Fri, 3 Sept 2021 at 07:31, JeffleXu <jefflexu@linux.alibaba.com> wrote:
>
>
>
> On 8/17/21 10:08 PM, Miklos Szeredi wrote:
> > On Tue, 17 Aug 2021 at 15:22, JeffleXu <jefflexu@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 8/17/21 8:39 PM, Vivek Goyal wrote:
> >>> On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
> >>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> >>>>>
> >>>>> This patchset adds support of per-file DAX for virtiofs, which is
> >>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >>>>
> >>>> Can you please explain the background of this change in detail?
> >>>>
> >>>> Why would an admin want to enable DAX for a particular virtiofs file
> >>>> and not for others?
> >>>
> >>> Initially I thought that they needed it because they are downloading
> >>> files on the fly from server. So they don't want to enable dax on the file
> >>> till file is completely downloaded.
> >>
> >> Right, it's our initial requirement.
> >>
> >>
> >>> But later I realized that they should
> >>> be able to block in FUSE_SETUPMAPPING call and make sure associated
> >>> file section has been downloaded before returning and solve the problem.
> >>> So that can't be the primary reason.
> >>
> >> Saying we want to access 4KB of one file inside guest, if it goes
> >> through FUSE request routine, then the fuse daemon only need to download
> >> this 4KB from remote server. But if it goes through DAX, then the fuse
> >> daemon need to download the whole DAX window (e.g., 2MB) from remote
> >> server, so called amplification. Maybe we could decrease the DAX window
> >> size, but it's a trade off.
> >
> > That could be achieved with a plain fuse filesystem on the host (which
> > will get 4k READ requests for accesses to mapped area inside guest).
> > Since this can be done selectively for files which are not yet
> > downloaded, the extra layer wouldn't be a performance problem.
> >
> > Is there a reason why that wouldn't work?
>
> I didn't realize this mechanism (working around from user space) before
> sending this patch set.
>
> After learning the virtualization and KVM stuffs, I find that, as Vivek
> Goyal replied in [1], virtiofsd/qemu need to somehow hook the user page
> fault and then download the remained part.
>
> IMHO, this mechanism (as you proposed by implementing a plain fuse
> filesystem on the host) seems a little bit sophisticated so far.


Agree.  Let's start with the simplest variant, which is the server
selectively enabling dax.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [virtiofsd PATCH v4 4/4] virtiofsd: support per-file DAX in FUSE_LOOKUP
  2021-08-24 10:15               ` Greg Kurz
  (?)
@ 2021-09-08 10:34                 ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-09-08 10:34 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Dr. David Alan Gilbert, miklos, virtualization, virtio-fs,
	joseph.qi, stefanha, linux-fsdevel, vgoyal



On 8/24/21 6:15 PM, Greg Kurz wrote:
> On Fri, 20 Aug 2021 13:03:23 +0800
> JeffleXu <jefflexu@linux.alibaba.com> wrote:
>>
>> Fine. Got it. However the returned fd (opened without O_PATH) is only
>> used for FS_IOC_GETFLAGS/FS_IOC_FSGETXATTR ioctl, while in most cases
>> for special device files, these two ioctls should return -ENOTTY.
>>
> 
> The actual problem is that a FIFO will cause openat() to block until
> the other end of the FIFO is open for writing...

Got it.

> 
>> If it's really a security issue, then lo_inode_open() could be used to
> 
> ... and cause a DoS on virtiofsd. So yes, this is a security issue and
> lo_inode_open() was introduced specifically to handle this.
> 
>> get a temporary fd, i.e., check if it's a special file before opening.
>> After all, FUSE_OPEN also handles in this way. Besides, I can't
>> understand what "race-free way" means.
>>
> 
> "race-free way" means a way that guarantees that file type
> cannot change between the time you check it and the time
> you open it (TOCTOU error). For example, doing a plain stat(),
> checking st_mode and proceeding to open() is wrong : nothing
> prevents the file to be unlinked and replaced by something
> else between stat() and open().
> 
> We avoid that by keeping O_PATH fds around and using
> lo_inode_open() instead of openat().

Thanks for the detailed explanation. Got it.

> 
> In your case, it seems that you should do the checking after
> you have an actual lo_inode for the target file, and pass
> that to lo_should_enable_dax() instead of the parent lo_inode
> and target name.
> 

Yes, that will be more reasonable. Thanks.

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-08-17 12:40       ` Vivek Goyal
  (?)
@ 2021-09-16  8:21         ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-09-16  8:21 UTC (permalink / raw)
  To: Vivek Goyal, Dr. David Alan Gilbert, Miklos Szeredi
  Cc: virtualization, virtio-fs-list, Joseph Qi, linux-fsdevel, Liu Bo

Hi, I've added some performance statistics below.


On 8/17/21 8:40 PM, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 10:32:14AM +0100, Dr. David Alan Gilbert wrote:
>> * Miklos Szeredi (miklos@szeredi.hu) wrote:
>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>
>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>
>>> Can you please explain the background of this change in detail?
>>>
>>> Why would an admin want to enable DAX for a particular virtiofs file
>>> and not for others?
>>
>> Where we're contending on virtiofs dax cache size it makes a lot of
>> sense; it's quite expensive for us to map something into the cache
>> (especially if we push something else out), so selectively DAXing files
>> that are expected to be hot could help reduce cache churn.

Yes, dax performance can suffer when the DAX window is small and
multiple files contend for it.

I tested a kernel build on virtiofs to emulate the scenario where a
lot of files contend for the dax window and trigger dax window reclaim.

Environment setup:
- guest vCPU: 16
- time make vmlinux -j128

type    | cache  | cache-size | time
------- | ------ | ---------- | --------------
non-dax | always | --         | real 2m48.119s
dax     | always | 64M        | real 4m49.563s
dax     | always | 1G         | real 3m14.200s
dax     | always | 4G         | real 2m41.141s


The results show a performance drop compared to normal buffered IO
when the dax window is restricted and dax window reclaim is triggered.
The smaller the cache size, the worse the performance. The drop shrinks
and eventually disappears as the cache size increases.
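
To make the comparison easier to read, the timings in the table can be
turned into ratios against the non-dax run. The sketch below does only
that arithmetic; the function names are made up for illustration and the
numbers are copied from the table.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a "real XmY.ZZZs" reading from time(1) into seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Elapsed time relative to the non-dax baseline; > 1.0 means slower. */
double relative_to_baseline(double elapsed, double baseline)
{
    assert(baseline > 0.0);
    return elapsed / baseline;
}

/* Print one line per dax configuration listed in the table. */
void print_dax_slowdown(void)
{
    const double base = to_seconds(2, 48.119); /* non-dax run */
    const struct { const char *cache_size; double secs; } rows[] = {
        { "64M", to_seconds(4, 49.563) },
        { "1G",  to_seconds(3, 14.200) },
        { "4G",  to_seconds(2, 41.141) },
    };

    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
        printf("dax %-3s: %.2fx of non-dax\n", rows[i].cache_size,
               relative_to_baseline(rows[i].secs, base));
}
```

By this arithmetic the 64M run takes roughly 1.72x the non-dax time, the
1G run roughly 1.16x, and the 4G run roughly 0.96x.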

Though we may not compile kernels on virtiofs, we may well access a lot
of small files on virtiofs and suffer this performance drop.


> In that case probaly we should just make DAX window larger. I assume

Yes, as the DAX window gets larger, we are less likely to run short
of dax window resources.

However, it doesn't come without cost. The 'struct page' descriptors
for the dax window consume guest memory at a ratio of ~1.6% (64/4096 =
~1.56%: a page descriptor is 64 bytes, assuming 4K pages). That is,
every 1GB of cache size costs 16MB of guest memory. As the cache size
increases, the memory footprint of the page descriptors also increases,
which may offset the benefit dax brings by eliminating the guest page
cache.
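
The overhead estimate above works out as in the small sketch below. The
64-byte descriptor and 4K page size are the assumptions stated in the
text; the function and macro names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, taken from the figures quoted in the text. */
#define PAGE_DESC_BYTES  64u   /* size of one 'struct page' */
#define GUEST_PAGE_BYTES 4096u /* 4K guest page */

/* Guest memory used by page descriptors for a dax window of this size. */
uint64_t dax_window_page_overhead(uint64_t window_bytes)
{
    /* The window is assumed to be a whole number of pages. */
    assert(window_bytes % GUEST_PAGE_BYTES == 0);
    return window_bytes / GUEST_PAGE_BYTES * PAGE_DESC_BYTES;
}
```

For a 1GB window this gives 262144 pages times 64 bytes, i.e. 16MB; a 4GB
window would cost 64MB under the same assumptions.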

In summary, the per-file dax feature tries to strike a balance between
performance and memory overhead by offering users finer-grained control
over dax.


> that selecting which files to turn DAX on, will itself will not be
> a trivial. Not sure what heuristics are being deployed to determine
> that. Will like to know more about it.

Currently we enable dax for hot and large blob files, while disabling
dax for other miscellaneous small files.
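
A server-side policy along these lines might look like the sketch below.
Everything in it is hypothetical: DEMO_ATTR_DAX stands in for whatever
per-file DAX bit the server reports in the attr flags, the 2MB cutoff is
just the mapping chunk size mentioned earlier in this thread, and how a
file gets classified as "hot" is left to the server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-file DAX bit in the attr flags. */
#define DEMO_ATTR_DAX (1u << 1)

/* Assumed cutoff: one 2MB dax mapping chunk, as discussed in the thread. */
#define DEMO_DAX_MIN_SIZE (2ull << 20)

/* Hypothetical per-file facts the server is assumed to know. */
struct demo_file_info {
    uint64_t size;   /* file size in bytes */
    bool     is_hot; /* expected to be accessed repeatedly */
};

/* Return the attr flags the server would report for this file. */
uint32_t demo_attr_flags(const struct demo_file_info *fi)
{
    assert(fi != NULL);
    if (fi->is_hot && fi->size >= DEMO_DAX_MIN_SIZE)
        return DEMO_ATTR_DAX;
    return 0; /* small or cold files stay on the normal FUSE path */
}
```

The point of keeping the decision in one function is that the policy can
change on the server without any change on the guest side.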



-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
@ 2021-09-16  8:21         ` JeffleXu
  0 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-09-16  8:21 UTC (permalink / raw)
  To: Vivek Goyal, Dr. David Alan Gilbert, Miklos Szeredi
  Cc: virtio-fs-list, Joseph Qi, linux-fsdevel, Liu Bo, virtualization

Hi, I add some performance statistics below.


On 8/17/21 8:40 PM, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 10:32:14AM +0100, Dr. David Alan Gilbert wrote:
>> * Miklos Szeredi (miklos@szeredi.hu) wrote:
>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>
>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>
>>> Can you please explain the background of this change in detail?
>>>
>>> Why would an admin want to enable DAX for a particular virtiofs file
>>> and not for others?
>>
>> Where we're contending on virtiofs dax cache size it makes a lot of
>> sense; it's quite expensive for us to map something into the cache
>> (especially if we push something else out), so selectively DAXing files
>> that are expected to be hot could help reduce cache churn.

Yes, the performance of dax can be limited when the DAX window is
limited, where dax window may be contended by multiple files.

I tested kernel compiling in virtiofs, emulating the scenario where a
lot of files contending dax window and triggering dax window reclaiming.

Environment setup:
- guest vCPU: 16
- time make vmlinux -j128

type    | cache  | cache-size | time
------- | ------ | ---------- | ----
non-dax | always |   --       | real 2m48.119s
dax     | always | 64M        | real 4m49.563s
dax     | always |   1G       | real 3m14.200s
dax     | always |   4G       | real 2m41.141s


It can be seen that there's performance drop, comparing to the normal
buffered IO, when dax window resource is restricted and dax window
relcaiming is triggered. The smaller the cache size is, the worse the
performance is. The performance drop can be alleviated and eliminated as
cache size increases.

Though we may not compile kernel in virtiofs, indeed we may access a lot
of small files in virtiofs and suffer this performance drop.


> In that case probaly we should just make DAX window larger. I assume

Yes, as the DAX window gets larger, it is less likely that we can run
short of dax window resource.

However it doesn't come without cost. 'struct page' descriptor for dax
window will consume guest memory at a ratio of ~1.5% (64/4096 = ~1.5%,
page descriptor is of 64 bytes size, assuming 4K sized page). That is,
every 1GB cache size will cost 16MB guest memory. As the cache size
increases, the memory footprint for page descriptors also increases,
which may offset the benefit of dax by eliminating guest page cache.

In summary, per-file dax feature tries to achieve a balance between
performance and memory overhead, by offering a finer gained control for
dax to users.


> that selecting which files to turn DAX on, will itself will not be
> a trivial. Not sure what heuristics are being deployed to determine
> that. Will like to know more about it.

Currently we enable dax for hot and large blob files, while disabling
dax for other miscellaneous small files.



-- 
Thanks,
Jeffle
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
@ 2021-09-16  8:21         ` JeffleXu
  0 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-09-16  8:21 UTC (permalink / raw)
  To: Vivek Goyal, Dr. David Alan Gilbert, Miklos Szeredi
  Cc: virtio-fs-list, Joseph Qi, linux-fsdevel, virtualization

Hi, I add some performance statistics below.


On 8/17/21 8:40 PM, Vivek Goyal wrote:
> On Tue, Aug 17, 2021 at 10:32:14AM +0100, Dr. David Alan Gilbert wrote:
>> * Miklos Szeredi (miklos@szeredi.hu) wrote:
>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>
>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>
>>> Can you please explain the background of this change in detail?
>>>
>>> Why would an admin want to enable DAX for a particular virtiofs file
>>> and not for others?
>>
>> Where we're contending on virtiofs dax cache size it makes a lot of
>> sense; it's quite expensive for us to map something into the cache
>> (especially if we push something else out), so selectively DAXing files
>> that are expected to be hot could help reduce cache churn.

Yes, the performance of dax can be limited when the DAX window is
limited, where dax window may be contended by multiple files.

I tested kernel compiling in virtiofs, emulating the scenario where a
lot of files contending dax window and triggering dax window reclaiming.

Environment setup:
- guest vCPU: 16
- time make vmlinux -j128

type    | cache  | cache-size | time
------- | ------ | ---------- | --------------
non-dax | always | --         | real 2m48.119s
dax     | always | 64M        | real 4m49.563s
dax     | always | 1G         | real 3m14.200s
dax     | always | 4G         | real 2m41.141s


As can be seen, there is a performance drop compared to normal buffered
IO when the dax window is restricted and dax window reclaim is
triggered. The smaller the cache size, the worse the performance. The
drop shrinks, and eventually disappears, as the cache size increases.
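
As a rough cross-check of the table above, the timings can be expressed
relative to the non-dax baseline. This is only a standalone sketch, not
part of the patchset; both helper names are made up for illustration.

```c
/* Convert a "real XmY.YYYs" reading from time(1) into seconds. */
double sketch_real_seconds(unsigned int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Run time relative to the baseline; a value above 1.0 means slower. */
double sketch_relative_time(double run_seconds, double baseline_seconds)
{
	return run_seconds / baseline_seconds;
}
```

With the numbers above, the 64M window comes out at roughly 1.72x the
non-dax time, 1G at about 1.16x, and 4G at about 0.96x.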

Though we may not compile the kernel on virtiofs in practice, we may well
access a lot of small files on virtiofs and suffer the same performance
drop.


> In that case probaly we should just make DAX window larger. I assume

Yes, the larger the DAX window, the less likely we are to run short of
dax window resources.

However, it doesn't come for free. The 'struct page' descriptors for the
dax window consume guest memory at a ratio of ~1.6% (64/4096 = 1.5625%,
with a 64-byte page descriptor and 4K pages). That is, every 1GB of
cache size costs 16MB of guest memory. As the cache size increases, the
memory footprint of the page descriptors also increases, which may
offset the benefit dax brings by eliminating the guest page cache.
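
The overhead arithmetic can be sketched in a few lines of standalone C.
This is an illustration only, not code from the patchset: the macro and
function names are hypothetical, and the 4K page size and 64-byte page
descriptor are the assumptions stated above.

```c
#include <stdint.h>

/* Assumptions from the text: 4K pages, 64-byte 'struct page'. */
#define SKETCH_PAGE_SIZE       4096ULL
#define SKETCH_PAGE_DESC_SIZE  64ULL

/* Guest memory used by page descriptors for a dax window of this size. */
uint64_t sketch_page_desc_bytes(uint64_t window_bytes)
{
	uint64_t nr_pages = window_bytes / SKETCH_PAGE_SIZE;

	return nr_pages * SKETCH_PAGE_DESC_SIZE;
}
```

For the cache sizes tested above this gives 1MB of descriptors for a 64M
window, 16MB for 1G, and 64MB for 4G.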

In summary, the per-file dax feature tries to strike a balance between
performance and memory overhead by offering users finer-grained control
over dax.


> that selecting which files to turn DAX on, will itself will not be
> a trivial. Not sure what heuristics are being deployed to determine
> that. Will like to know more about it.

Currently we enable dax for hot and large blob files, while disabling
dax for other miscellaneous small files.



-- 
Thanks,
Jeffle


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-09-16  8:21         ` JeffleXu
  (?)
@ 2021-09-18  3:06           ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-09-18  3:06 UTC (permalink / raw)
  To: Vivek Goyal, Dr. David Alan Gilbert, Miklos Szeredi
  Cc: virtualization, virtio-fs-list, Joseph Qi, linux-fsdevel, Liu Bo

Hi Vivek, Miklos,

On 9/16/21 4:21 PM, JeffleXu wrote:
> Hi, I add some performance statistics below.
> 
> 
> On 8/17/21 8:40 PM, Vivek Goyal wrote:
>> On Tue, Aug 17, 2021 at 10:32:14AM +0100, Dr. David Alan Gilbert wrote:
>>> * Miklos Szeredi (miklos@szeredi.hu) wrote:
>>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>>
>>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>>
>>>> Can you please explain the background of this change in detail?
>>>>
>>>> Why would an admin want to enable DAX for a particular virtiofs file
>>>> and not for others?
>>>
>>> Where we're contending on virtiofs dax cache size it makes a lot of
>>> sense; it's quite expensive for us to map something into the cache
>>> (especially if we push something else out), so selectively DAXing files
>>> that are expected to be hot could help reduce cache churn.
> 
> Yes, the performance of dax can be limited when the DAX window is
> limited, where dax window may be contended by multiple files.
> 
> I tested kernel compiling in virtiofs, emulating the scenario where a
> lot of files contending dax window and triggering dax window reclaiming.
> 
> Environment setup:
> - guest vCPU: 16
> - time make vmlinux -j128
> 
> type    | cache  | cache-size | time
> ------- | ------ | ---------- | ----
> non-dax | always |   --       | real 2m48.119s
> dax     | always | 64M        | real 4m49.563s
> dax     | always |   1G       | real 3m14.200s
> dax     | always |   4G       | real 2m41.141s
> 
> 
> It can be seen that there's performance drop, comparing to the normal
> buffered IO, when dax window resource is restricted and dax window
> relcaiming is triggered. The smaller the cache size is, the worse the
> performance is. The performance drop can be alleviated and eliminated as
> cache size increases.
> 
> Though we may not compile kernel in virtiofs, indeed we may access a lot
> of small files in virtiofs and suffer this performance drop.
> 
> 
>> In that case probaly we should just make DAX window larger. I assume
> 
> Yes, as the DAX window gets larger, it is less likely that we can run
> short of dax window resource.
> 
> However it doesn't come without cost. 'struct page' descriptor for dax
> window will consume guest memory at a ratio of ~1.5% (64/4096 = ~1.5%,
> page descriptor is of 64 bytes size, assuming 4K sized page). That is,
> every 1GB cache size will cost 16MB guest memory. As the cache size
> increases, the memory footprint for page descriptors also increases,
> which may offset the benefit of dax by eliminating guest page cache.
> 
> In summary, per-file dax feature tries to achieve a balance between
> performance and memory overhead, by offering a finer gained control for
> dax to users.
> 

I'm not sure whether this is adequate justification for introducing the
per-file dax feature upstream. I'd appreciate feedback from the
community.

If it is, I'd also like to know whether setting/clearing S_DAX inside
the guest is needed at all, since in our internal use case setting
S_DAX from the host daemon is sufficient. If setting/clearing S_DAX
inside the guest can be dropped, then the negotiation during the
FUSE_INIT phase is not needed either: we could rely entirely on the
FUSE_ATTR_DAX flag fed by the host daemon to decide whether dax should
be enabled for a given file. The whole patch set would also become
somewhat simpler.
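
To illustrate the simplified model, here is a minimal standalone sketch
of the guest-side decision when only the host daemon's hint is used. It
is not the code from this patchset: the enum, the function name and the
flag value are hypothetical stand-ins, and the three modes assume the
tri-state mount option (always/never/inode) that the series models on
ext4 and xfs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the tri-state "-o dax=..." mount option. */
enum sketch_dax_mode {
	SKETCH_DAX_NEVER,
	SKETCH_DAX_ALWAYS,
	SKETCH_DAX_INODE,
};

/*
 * Stand-in for the per-inode FUSE_ATTR_DAX hint sent by the host daemon.
 * The bit value is illustrative only.
 */
#define SKETCH_ATTR_DAX (1u << 1)

/* Decide whether dax is used for one file, based only on the daemon hint. */
bool sketch_should_enable_dax(enum sketch_dax_mode mode, uint32_t attr_flags)
{
	switch (mode) {
	case SKETCH_DAX_NEVER:
		return false;
	case SKETCH_DAX_ALWAYS:
		return true;
	case SKETCH_DAX_INODE:
		/* No guest-side S_DAX toggling: trust the daemon's flag. */
		return (attr_flags & SKETCH_ATTR_DAX) != 0;
	}
	return false;
}
```

In this model the guest never writes the flag, so nothing has to be
negotiated for a guest-to-host set/clear path.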


-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-09-16  8:21         ` JeffleXu
  (?)
@ 2021-09-19 19:45           ` Vivek Goyal
  -1 siblings, 0 replies; 151+ messages in thread
From: Vivek Goyal @ 2021-09-19 19:45 UTC (permalink / raw)
  To: JeffleXu
  Cc: Dr. David Alan Gilbert, Miklos Szeredi, virtualization,
	virtio-fs-list, Joseph Qi, linux-fsdevel, Liu Bo

On Thu, Sep 16, 2021 at 04:21:59PM +0800, JeffleXu wrote:
> Hi, I add some performance statistics below.
> 
> 
> On 8/17/21 8:40 PM, Vivek Goyal wrote:
> > On Tue, Aug 17, 2021 at 10:32:14AM +0100, Dr. David Alan Gilbert wrote:
> >> * Miklos Szeredi (miklos@szeredi.hu) wrote:
> >>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> >>>>
> >>>> This patchset adds support of per-file DAX for virtiofs, which is
> >>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
> >>>
> >>> Can you please explain the background of this change in detail?
> >>>
> >>> Why would an admin want to enable DAX for a particular virtiofs file
> >>> and not for others?
> >>
> >> Where we're contending on virtiofs dax cache size it makes a lot of
> >> sense; it's quite expensive for us to map something into the cache
> >> (especially if we push something else out), so selectively DAXing files
> >> that are expected to be hot could help reduce cache churn.
> 
> Yes, the performance of dax can be limited when the DAX window is
> limited, where dax window may be contended by multiple files.
> 
> I tested kernel compiling in virtiofs, emulating the scenario where a
> lot of files contending dax window and triggering dax window reclaiming.
> 
> Environment setup:
> - guest vCPU: 16
> - time make vmlinux -j128
> 
> type    | cache  | cache-size | time
> ------- | ------ | ---------- | ----
> non-dax | always |   --       | real 2m48.119s
> dax     | always | 64M        | real 4m49.563s
> dax     | always |   1G       | real 3m14.200s
> dax     | always |   4G       | real 2m41.141s
> 
> 
> It can be seen that there's performance drop, comparing to the normal
> buffered IO, when dax window resource is restricted and dax window
> relcaiming is triggered. The smaller the cache size is, the worse the
> performance is. The performance drop can be alleviated and eliminated as
> cache size increases.
> 
> Though we may not compile kernel in virtiofs, indeed we may access a lot
> of small files in virtiofs and suffer this performance drop.

Hi Jeffle,

If you access a lot of big files, or a file bigger than the dax window,
you will still see a performance drop due to reclaim. IOW, if the data
being accessed is bigger than the dax window, reclaim will trigger and a
performance drop will be observed. So I think it's not fair to associate
the performance drop with big or small files as such.

What makes more sense is the memory usage argument you made later in
the email. That is, we have a fixed chunk size of 2MB, which means we
use 512 * 64 = 32K of memory per chunk. So if a file is smaller than
32K in size, it might be better to just access it without DAX and incur
the cost of page cache in the guest instead. Even this argument only
works if the dax window is being fully utilized.
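
The per-chunk arithmetic can be written out as a small standalone C
sketch. The names are hypothetical, and it assumes 4K pages and a
64-byte page descriptor as in the earlier email; it only restates the
comparison above and is not code from the series.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE       4096ULL       /* assumed 4K pages */
#define SKETCH_PAGE_DESC_SIZE  64ULL         /* assumed 64-byte struct page */
#define SKETCH_DAX_CHUNK_SIZE  (2ULL << 20)  /* fixed 2MB chunk */

/* Page descriptor memory needed to back one dax window chunk. */
uint64_t sketch_desc_bytes_per_chunk(void)
{
	return (SKETCH_DAX_CHUNK_SIZE / SKETCH_PAGE_SIZE) * SKETCH_PAGE_DESC_SIZE;
}

/* True when the file is smaller than the descriptor cost of one chunk. */
bool sketch_page_cache_is_cheaper(uint64_t file_size)
{
	return file_size < sketch_desc_bytes_per_chunk();
}
```

A 16K file falls below the 32K threshold and a 1MB file does not.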

Anyway, I think Miklos already asked you to send patches so that the
virtiofs daemon specifies which files to use dax on. So are you
planning to post patches again for that? (And drop the patches that
read the per-inode dax attr from the filesystem in the guest.)

Thanks
Vivek

> 
> 
> > In that case probaly we should just make DAX window larger. I assume
> 
> Yes, as the DAX window gets larger, it is less likely that we can run
> short of dax window resource.
> 
> However it doesn't come without cost. 'struct page' descriptor for dax
> window will consume guest memory at a ratio of ~1.5% (64/4096 = ~1.5%,
> page descriptor is of 64 bytes size, assuming 4K sized page). That is,
> every 1GB cache size will cost 16MB guest memory. As the cache size
> increases, the memory footprint for page descriptors also increases,
> which may offset the benefit of dax by eliminating guest page cache.
> 
> In summary, per-file dax feature tries to achieve a balance between
> performance and memory overhead, by offering a finer gained control for
> dax to users.
> 
> 
> > that selecting which files to turn DAX on, will itself will not be
> > a trivial. Not sure what heuristics are being deployed to determine
> > that. Will like to know more about it.
> 
> Currently we enable dax for hot and large blob files, while disabling
> dax for other miscellaneous small files.
> 
> 
> 
> -- 
> Thanks,
> Jeffle
> 


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/8] fuse,virtiofs: support per-file DAX
  2021-09-19 19:45           ` Vivek Goyal
  (?)
@ 2021-09-22  8:16             ` JeffleXu
  -1 siblings, 0 replies; 151+ messages in thread
From: JeffleXu @ 2021-09-22  8:16 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Dr. David Alan Gilbert, Miklos Szeredi, virtualization,
	virtio-fs-list, Joseph Qi, linux-fsdevel, Liu Bo

Thanks for the reply and the suggestions. ;)


On 9/20/21 3:45 AM, Vivek Goyal wrote:
> On Thu, Sep 16, 2021 at 04:21:59PM +0800, JeffleXu wrote:
>> Hi, I add some performance statistics below.
>>
>>
>> On 8/17/21 8:40 PM, Vivek Goyal wrote:
>>> On Tue, Aug 17, 2021 at 10:32:14AM +0100, Dr. David Alan Gilbert wrote:
>>>> * Miklos Szeredi (miklos@szeredi.hu) wrote:
>>>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
>>>>>>
>>>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>>>
>>>>> Can you please explain the background of this change in detail?
>>>>>
>>>>> Why would an admin want to enable DAX for a particular virtiofs file
>>>>> and not for others?
>>>>
>>>> Where we're contending on virtiofs dax cache size it makes a lot of
>>>> sense; it's quite expensive for us to map something into the cache
>>>> (especially if we push something else out), so selectively DAXing files
>>>> that are expected to be hot could help reduce cache churn.
>>
>> Yes, the performance of dax can be limited when the DAX window is
>> limited, where dax window may be contended by multiple files.
>>
>> I tested kernel compiling in virtiofs, emulating the scenario where a
>> lot of files contending dax window and triggering dax window reclaiming.
>>
>> Environment setup:
>> - guest vCPU: 16
>> - time make vmlinux -j128
>>
>> type    | cache  | cache-size | time
>> ------- | ------ | ---------- | ----
>> non-dax | always |   --       | real 2m48.119s
>> dax     | always | 64M        | real 4m49.563s
>> dax     | always |   1G       | real 3m14.200s
>> dax     | always |   4G       | real 2m41.141s
>>
>>
>> It can be seen that there's performance drop, comparing to the normal
>> buffered IO, when dax window resource is restricted and dax window
>> relcaiming is triggered. The smaller the cache size is, the worse the
>> performance is. The performance drop can be alleviated and eliminated as
>> cache size increases.
>>
>> Though we may not compile kernel in virtiofs, indeed we may access a lot
>> of small files in virtiofs and suffer this performance drop.
> 
> Hi Jeffle,
> 
> If you access lot of big files or a file bigger than dax window, still
> you will face performance drop due to reclaim. IOW, if data being
> accessed is bigger than dax window, then reclaim will trigger and
> performance drop will be observed. So I think its not fair to assciate
> performance drop with big for small files as such.

Yes, that's fair. What I actually meant is that small files (smaller
than the dax window chunk size) are likely to consume more dax window
chunks than large files for the same total file size.
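
A quick standalone sketch of that point, assuming the 2MB chunk size
mentioned above, that every file is mapped in whole chunks, and that
chunks are not shared between files. The function names are made up for
illustration.

```c
#include <stdint.h>

#define SKETCH_DAX_CHUNK_SIZE (2ULL << 20) /* 2MB chunk, per this thread */

/* Chunks needed to map one file, rounded up to whole chunks. */
uint64_t sketch_chunks_for_file(uint64_t file_size)
{
	return (file_size + SKETCH_DAX_CHUNK_SIZE - 1) / SKETCH_DAX_CHUNK_SIZE;
}

/* Chunks needed for nr_files files of equal size, with no chunk sharing. */
uint64_t sketch_chunks_for_files(uint64_t nr_files, uint64_t file_size)
{
	return nr_files * sketch_chunks_for_file(file_size);
}
```

For example, 1024 files of 16K each (16MB in total) would need 1024
chunks, i.e. 2GB of dax window, whereas a single 16MB file needs only 8
chunks.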


> 
> What makes more sense is that memomry usage argument you have used
> later in the email. That is, we have a fixed chunk size of 2MB. And
> that means we use 512 * 64 = 32K of memory per chunk. So if a file
> is smaller than 32K in size, it might be better to just access it
> without DAX and incur the cost of page cache in guest instead. Even this
> argument also works only if dax window is being utilized fully.

Yes, agreed. In that case, the point of per-file dax is that the admin
can keep the overall dax window size under a given limit while still
sustaining reasonable performance. At the very least, users would be
able to tune it.

> 
> Anyway, I think Miklos already asked you to send patches so that
> virtiofs daemon specifies which file to use dax on. So are you
> planning to post patches again for that. (And drop patches to
> read dax attr from per inode from filesystem in guest).

OK. I will send a new version that disables dax based on file size on
the host daemon side. Besides, I think the negotiation phase is no longer
needed either, since the hint on whether dax should be enabled now comes
entirely from the host daemon, and the guest side no longer needs to
set/clear the per-inode dax attr.
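As a rough idea of what such a daemon-side policy could look like, here is
a minimal sketch. Both the helper and the threshold are hypothetical: they
are not existing virtiofsd code and not part of the FUSE protocol. The 32K
cut-off is simply the per-chunk overhead discussed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: the 32K per-chunk overhead discussed in the thread. */
#define DAX_MIN_FILE_SIZE (32ULL * 1024)

/*
 * Hypothetical helper: decide on the host daemon side whether to hint
 * that dax should be enabled for a file. Only regular files at or above
 * the threshold get the hint.
 */
static inline bool daemon_wants_dax(bool is_regular_file, uint64_t file_size)
{
	if (!is_regular_file)
		return false;
	return file_size >= DAX_MIN_FILE_SIZE;
}
```

The daemon would evaluate this when it replies to a lookup, and the guest
would just follow the hint.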

-- 
Thanks,
Jeffle

^ permalink raw reply	[flat|nested] 151+ messages in thread

end of thread, other threads:[~2021-09-22  8:16 UTC | newest]

Thread overview: 151+ messages
2021-08-17  2:22 [PATCH v4 0/8] fuse,virtiofs: support per-file DAX Jeffle Xu
2021-08-17  2:22 ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:22 ` Jeffle Xu
2021-08-17  2:22 ` [PATCH v4 1/8] fuse: add fuse_should_enable_dax() helper Jeffle Xu
2021-08-17  2:22   ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:22   ` Jeffle Xu
2021-08-17  2:22 ` [PATCH v4 2/8] fuse: Make DAX mount option a tri-state Jeffle Xu
2021-08-17  2:22   ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:22   ` Jeffle Xu
2021-08-17  2:22 ` [PATCH v4 3/8] fuse: support per-file DAX Jeffle Xu
2021-08-17  2:22   ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:22   ` Jeffle Xu
2021-08-17  2:22 ` [PATCH v4 4/8] fuse: negotiate if server/client supports " Jeffle Xu
2021-08-17  2:22   ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:22   ` Jeffle Xu
2021-08-17  2:22 ` [PATCH v4 5/8] fuse: enable " Jeffle Xu
2021-08-17  2:22   ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:22   ` Jeffle Xu
2021-08-17  2:22 ` [PATCH v4 6/8] fuse: mark inode DONT_CACHE when per-file DAX indication changes Jeffle Xu
2021-08-17  2:22   ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:22   ` Jeffle Xu
2021-08-17 10:26   ` [Virtio-fs] " Dr. David Alan Gilbert
2021-08-17 10:26     ` Dr. David Alan Gilbert
2021-08-17 10:26     ` Dr. David Alan Gilbert
2021-08-17 13:23     ` JeffleXu
2021-08-17 13:23       ` JeffleXu
2021-08-17 13:23       ` JeffleXu
2021-08-17  2:22 ` [PATCH v4 7/8] fuse: support changing per-file DAX flag inside guest Jeffle Xu
2021-08-17  2:22   ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:22   ` Jeffle Xu
2021-08-17  2:22 ` [PATCH v4 8/8] fuse: show '-o dax=inode' option only when FUSE server supports Jeffle Xu
2021-08-17  2:22   ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:22   ` Jeffle Xu
2021-08-17  2:23 ` [virtiofsd PATCH v4 0/4] virtiofsd: support per-file DAX Jeffle Xu
2021-08-17  2:23   ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:23   ` Jeffle Xu
2021-08-17  2:23   ` [virtiofsd PATCH v4 1/4] virtiofsd: add .ioctl() support Jeffle Xu
2021-08-17  2:23     ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:23     ` Jeffle Xu
2021-08-18 17:33     ` Vivek Goyal
2021-08-18 17:33       ` [Virtio-fs] " Vivek Goyal
2021-08-18 17:33       ` Vivek Goyal
2021-08-17  2:23   ` [virtiofsd PATCH v4 2/4] virtiofsd: expand fuse protocol to support per-file DAX Jeffle Xu
2021-08-17  2:23     ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:23     ` Jeffle Xu
2021-08-17  2:23   ` [virtiofsd PATCH v4 3/4] virtiofsd: support per-file DAX negotiation in FUSE_INIT Jeffle Xu
2021-08-17  2:23     ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:23     ` Jeffle Xu
2021-08-17 17:15     ` [Virtio-fs] " Dr. David Alan Gilbert
2021-08-17 17:15       ` Dr. David Alan Gilbert
2021-08-17 17:15       ` Dr. David Alan Gilbert
2021-08-18  5:28       ` JeffleXu
2021-08-18  5:28         ` JeffleXu
2021-08-18  5:28         ` JeffleXu
2021-08-19 13:57         ` Dr. David Alan Gilbert
2021-08-19 13:57           ` Dr. David Alan Gilbert
2021-08-19 13:57           ` Dr. David Alan Gilbert
2021-08-18 17:30       ` Vivek Goyal
2021-08-18 17:30         ` Vivek Goyal
2021-08-18 17:30         ` Vivek Goyal
2021-08-17  2:23   ` [virtiofsd PATCH v4 4/4] virtiofsd: support per-file DAX in FUSE_LOOKUP Jeffle Xu
2021-08-17  2:23     ` [Virtio-fs] " Jeffle Xu
2021-08-17  2:23     ` Jeffle Xu
2021-08-17 19:00     ` [Virtio-fs] " Dr. David Alan Gilbert
2021-08-17 19:00       ` Dr. David Alan Gilbert
2021-08-17 19:00       ` Dr. David Alan Gilbert
2021-08-18  5:46       ` JeffleXu
2021-08-18  5:46         ` JeffleXu
2021-08-18  5:46         ` JeffleXu
2021-08-19 13:08         ` Dr. David Alan Gilbert
2021-08-19 13:08           ` Dr. David Alan Gilbert
2021-08-19 13:08           ` Dr. David Alan Gilbert
2021-08-20  5:03           ` JeffleXu
2021-08-20  5:03             ` JeffleXu
2021-08-20  5:03             ` JeffleXu
2021-08-24 10:15             ` Greg Kurz
2021-08-24 10:15               ` Greg Kurz
2021-08-24 10:15               ` Greg Kurz
2021-09-08 10:34               ` JeffleXu
2021-09-08 10:34                 ` JeffleXu
2021-09-08 10:34                 ` JeffleXu
2021-08-17  8:06 ` [PATCH v4 0/8] fuse,virtiofs: support per-file DAX Miklos Szeredi
2021-08-17  8:06   ` [Virtio-fs] " Miklos Szeredi
2021-08-17  9:32   ` Dr. David Alan Gilbert
2021-08-17  9:32     ` Dr. David Alan Gilbert
2021-08-17  9:32     ` Dr. David Alan Gilbert
2021-08-17 10:09     ` Miklos Szeredi
2021-08-17 10:09       ` Miklos Szeredi
2021-08-17 10:37       ` Dr. David Alan Gilbert
2021-08-17 10:37         ` Dr. David Alan Gilbert
2021-08-17 10:37         ` Dr. David Alan Gilbert
2021-08-17 13:08       ` JeffleXu
2021-08-17 13:08         ` JeffleXu
2021-08-17 13:08         ` JeffleXu
2021-08-17 14:11         ` Miklos Szeredi
2021-08-17 14:11           ` Miklos Szeredi
2021-08-17 15:19           ` Vivek Goyal
2021-08-17 15:19             ` Vivek Goyal
2021-08-17 15:19             ` Vivek Goyal
2021-08-17 14:54         ` Vivek Goyal
2021-08-17 14:54           ` Vivek Goyal
2021-08-17 14:54           ` Vivek Goyal
2021-08-18  5:10           ` JeffleXu
2021-08-18  5:10             ` JeffleXu
2021-08-18  5:10             ` JeffleXu
2021-08-19  6:14           ` JeffleXu
2021-08-19  6:14             ` JeffleXu
2021-08-19  6:14             ` JeffleXu
2021-08-17 12:40     ` Vivek Goyal
2021-08-17 12:40       ` Vivek Goyal
2021-08-17 12:40       ` Vivek Goyal
2021-09-16  8:21       ` JeffleXu
2021-09-16  8:21         ` JeffleXu
2021-09-16  8:21         ` JeffleXu
2021-09-18  3:06         ` JeffleXu
2021-09-18  3:06           ` JeffleXu
2021-09-18  3:06           ` JeffleXu
2021-09-19 19:45         ` Vivek Goyal
2021-09-19 19:45           ` Vivek Goyal
2021-09-19 19:45           ` Vivek Goyal
2021-09-22  8:16           ` JeffleXu
2021-09-22  8:16             ` JeffleXu
2021-09-22  8:16             ` JeffleXu
2021-08-17 12:39   ` Vivek Goyal
2021-08-17 12:39     ` [Virtio-fs] " Vivek Goyal
2021-08-17 12:39     ` Vivek Goyal
2021-08-17 13:22     ` JeffleXu
2021-08-17 13:22       ` [Virtio-fs] " JeffleXu
2021-08-17 13:22       ` JeffleXu
2021-08-17 14:08       ` Miklos Szeredi
2021-08-17 14:08         ` [Virtio-fs] " Miklos Szeredi
2021-08-18  3:39         ` JeffleXu
2021-08-18  3:39           ` [Virtio-fs] " JeffleXu
2021-08-18  3:39           ` JeffleXu
2021-08-18  5:08           ` Miklos Szeredi
2021-08-18  5:08             ` [Virtio-fs] " Miklos Szeredi
2021-08-18 16:58             ` Vivek Goyal
2021-08-18 16:58               ` [Virtio-fs] " Vivek Goyal
2021-08-18 16:58               ` Vivek Goyal
2021-09-03  5:30         ` JeffleXu
2021-09-03  5:30           ` [Virtio-fs] " JeffleXu
2021-09-03  5:30           ` JeffleXu
2021-09-07 14:51           ` Miklos Szeredi
2021-09-07 14:51             ` [Virtio-fs] " Miklos Szeredi
2021-08-17 14:57       ` Vivek Goyal
2021-08-17 14:57         ` [Virtio-fs] " Vivek Goyal
2021-08-17 14:57         ` Vivek Goyal
2021-08-18  5:20         ` JeffleXu
2021-08-18  5:20           ` [Virtio-fs] " JeffleXu
2021-08-18  5:20           ` JeffleXu
2021-08-30 23:31         ` [Virtio-fs] " Liu Bo
