* [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio
@ 2024-02-28 14:41 Hou Tao
  2024-02-28 14:41 ` [PATCH v2 1/6] fuse: limit the length of ITER_KVEC dio by max_pages Hou Tao
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Hou Tao @ 2024-02-28 14:41 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

From: Hou Tao <houtao1@huawei.com>

Hi,

This patch set aims to fix the warning triggered by an abnormal size
parameter passed to kmalloc() in virtiofs. The warning occurred when
attempting to insert a 10MB-sized kernel module kept in a virtiofs with
cache disabled. As analyzed in patch #1, the root cause is that the
length of the read buffer is not limited, and the read buffer is passed
directly to virtiofs through out_args[0].value. Therefore patch #1
limits the length of the read buffer passed to virtiofs by using
max_pages. However, that alone is not enough, because the maximal value
of max_pages is currently 256. Consequently, when reading a 10MB-sized
kernel module, the length of the bounce buffer in virtiofs will be
40 + (256 * 4096) bytes, and kmalloc() will try to allocate 2MB from
the memory subsystem. The request for 2MB of physically contiguous
memory significantly stresses the memory subsystem and may fail
indefinitely on hosts with fragmented memory. To address this, patches
#2~#5 use scattered pages in a bio_vec to replace the kmalloc-allocated
bounce buffer when the length of the bounce buffer for ITER_KVEC dio is
larger than PAGE_SIZE. The final issue is that the allocation of the
bounce buffer and the sg array in virtiofs uses GFP_ATOMIC even when
the allocation occurs in a kworker context. Therefore the last patch
uses GFP_NOFS for the allocation of both the sg array and the bounce
buffer when the enqueue is initiated by the kworker. For more details,
please check the individual patches.
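
For reference, the numbers above work out as follows (assuming a 4KB
page size): with max_pages capped at 256, the bounce buffer length is
at most 40 + (256 * 4096) = 1,048,616 bytes, i.e. about 1MB+40B, and
for such a size kmalloc() falls back to the page allocator and rounds
the request up to the next power-of-two number of pages, i.e. 2MB of
physically contiguous memory.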

As usual, comments are always welcome.

Change Log:

v2:
  * limit the length of ITER_KVEC dio by max_pages instead of the
    newly-introduced max_nopage_rw. Using max_pages makes ITER_KVEC
    dio consistent with other rw operations.
  * replace the kmalloc-allocated bounce buffer with a bounce buffer
    backed by scattered pages when the length of the bounce buffer for
    ITER_KVEC dio is larger than PAGE_SIZE, so even on hosts with
    fragmented memory, ITER_KVEC dio can be handled normally by
    virtiofs. (Bernd Schubert)
  * merge the GFP_NOFS patch [1] into this patch set and use
    memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
    (Benjamin Coddington)

v1: https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-houtao@huaweicloud.com/

[1]: https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-houtao@huaweicloud.com/

Hou Tao (6):
  fuse: limit the length of ITER_KVEC dio by max_pages
  virtiofs: move alloc/free of argbuf into separated helpers
  virtiofs: factor out more common methods for argbuf
  virtiofs: support bounce buffer backed by scattered pages
  virtiofs: use scattered bounce buffer for ITER_KVEC dio
  virtiofs: use GFP_NOFS when enqueuing request through kworker

 fs/fuse/file.c      |  12 +-
 fs/fuse/virtio_fs.c | 336 +++++++++++++++++++++++++++++++++++++-------
 2 files changed, 296 insertions(+), 52 deletions(-)

-- 
2.29.2



* [PATCH v2 1/6] fuse: limit the length of ITER_KVEC dio by max_pages
  2024-02-28 14:41 [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Hou Tao
@ 2024-02-28 14:41 ` Hou Tao
  2024-03-01 13:42   ` Miklos Szeredi
  2024-02-28 14:41 ` [PATCH v2 2/6] virtiofs: move alloc/free of argbuf into separated helpers Hou Tao
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Hou Tao @ 2024-02-28 14:41 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

From: Hou Tao <houtao1@huawei.com>

When trying to insert a 10MB kernel module kept in a virtio-fs with cache
disabled, the following warning was reported:

  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 439 at mm/page_alloc.c:4544 ......
  Modules linked in:
  CPU: 2 PID: 439 Comm: insmod Not tainted 6.7.0-rc7+ #33
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ......
  RIP: 0010:__alloc_pages+0x2c4/0x360
  ......
  Call Trace:
   <TASK>
   ? __warn+0x8f/0x150
   ? __alloc_pages+0x2c4/0x360
   __kmalloc_large_node+0x86/0x160
   __kmalloc+0xcd/0x140
   virtio_fs_enqueue_req+0x240/0x6d0
   virtio_fs_wake_pending_and_unlock+0x7f/0x190
   queue_request_and_unlock+0x58/0x70
   fuse_simple_request+0x18b/0x2e0
   fuse_direct_io+0x58a/0x850
   fuse_file_read_iter+0xdb/0x130
   __kernel_read+0xf3/0x260
   kernel_read+0x45/0x60
   kernel_read_file+0x1ad/0x2b0
   init_module_from_file+0x6a/0xe0
   idempotent_init_module+0x179/0x230
   __x64_sys_finit_module+0x5d/0xb0
   do_syscall_64+0x36/0xb0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
   ......
   </TASK>
  ---[ end trace 0000000000000000 ]---

The warning is triggered when:

1) inserting a 10MB-sized kernel module kept in a virtiofs.
The finit_module() syscall handles the module insertion and invokes
kernel_read_file() to read the content of the module first.

2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
passes it to kernel_read(). kernel_read() constructs a kvec iter by
using iov_iter_kvec() and passes it to fuse_file_read_iter().

3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
fuse_direct_io(). As of now, the maximal read size for a kvec iter is
only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
the size of the 10MB-sized buffer in out_args[0] of the fuse request and
passes the fuse request to virtio_fs_wake_pending_and_unlock().

4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
queue the request. Because the arguments in the fuse request may be
kept on the stack, virtio_fs_enqueue_req() uses kmalloc() to allocate a
bounce buffer for all fuse args, copies these args into the bounce
buffer and passes the physical address of the bounce buffer to
virtiofsd. The total length of these fuse args for the passed fuse
request is about 10MB, so copy_args_to_argbuf() invokes kmalloc() with
a 10MB size parameter and it triggers the warning in __alloc_pages():

	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
		return NULL;

5) virtio_fs_enqueue_req() will retry the memory allocation in a
kworker, but it won't help, because kmalloc() will always return NULL
due to the abnormal size, and finit_module() will hang forever.
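
To make the numbers concrete, the following is a rough sketch of steps
2) and 4), assuming a 4KB page size and the default MAX_PAGE_ORDER of
10 (buf and len are stand-ins for the real values):

	/*
	 * Step 2: kernel_read() wraps the vmalloc-ed module buffer in
	 * a kvec iterator.
	 */
	struct kvec kv = { .iov_base = buf, .iov_len = len /* 10MB */ };
	struct iov_iter iter;

	iov_iter_kvec(&iter, ITER_DEST, &kv, 1, kv.iov_len);

	/*
	 * Step 4: copy_args_to_argbuf() then kmalloc()s a bounce buffer
	 * of roughly the same size:
	 *
	 *   len   = ~10MB (fuse arg headers + the whole read buffer)
	 *   order = get_order(len) = 12, i.e. 2^12 pages = 16MB
	 *
	 * order 12 > MAX_PAGE_ORDER (10, i.e. a 4MB limit), so
	 * __alloc_pages() emits the warning above and returns NULL.
	 */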

A feasible solution is to limit the value of max_read for virtio-fs, so
the length passed to kmalloc() will be limited. However, it would also
affect the maximal read size for normal fuse reads. Writes for
virtio-fs initiated from the kernel have a similar problem, and there
is currently no way to limit fc->max_write in the kernel.

So instead of limiting the values of both max_read and max_write in the
kernel, cap the maximal length of kvec iter IO by using max_pages in
fuse_direct_io(), just as is done for ubuf/iovec iter IO. The max value
for max_pages is currently 256, so on a host with a 4KB page size, the
maximal size passed to kmalloc() in copy_args_to_argbuf() is about
1MB+40B. The allocation of 2MB of physically contiguous memory will
still incur significant stress on the memory subsystem, but the warning
is fixed. Additionally, the requirement for huge physically contiguous
memory will be removed in the following patches.

Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/fuse/file.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 148a71b8b4d0e..f90ea25e366f0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1423,6 +1423,16 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 	return ret < 0 ? ret : 0;
 }
 
+static size_t fuse_max_dio_rw_size(const struct fuse_conn *fc,
+				   const struct iov_iter *iter, int write)
+{
+	unsigned int nmax = write ? fc->max_write : fc->max_read;
+
+	if (iov_iter_is_kvec(iter))
+		nmax = min(nmax, fc->max_pages << PAGE_SHIFT);
+	return nmax;
+}
+
 ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 		       loff_t *ppos, int flags)
 {
@@ -1433,7 +1443,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 	struct inode *inode = mapping->host;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fm->fc;
-	size_t nmax = write ? fc->max_write : fc->max_read;
+	size_t nmax = fuse_max_dio_rw_size(fc, iter, write);
 	loff_t pos = *ppos;
 	size_t count = iov_iter_count(iter);
 	pgoff_t idx_from = pos >> PAGE_SHIFT;
-- 
2.29.2



* [PATCH v2 2/6] virtiofs: move alloc/free of argbuf into separated helpers
  2024-02-28 14:41 [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Hou Tao
  2024-02-28 14:41 ` [PATCH v2 1/6] fuse: limit the length of ITER_KVEC dio by max_pages Hou Tao
@ 2024-02-28 14:41 ` Hou Tao
  2024-02-28 14:41 ` [PATCH v2 3/6] virtiofs: factor out more common methods for argbuf Hou Tao
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Hou Tao @ 2024-02-28 14:41 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

From: Hou Tao <houtao1@huawei.com>

The bounce buffer for fuse args in virtiofs will be extended to support
scattered pages later. Therefore, move the allocation and the freeing
of the argbuf out of the copy procedures and factor them into the
virtio_fs_argbuf_{new|free}() helpers.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/fuse/virtio_fs.c | 52 +++++++++++++++++++++++++++------------------
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 5f1be1da92ce9..cd1330506daba 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -404,6 +404,24 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
 	}
 }
 
+static void virtio_fs_argbuf_free(void *argbuf)
+{
+	kfree(argbuf);
+}
+
+static void *virtio_fs_argbuf_new(struct fuse_args *args, gfp_t gfp)
+{
+	unsigned int numargs;
+	unsigned int len;
+
+	numargs = args->in_numargs - args->in_pages;
+	len = fuse_len_args(numargs, (struct fuse_arg *) args->in_args);
+	numargs = args->out_numargs - args->out_pages;
+	len += fuse_len_args(numargs, args->out_args);
+
+	return kmalloc(len, gfp);
+}
+
 /*
  * Returns 1 if queue is full and sender should wait a bit before sending
  * next request, 0 otherwise.
@@ -487,36 +505,24 @@ static void virtio_fs_hiprio_dispatch_work(struct work_struct *work)
 	}
 }
 
-/* Allocate and copy args into req->argbuf */
-static int copy_args_to_argbuf(struct fuse_req *req)
+/* Copy args into req->argbuf */
+static void copy_args_to_argbuf(struct fuse_req *req)
 {
 	struct fuse_args *args = req->args;
 	unsigned int offset = 0;
 	unsigned int num_in;
-	unsigned int num_out;
-	unsigned int len;
 	unsigned int i;
 
 	num_in = args->in_numargs - args->in_pages;
-	num_out = args->out_numargs - args->out_pages;
-	len = fuse_len_args(num_in, (struct fuse_arg *) args->in_args) +
-	      fuse_len_args(num_out, args->out_args);
-
-	req->argbuf = kmalloc(len, GFP_ATOMIC);
-	if (!req->argbuf)
-		return -ENOMEM;
-
 	for (i = 0; i < num_in; i++) {
 		memcpy(req->argbuf + offset,
 		       args->in_args[i].value,
 		       args->in_args[i].size);
 		offset += args->in_args[i].size;
 	}
-
-	return 0;
 }
 
-/* Copy args out of and free req->argbuf */
+/* Copy args out of req->argbuf */
 static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
 {
 	unsigned int remaining;
@@ -549,9 +555,6 @@ static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
 	/* Store the actual size of the variable-length arg */
 	if (args->out_argvar)
 		args->out_args[args->out_numargs - 1].size = remaining;
-
-	kfree(req->argbuf);
-	req->argbuf = NULL;
 }
 
 /* Work function for request completion */
@@ -571,6 +574,9 @@ static void virtio_fs_request_complete(struct fuse_req *req,
 	args = req->args;
 	copy_args_from_argbuf(args, req);
 
+	virtio_fs_argbuf_free(req->argbuf);
+	req->argbuf = NULL;
+
 	if (args->out_pages && args->page_zeroing) {
 		len = args->out_args[args->out_numargs - 1].size;
 		ap = container_of(args, typeof(*ap), args);
@@ -1149,9 +1155,13 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 	}
 
 	/* Use a bounce buffer since stack args cannot be mapped */
-	ret = copy_args_to_argbuf(req);
-	if (ret < 0)
+	req->argbuf = virtio_fs_argbuf_new(args, GFP_ATOMIC);
+	if (!req->argbuf) {
+		ret = -ENOMEM;
 		goto out;
+	}
+
+	copy_args_to_argbuf(req);
 
 	/* Request elements */
 	sg_init_one(&sg[out_sgs++], &req->in.h, sizeof(req->in.h));
@@ -1210,7 +1220,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 
 out:
 	if (ret < 0 && req->argbuf) {
-		kfree(req->argbuf);
+		virtio_fs_argbuf_free(req->argbuf);
 		req->argbuf = NULL;
 	}
 	if (sgs != stack_sgs) {
-- 
2.29.2



* [PATCH v2 3/6] virtiofs: factor out more common methods for argbuf
  2024-02-28 14:41 [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Hou Tao
  2024-02-28 14:41 ` [PATCH v2 1/6] fuse: limit the length of ITER_KVEC dio by max_pages Hou Tao
  2024-02-28 14:41 ` [PATCH v2 2/6] virtiofs: move alloc/free of argbuf into separated helpers Hou Tao
@ 2024-02-28 14:41 ` Hou Tao
  2024-03-01 14:24   ` Miklos Szeredi
  2024-02-28 14:41 ` [PATCH v2 4/6] virtiofs: support bounce buffer backed by scattered pages Hou Tao
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Hou Tao @ 2024-02-28 14:41 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

From: Hou Tao <houtao1@huawei.com>

Factor out more common methods for the bounce buffer of fuse args:

1) virtio_fs_argbuf_setup_sg: set up sgs for the bounce buffer
2) virtio_fs_argbuf_copy_from_in_arg: copy each in-arg to the bounce buffer
3) virtio_fs_argbuf_out_args_offset: calculate the start offset of the out-args
4) virtio_fs_argbuf_copy_to_out_arg: copy the bounce buffer to each out-arg

These methods will be used to implement a bounce buffer backed by
scattered pages which are allocated separately.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/fuse/virtio_fs.c | 77 +++++++++++++++++++++++++++++++++++----------
 1 file changed, 60 insertions(+), 17 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index cd1330506daba..f10fff7f23a0f 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -86,6 +86,10 @@ struct virtio_fs_req_work {
 	struct work_struct done_work;
 };
 
+struct virtio_fs_argbuf {
+	DECLARE_FLEX_ARRAY(u8, buf);
+};
+
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 				 struct fuse_req *req, bool in_flight);
 
@@ -404,13 +408,15 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
 	}
 }
 
-static void virtio_fs_argbuf_free(void *argbuf)
+static void virtio_fs_argbuf_free(struct virtio_fs_argbuf *argbuf)
 {
 	kfree(argbuf);
 }
 
-static void *virtio_fs_argbuf_new(struct fuse_args *args, gfp_t gfp)
+static struct virtio_fs_argbuf *virtio_fs_argbuf_new(struct fuse_args *args,
+						     gfp_t gfp)
 {
+	struct virtio_fs_argbuf *argbuf;
 	unsigned int numargs;
 	unsigned int len;
 
@@ -419,7 +425,41 @@ static void *virtio_fs_argbuf_new(struct fuse_args *args, gfp_t gfp)
 	numargs = args->out_numargs - args->out_pages;
 	len += fuse_len_args(numargs, args->out_args);
 
-	return kmalloc(len, gfp);
+	argbuf = kmalloc(struct_size(argbuf, buf, len), gfp);
+
+	return argbuf;
+}
+
+static unsigned int virtio_fs_argbuf_setup_sg(struct virtio_fs_argbuf *argbuf,
+					      unsigned int offset,
+					      unsigned int len,
+					      struct scatterlist *sg)
+{
+	sg_init_one(sg, argbuf->buf + offset, len);
+	return 1;
+}
+
+static void virtio_fs_argbuf_copy_from_in_arg(struct virtio_fs_argbuf *argbuf,
+					      unsigned int offset,
+					      const void *src, unsigned int len)
+{
+	memcpy(argbuf->buf + offset, src, len);
+}
+
+static unsigned int
+virtio_fs_argbuf_out_args_offset(struct virtio_fs_argbuf *argbuf,
+				 const struct fuse_args *args)
+{
+	unsigned int num_in = args->in_numargs - args->in_pages;
+
+	return fuse_len_args(num_in, (struct fuse_arg *)args->in_args);
+}
+
+static void virtio_fs_argbuf_copy_to_out_arg(struct virtio_fs_argbuf *argbuf,
+					     unsigned int offset, void *dst,
+					     unsigned int len)
+{
+	memcpy(dst, argbuf->buf + offset, len);
 }
 
 /*
@@ -515,9 +555,9 @@ static void copy_args_to_argbuf(struct fuse_req *req)
 
 	num_in = args->in_numargs - args->in_pages;
 	for (i = 0; i < num_in; i++) {
-		memcpy(req->argbuf + offset,
-		       args->in_args[i].value,
-		       args->in_args[i].size);
+		virtio_fs_argbuf_copy_from_in_arg(req->argbuf, offset,
+						  args->in_args[i].value,
+						  args->in_args[i].size);
 		offset += args->in_args[i].size;
 	}
 }
@@ -525,17 +565,19 @@ static void copy_args_to_argbuf(struct fuse_req *req)
 /* Copy args out of req->argbuf */
 static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
 {
+	struct virtio_fs_argbuf *argbuf;
 	unsigned int remaining;
 	unsigned int offset;
-	unsigned int num_in;
 	unsigned int num_out;
 	unsigned int i;
 
 	remaining = req->out.h.len - sizeof(req->out.h);
-	num_in = args->in_numargs - args->in_pages;
 	num_out = args->out_numargs - args->out_pages;
-	offset = fuse_len_args(num_in, (struct fuse_arg *)args->in_args);
+	if (!num_out)
+		goto out;
 
+	argbuf = req->argbuf;
+	offset = virtio_fs_argbuf_out_args_offset(argbuf, args);
 	for (i = 0; i < num_out; i++) {
 		unsigned int argsize = args->out_args[i].size;
 
@@ -545,13 +587,16 @@ static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
 			argsize = remaining;
 		}
 
-		memcpy(args->out_args[i].value, req->argbuf + offset, argsize);
+		virtio_fs_argbuf_copy_to_out_arg(argbuf, offset,
+						 args->out_args[i].value,
+						 argsize);
 		offset += argsize;
 
 		if (i != args->out_numargs - 1)
 			remaining -= argsize;
 	}
 
+out:
 	/* Store the actual size of the variable-length arg */
 	if (args->out_argvar)
 		args->out_args[args->out_numargs - 1].size = remaining;
@@ -1100,7 +1145,6 @@ static unsigned int sg_init_fuse_args(struct scatterlist *sg,
 				      struct fuse_arg *args,
 				      unsigned int numargs,
 				      bool argpages,
-				      void *argbuf,
 				      unsigned int *len_used)
 {
 	struct fuse_args_pages *ap = container_of(req->args, typeof(*ap), args);
@@ -1109,7 +1153,8 @@ static unsigned int sg_init_fuse_args(struct scatterlist *sg,
 
 	len = fuse_len_args(numargs - argpages, args);
 	if (len)
-		sg_init_one(&sg[total_sgs++], argbuf, len);
+		total_sgs += virtio_fs_argbuf_setup_sg(req->argbuf, *len_used,
+						       len, &sg[total_sgs]);
 
 	if (argpages)
 		total_sgs += sg_init_fuse_pages(&sg[total_sgs],
@@ -1117,8 +1162,7 @@ static unsigned int sg_init_fuse_args(struct scatterlist *sg,
 						ap->num_pages,
 						args[numargs - 1].size);
 
-	if (len_used)
-		*len_used = len;
+	*len_used = len;
 
 	return total_sgs;
 }
@@ -1168,7 +1212,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 	out_sgs += sg_init_fuse_args(&sg[out_sgs], req,
 				     (struct fuse_arg *)args->in_args,
 				     args->in_numargs, args->in_pages,
-				     req->argbuf, &argbuf_used);
+				     &argbuf_used);
 
 	/* Reply elements */
 	if (test_bit(FR_ISREPLY, &req->flags)) {
@@ -1176,8 +1220,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 			    &req->out.h, sizeof(req->out.h));
 		in_sgs += sg_init_fuse_args(&sg[out_sgs + in_sgs], req,
 					    args->out_args, args->out_numargs,
-					    args->out_pages,
-					    req->argbuf + argbuf_used, NULL);
+					    args->out_pages, &argbuf_used);
 	}
 
 	WARN_ON(out_sgs + in_sgs != total_sgs);
-- 
2.29.2



* [PATCH v2 4/6] virtiofs: support bounce buffer backed by scattered pages
  2024-02-28 14:41 [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Hou Tao
                   ` (2 preceding siblings ...)
  2024-02-28 14:41 ` [PATCH v2 3/6] virtiofs: factor out more common methods for argbuf Hou Tao
@ 2024-02-28 14:41 ` Hou Tao
  2024-02-29 15:01   ` Brian Foster
  2024-02-28 14:41 ` [PATCH v2 5/6] virtiofs: use scattered bounce buffer for ITER_KVEC dio Hou Tao
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Hou Tao @ 2024-02-28 14:41 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

From: Hou Tao <houtao1@huawei.com>

When reading a file kept in virtiofs from the kernel (e.g., insmod a
kernel module), if the cache of virtiofs is disabled, the read buffer
will be passed to virtiofs through out_args[0].value instead of pages.
Because virtiofs can't get the pages for the read buffer,
virtio_fs_argbuf_new() will create a bounce buffer for the read buffer
by using kmalloc() and copy the read buffer into the bounce buffer. If
the read buffer is large (e.g., 1MB), the allocation will incur
significant stress on the memory subsystem.

So instead of allocating the bounce buffer by using kmalloc(), allocate
a bounce buffer which is backed by scattered pages. The original idea
was to use vmap(), but the use of GFP_ATOMIC is not possible for
vmap(). To simplify the copy operations in the bounce buffer, use a
bio_vec flex array to represent the argbuf. Also add an is_flat field
in struct virtio_fs_argbuf to distinguish between kmalloc-ed and
scattered bounce buffers.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/fuse/virtio_fs.c | 163 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 149 insertions(+), 14 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index f10fff7f23a0f..ffea684bd100d 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -86,10 +86,27 @@ struct virtio_fs_req_work {
 	struct work_struct done_work;
 };
 
-struct virtio_fs_argbuf {
+struct virtio_fs_flat_argbuf {
 	DECLARE_FLEX_ARRAY(u8, buf);
 };
 
+struct virtio_fs_scattered_argbuf {
+	unsigned int size;
+	unsigned int nr;
+	DECLARE_FLEX_ARRAY(struct bio_vec, bvec);
+};
+
+struct virtio_fs_argbuf {
+	bool is_flat;
+	/* There is flexible array in the end of these two struct
+	 * definitions, so they must be the last field.
+	 */
+	union {
+		struct virtio_fs_flat_argbuf f;
+		struct virtio_fs_scattered_argbuf s;
+	};
+};
+
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 				 struct fuse_req *req, bool in_flight);
 
@@ -408,42 +425,143 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
 	}
 }
 
+static unsigned int virtio_fs_argbuf_len(unsigned int in_args_len,
+					 unsigned int out_args_len,
+					 bool is_flat)
+{
+	if (is_flat)
+		return in_args_len + out_args_len;
+
+	/*
+	 * Align in_args_len with PAGE_SIZE to reduce the total number of
+	 * sg entries when the value of out_args_len (e.g., the length of
+	 * read buffer) is page-aligned.
+	 */
+	return round_up(in_args_len, PAGE_SIZE) +
+	       round_up(out_args_len, PAGE_SIZE);
+}
+
 static void virtio_fs_argbuf_free(struct virtio_fs_argbuf *argbuf)
 {
+	unsigned int i;
+
+	if (!argbuf)
+		return;
+
+	if (argbuf->is_flat)
+		goto free_argbuf;
+
+	for (i = 0; i < argbuf->s.nr; i++)
+		__free_page(argbuf->s.bvec[i].bv_page);
+
+free_argbuf:
 	kfree(argbuf);
 }
 
 static struct virtio_fs_argbuf *virtio_fs_argbuf_new(struct fuse_args *args,
-						     gfp_t gfp)
+						     gfp_t gfp, bool is_flat)
 {
 	struct virtio_fs_argbuf *argbuf;
 	unsigned int numargs;
-	unsigned int len;
+	unsigned int in_len, out_len, len;
+	unsigned int i, nr;
 
 	numargs = args->in_numargs - args->in_pages;
-	len = fuse_len_args(numargs, (struct fuse_arg *) args->in_args);
+	in_len = fuse_len_args(numargs, (struct fuse_arg *) args->in_args);
 	numargs = args->out_numargs - args->out_pages;
-	len += fuse_len_args(numargs, args->out_args);
+	out_len = fuse_len_args(numargs, args->out_args);
+	len = virtio_fs_argbuf_len(in_len, out_len, is_flat);
+
+	if (is_flat) {
+		argbuf = kmalloc(struct_size(argbuf, f.buf, len), gfp);
+		if (argbuf)
+			argbuf->is_flat = true;
+
+		return argbuf;
+	}
+
+	nr = len >> PAGE_SHIFT;
+	argbuf = kmalloc(struct_size(argbuf, s.bvec, nr), gfp);
+	if (!argbuf)
+		return NULL;
+
+	argbuf->is_flat = false;
+	argbuf->s.size = len;
+	argbuf->s.nr = 0;
+	for (i = 0; i < nr; i++) {
+		struct page *page;
+
+		page = alloc_page(gfp);
+		if (!page) {
+			virtio_fs_argbuf_free(argbuf);
+			return NULL;
+		}
+		bvec_set_page(&argbuf->s.bvec[i], page, PAGE_SIZE, 0);
+		argbuf->s.nr++;
+	}
+
+	/* Zero the unused space for in_args */
+	if (in_len & ~PAGE_MASK) {
+		struct iov_iter iter;
+		unsigned int to_zero;
+
+		iov_iter_bvec(&iter, ITER_DEST, argbuf->s.bvec, argbuf->s.nr,
+			      argbuf->s.size);
+		iov_iter_advance(&iter, in_len);
 
-	argbuf = kmalloc(struct_size(argbuf, buf, len), gfp);
+		to_zero = PAGE_SIZE - (in_len & ~PAGE_MASK);
+		iov_iter_zero(to_zero, &iter);
+	}
 
 	return argbuf;
 }
 
 static unsigned int virtio_fs_argbuf_setup_sg(struct virtio_fs_argbuf *argbuf,
 					      unsigned int offset,
-					      unsigned int len,
+					      unsigned int *len,
 					      struct scatterlist *sg)
 {
-	sg_init_one(sg, argbuf->buf + offset, len);
-	return 1;
+	struct bvec_iter bi = {
+		.bi_size = offset + *len,
+	};
+	struct scatterlist *cur;
+	struct bio_vec bv;
+
+	if (argbuf->is_flat) {
+		sg_init_one(sg, argbuf->f.buf + offset, *len);
+		return 1;
+	}
+
+	cur = sg;
+	bvec_iter_advance(argbuf->s.bvec, &bi, offset);
+	for_each_bvec(bv, argbuf->s.bvec, bi, bi) {
+		sg_init_table(cur, 1);
+		sg_set_page(cur, bv.bv_page, bv.bv_len, bv.bv_offset);
+		cur++;
+	}
+	*len = round_up(*len, PAGE_SIZE);
+
+	return cur - sg;
 }
 
 static void virtio_fs_argbuf_copy_from_in_arg(struct virtio_fs_argbuf *argbuf,
 					      unsigned int offset,
 					      const void *src, unsigned int len)
 {
-	memcpy(argbuf->buf + offset, src, len);
+	struct iov_iter iter;
+	unsigned int copied;
+
+	if (argbuf->is_flat) {
+		memcpy(argbuf->f.buf + offset, src, len);
+		return;
+	}
+
+	iov_iter_bvec(&iter, ITER_DEST, argbuf->s.bvec,
+		      argbuf->s.nr, argbuf->s.size);
+	iov_iter_advance(&iter, offset);
+
+	copied = _copy_to_iter(src, len, &iter);
+	WARN_ON_ONCE(copied != len);
 }
 
 static unsigned int
@@ -451,15 +569,32 @@ virtio_fs_argbuf_out_args_offset(struct virtio_fs_argbuf *argbuf,
 				 const struct fuse_args *args)
 {
 	unsigned int num_in = args->in_numargs - args->in_pages;
+	unsigned int offset = fuse_len_args(num_in,
+					    (struct fuse_arg *)args->in_args);
 
-	return fuse_len_args(num_in, (struct fuse_arg *)args->in_args);
+	if (argbuf->is_flat)
+		return offset;
+	return round_up(offset, PAGE_SIZE);
 }
 
 static void virtio_fs_argbuf_copy_to_out_arg(struct virtio_fs_argbuf *argbuf,
 					     unsigned int offset, void *dst,
 					     unsigned int len)
 {
-	memcpy(dst, argbuf->buf + offset, len);
+	struct iov_iter iter;
+	unsigned int copied;
+
+	if (argbuf->is_flat) {
+		memcpy(dst, argbuf->f.buf + offset, len);
+		return;
+	}
+
+	iov_iter_bvec(&iter, ITER_SOURCE, argbuf->s.bvec,
+		      argbuf->s.nr, argbuf->s.size);
+	iov_iter_advance(&iter, offset);
+
+	copied = _copy_from_iter(dst, len, &iter);
+	WARN_ON_ONCE(copied != len);
 }
 
 /*
@@ -1154,7 +1289,7 @@ static unsigned int sg_init_fuse_args(struct scatterlist *sg,
 	len = fuse_len_args(numargs - argpages, args);
 	if (len)
 		total_sgs += virtio_fs_argbuf_setup_sg(req->argbuf, *len_used,
-						       len, &sg[total_sgs]);
+						       &len, &sg[total_sgs]);
 
 	if (argpages)
 		total_sgs += sg_init_fuse_pages(&sg[total_sgs],
@@ -1199,7 +1334,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 	}
 
 	/* Use a bounce buffer since stack args cannot be mapped */
-	req->argbuf = virtio_fs_argbuf_new(args, GFP_ATOMIC);
+	req->argbuf = virtio_fs_argbuf_new(args, GFP_ATOMIC, true);
 	if (!req->argbuf) {
 		ret = -ENOMEM;
 		goto out;
-- 
2.29.2



* [PATCH v2 5/6] virtiofs: use scattered bounce buffer for ITER_KVEC dio
  2024-02-28 14:41 [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Hou Tao
                   ` (3 preceding siblings ...)
  2024-02-28 14:41 ` [PATCH v2 4/6] virtiofs: support bounce buffer backed by scattered pages Hou Tao
@ 2024-02-28 14:41 ` Hou Tao
  2024-02-28 14:41 ` [PATCH v2 6/6] virtiofs: use GFP_NOFS when enqueuing request through kworker Hou Tao
  2024-04-08  7:45 ` [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Michael S. Tsirkin
  6 siblings, 0 replies; 20+ messages in thread
From: Hou Tao @ 2024-02-28 14:41 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

From: Hou Tao <houtao1@huawei.com>

To prevent unnecessary requests for large contiguous physical memory
chunks, use a bounce buffer backed by scattered pages for ITER_KVEC
direct-io read/write when the total size of its args is greater than
PAGE_SIZE.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/fuse/virtio_fs.c | 78 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 59 insertions(+), 19 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index ffea684bd100d..34b9370beba6d 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -458,20 +458,15 @@ static void virtio_fs_argbuf_free(struct virtio_fs_argbuf *argbuf)
 	kfree(argbuf);
 }
 
-static struct virtio_fs_argbuf *virtio_fs_argbuf_new(struct fuse_args *args,
+static struct virtio_fs_argbuf *virtio_fs_argbuf_new(unsigned int in_len,
+						     unsigned int out_len,
 						     gfp_t gfp, bool is_flat)
 {
 	struct virtio_fs_argbuf *argbuf;
-	unsigned int numargs;
-	unsigned int in_len, out_len, len;
+	unsigned int len;
 	unsigned int i, nr;
 
-	numargs = args->in_numargs - args->in_pages;
-	in_len = fuse_len_args(numargs, (struct fuse_arg *) args->in_args);
-	numargs = args->out_numargs - args->out_pages;
-	out_len = fuse_len_args(numargs, args->out_args);
 	len = virtio_fs_argbuf_len(in_len, out_len, is_flat);
-
 	if (is_flat) {
 		argbuf = kmalloc(struct_size(argbuf, f.buf, len), gfp);
 		if (argbuf)
@@ -1222,14 +1217,17 @@ static unsigned int sg_count_fuse_pages(struct fuse_page_desc *page_descs,
 }
 
 /* Return the number of scatter-gather list elements required */
-static unsigned int sg_count_fuse_req(struct fuse_req *req)
+static unsigned int sg_count_fuse_req(struct fuse_req *req,
+				      unsigned int in_args_len,
+				      unsigned int out_args_len,
+				      bool flat_argbuf)
 {
 	struct fuse_args *args = req->args;
 	struct fuse_args_pages *ap = container_of(args, typeof(*ap), args);
 	unsigned int size, total_sgs = 1 /* fuse_in_header */;
+	unsigned int num_in, num_out;
 
-	if (args->in_numargs - args->in_pages)
-		total_sgs += 1;
+	num_in = args->in_numargs - args->in_pages;
 
 	if (args->in_pages) {
 		size = args->in_args[args->in_numargs - 1].size;
@@ -1237,20 +1235,25 @@ static unsigned int sg_count_fuse_req(struct fuse_req *req)
 						 size);
 	}
 
-	if (!test_bit(FR_ISREPLY, &req->flags))
-		return total_sgs;
+	if (!test_bit(FR_ISREPLY, &req->flags)) {
+		num_out = 0;
+		goto done;
+	}
 
 	total_sgs += 1 /* fuse_out_header */;
-
-	if (args->out_numargs - args->out_pages)
-		total_sgs += 1;
+	num_out = args->out_numargs - args->out_pages;
 
 	if (args->out_pages) {
 		size = args->out_args[args->out_numargs - 1].size;
 		total_sgs += sg_count_fuse_pages(ap->descs, ap->num_pages,
 						 size);
 	}
-
+done:
+	if (flat_argbuf)
+		total_sgs += !!num_in + !!num_out;
+	else
+		total_sgs += virtio_fs_argbuf_len(in_args_len, out_args_len,
+						  false) >> PAGE_SHIFT;
 	return total_sgs;
 }
 
@@ -1302,6 +1305,31 @@ static unsigned int sg_init_fuse_args(struct scatterlist *sg,
 	return total_sgs;
 }
 
+static bool use_scattered_argbuf(struct fuse_req *req)
+{
+	struct fuse_args *args = req->args;
+
+	/*
+	 * To prevent unnecessary request for contiguous physical memory chunk,
+	 * use argbuf backed by scattered pages for ITER_KVEC direct-io
+	 * read/write when the total size of its args is greater than PAGE_SIZE.
+	 */
+	if ((req->in.h.opcode == FUSE_WRITE && !args->in_pages) ||
+	    (req->in.h.opcode == FUSE_READ && !args->out_pages)) {
+		unsigned int numargs;
+		unsigned int len;
+
+		numargs = args->in_numargs - args->in_pages;
+		len = fuse_len_args(numargs, (struct fuse_arg *)args->in_args);
+		numargs = args->out_numargs - args->out_pages;
+		len += fuse_len_args(numargs, args->out_args);
+		if (len > PAGE_SIZE)
+			return true;
+	}
+
+	return false;
+}
+
 /* Add a request to a virtqueue and kick the device */
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 				 struct fuse_req *req, bool in_flight)
@@ -1317,13 +1345,24 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 	unsigned int out_sgs = 0;
 	unsigned int in_sgs = 0;
 	unsigned int total_sgs;
+	unsigned int numargs;
+	unsigned int in_args_len;
+	unsigned int out_args_len;
 	unsigned int i;
 	int ret;
 	bool notify;
+	bool flat_argbuf;
 	struct fuse_pqueue *fpq;
 
+	flat_argbuf = !use_scattered_argbuf(req);
+	numargs = args->in_numargs - args->in_pages;
+	in_args_len = fuse_len_args(numargs, (struct fuse_arg *) args->in_args);
+	numargs = args->out_numargs - args->out_pages;
+	out_args_len = fuse_len_args(numargs, args->out_args);
+
 	/* Does the sglist fit on the stack? */
-	total_sgs = sg_count_fuse_req(req);
+	total_sgs = sg_count_fuse_req(req, in_args_len, out_args_len,
+				      flat_argbuf);
 	if (total_sgs > ARRAY_SIZE(stack_sgs)) {
 		sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC);
 		sg = kmalloc_array(total_sgs, sizeof(sg[0]), GFP_ATOMIC);
@@ -1334,7 +1373,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 	}
 
 	/* Use a bounce buffer since stack args cannot be mapped */
-	req->argbuf = virtio_fs_argbuf_new(args, GFP_ATOMIC, true);
+	req->argbuf = virtio_fs_argbuf_new(in_args_len, out_args_len,
+					   GFP_ATOMIC, flat_argbuf);
 	if (!req->argbuf) {
 		ret = -ENOMEM;
 		goto out;
-- 
2.29.2



* [PATCH v2 6/6] virtiofs: use GFP_NOFS when enqueuing request through kworker
  2024-02-28 14:41 [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Hou Tao
                   ` (4 preceding siblings ...)
  2024-02-28 14:41 ` [PATCH v2 5/6] virtiofs: use scattered bounce buffer for ITER_KVEC dio Hou Tao
@ 2024-02-28 14:41 ` Hou Tao
  2024-04-08  7:45 ` [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Michael S. Tsirkin
  6 siblings, 0 replies; 20+ messages in thread
From: Hou Tao @ 2024-02-28 14:41 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

From: Hou Tao <houtao1@huawei.com>

When invoking virtio_fs_enqueue_req() through a kworker, both the
allocation of the sg array and the bounce buffer still use GFP_ATOMIC.
Considering that the size of the sg array may be greater than
PAGE_SIZE, use GFP_NOFS instead of GFP_ATOMIC to lower the possibility
of memory allocation failure and to avoid unnecessarily depleting the
atomic reserves. Instead of passing GFP_NOFS to virtio_fs_enqueue_req()
directly, use GFP_KERNEL together with the memalloc_nofs_{save|restore}
helpers.
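
For context, the scoped-allocation pattern relied on here looks roughly
like the following minimal sketch (do_enqueue() is a hypothetical
stand-in for the real call chain):

	unsigned int flags;

	flags = memalloc_nofs_save();
	/*
	 * Any allocation inside the scope implicitly behaves as if
	 * __GFP_FS were cleared, even when GFP_KERNEL is passed.
	 */
	ret = do_enqueue(req, GFP_KERNEL);
	memalloc_nofs_restore(flags);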

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/fuse/virtio_fs.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 34b9370beba6d..9ee71051c89f2 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -108,7 +108,8 @@ struct virtio_fs_argbuf {
 };
 
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
-				 struct fuse_req *req, bool in_flight);
+				 struct fuse_req *req, bool in_flight,
+				 gfp_t gfp);
 
 static const struct constant_table dax_param_enums[] = {
 	{"always",	FUSE_DAX_ALWAYS },
@@ -394,6 +395,8 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
 
 	/* Dispatch pending requests */
 	while (1) {
+		unsigned int flags;
+
 		spin_lock(&fsvq->lock);
 		req = list_first_entry_or_null(&fsvq->queued_reqs,
 					       struct fuse_req, list);
@@ -404,7 +407,9 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
 		list_del_init(&req->list);
 		spin_unlock(&fsvq->lock);
 
-		ret = virtio_fs_enqueue_req(fsvq, req, true);
+		flags = memalloc_nofs_save();
+		ret = virtio_fs_enqueue_req(fsvq, req, true, GFP_KERNEL);
+		memalloc_nofs_restore(flags);
 		if (ret < 0) {
 			if (ret == -ENOMEM || ret == -ENOSPC) {
 				spin_lock(&fsvq->lock);
@@ -1332,7 +1337,8 @@ static bool use_scattered_argbuf(struct fuse_req *req)
 
 /* Add a request to a virtqueue and kick the device */
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
-				 struct fuse_req *req, bool in_flight)
+				 struct fuse_req *req, bool in_flight,
+				 gfp_t gfp)
 {
 	/* requests need at least 4 elements */
 	struct scatterlist *stack_sgs[6];
@@ -1364,8 +1370,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 	total_sgs = sg_count_fuse_req(req, in_args_len, out_args_len,
 				      flat_argbuf);
 	if (total_sgs > ARRAY_SIZE(stack_sgs)) {
-		sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC);
-		sg = kmalloc_array(total_sgs, sizeof(sg[0]), GFP_ATOMIC);
+		sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), gfp);
+		sg = kmalloc_array(total_sgs, sizeof(sg[0]), gfp);
 		if (!sgs || !sg) {
 			ret = -ENOMEM;
 			goto out;
@@ -1373,8 +1379,8 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 	}
 
 	/* Use a bounce buffer since stack args cannot be mapped */
-	req->argbuf = virtio_fs_argbuf_new(in_args_len, out_args_len,
-					   GFP_ATOMIC, flat_argbuf);
+	req->argbuf = virtio_fs_argbuf_new(in_args_len, out_args_len, gfp,
+					   flat_argbuf);
 	if (!req->argbuf) {
 		ret = -ENOMEM;
 		goto out;
@@ -1473,7 +1479,7 @@ __releases(fiq->lock)
 		 fuse_len_args(req->args->out_numargs, req->args->out_args));
 
 	fsvq = &fs->vqs[queue_id];
-	ret = virtio_fs_enqueue_req(fsvq, req, false);
+	ret = virtio_fs_enqueue_req(fsvq, req, false, GFP_ATOMIC);
 	if (ret < 0) {
 		if (ret == -ENOMEM || ret == -ENOSPC) {
 			/*
-- 
2.29.2



* Re: [PATCH v2 4/6] virtiofs: support bounce buffer backed by scattered pages
  2024-02-28 14:41 ` [PATCH v2 4/6] virtiofs: support bounce buffer backed by scattered pages Hou Tao
@ 2024-02-29 15:01   ` Brian Foster
  2024-03-09  4:14     ` Hou Tao
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2024-02-29 15:01 UTC (permalink / raw)
  To: Hou Tao
  Cc: linux-fsdevel, Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi,
	Bernd Schubert, Michael S . Tsirkin, Matthew Wilcox,
	Benjamin Coddington, linux-kernel, virtualization, houtao1

On Wed, Feb 28, 2024 at 10:41:24PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> When reading a file kept in virtiofs from the kernel (e.g., insmod a
> kernel module), if the cache of virtiofs is disabled, the read buffer
> will be passed to virtiofs through out_args[0].value instead of pages.
> Because virtiofs can't get the pages for the read buffer,
> virtio_fs_argbuf_new() will create a bounce buffer for the read buffer
> by using kmalloc() and copy the read buffer into the bounce buffer. If
> the read buffer is large (e.g., 1MB), the allocation will incur
> significant stress on the memory subsystem.
> 
> So instead of allocating the bounce buffer by using kmalloc(), allocate
> a bounce buffer which is backed by scattered pages. The original idea
> was to use vmap(), but the use of GFP_ATOMIC is not possible for
> vmap(). To simplify the copy operations in the bounce buffer, use a
> bio_vec flex array to represent the argbuf. Also add an is_flat field
> in struct virtio_fs_argbuf to distinguish between kmalloc-ed and
> scattered bounce buffers.
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  fs/fuse/virtio_fs.c | 163 ++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 149 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index f10fff7f23a0f..ffea684bd100d 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
...
> @@ -408,42 +425,143 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
>  	}
>  }
>  
...  
>  static void virtio_fs_argbuf_copy_from_in_arg(struct virtio_fs_argbuf *argbuf,
>  					      unsigned int offset,
>  					      const void *src, unsigned int len)
>  {
> -	memcpy(argbuf->buf + offset, src, len);
> +	struct iov_iter iter;
> +	unsigned int copied;
> +
> +	if (argbuf->is_flat) {
> +		memcpy(argbuf->f.buf + offset, src, len);
> +		return;
> +	}
> +
> +	iov_iter_bvec(&iter, ITER_DEST, argbuf->s.bvec,
> +		      argbuf->s.nr, argbuf->s.size);
> +	iov_iter_advance(&iter, offset);

Hi Hou,

Just a random comment, but it seems a little inefficient to reinit and
readvance the iter like this on every copy/call. It looks like offset is
already incremented in the callers of the argbuf copy helpers. Perhaps
iov_iter could be lifted into the callers and passed down, or even just
include it in the argbuf structure and init it at alloc time?
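
E.g., a rough, untested sketch of the alloc-time variant (the iter
field and its placement here are hypothetical):

	struct virtio_fs_argbuf {
		bool is_flat;
		struct iov_iter iter;	/* set up once at alloc time */
		union {
			struct virtio_fs_flat_argbuf f;
			struct virtio_fs_scattered_argbuf s;
		};
	};

	/* in virtio_fs_argbuf_new(), after the bvec array is filled: */
	iov_iter_bvec(&argbuf->iter, ITER_DEST, argbuf->s.bvec,
		      argbuf->s.nr, argbuf->s.size);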

Brian

> +
> +	copied = _copy_to_iter(src, len, &iter);
> +	WARN_ON_ONCE(copied != len);
>  }
>  
>  static unsigned int
> @@ -451,15 +569,32 @@ virtio_fs_argbuf_out_args_offset(struct virtio_fs_argbuf *argbuf,
>  				 const struct fuse_args *args)
>  {
>  	unsigned int num_in = args->in_numargs - args->in_pages;
> +	unsigned int offset = fuse_len_args(num_in,
> +					    (struct fuse_arg *)args->in_args);
>  
> -	return fuse_len_args(num_in, (struct fuse_arg *)args->in_args);
> +	if (argbuf->is_flat)
> +		return offset;
> +	return round_up(offset, PAGE_SIZE);
>  }
>  
>  static void virtio_fs_argbuf_copy_to_out_arg(struct virtio_fs_argbuf *argbuf,
>  					     unsigned int offset, void *dst,
>  					     unsigned int len)
>  {
> -	memcpy(dst, argbuf->buf + offset, len);
> +	struct iov_iter iter;
> +	unsigned int copied;
> +
> +	if (argbuf->is_flat) {
> +		memcpy(dst, argbuf->f.buf + offset, len);
> +		return;
> +	}
> +
> +	iov_iter_bvec(&iter, ITER_SOURCE, argbuf->s.bvec,
> +		      argbuf->s.nr, argbuf->s.size);
> +	iov_iter_advance(&iter, offset);
> +
> +	copied = _copy_from_iter(dst, len, &iter);
> +	WARN_ON_ONCE(copied != len);
>  }
>  
>  /*
> @@ -1154,7 +1289,7 @@ static unsigned int sg_init_fuse_args(struct scatterlist *sg,
>  	len = fuse_len_args(numargs - argpages, args);
>  	if (len)
>  		total_sgs += virtio_fs_argbuf_setup_sg(req->argbuf, *len_used,
> -						       len, &sg[total_sgs]);
> +						       &len, &sg[total_sgs]);
>  
>  	if (argpages)
>  		total_sgs += sg_init_fuse_pages(&sg[total_sgs],
> @@ -1199,7 +1334,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
>  	}
>  
>  	/* Use a bounce buffer since stack args cannot be mapped */
> -	req->argbuf = virtio_fs_argbuf_new(args, GFP_ATOMIC);
> +	req->argbuf = virtio_fs_argbuf_new(args, GFP_ATOMIC, true);
>  	if (!req->argbuf) {
>  		ret = -ENOMEM;
>  		goto out;
> -- 
> 2.29.2
> 
> 



* Re: [PATCH v2 1/6] fuse: limit the length of ITER_KVEC dio by max_pages
  2024-02-28 14:41 ` [PATCH v2 1/6] fuse: limit the length of ITER_KVEC dio by max_pages Hou Tao
@ 2024-03-01 13:42   ` Miklos Szeredi
  2024-03-09  4:26     ` Hou Tao
  0 siblings, 1 reply; 20+ messages in thread
From: Miklos Szeredi @ 2024-03-01 13:42 UTC (permalink / raw)
  To: Hou Tao
  Cc: linux-fsdevel, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

On Wed, 28 Feb 2024 at 15:40, Hou Tao <houtao@huaweicloud.com> wrote:

> So instead of limiting the values of both max_read and max_write in
> the kernel, cap the maximal length of kvec iter IO by using max_pages
> in fuse_direct_io(), just as is done for ubuf/iovec iter IO. The max
> value for max_pages is currently 256, so on a host with a 4KB page
> size, the maximal size passed to kmalloc() in copy_args_to_argbuf() is
> about 1MB+40B. The allocation of 2MB of physically contiguous memory
> will still incur significant stress on the memory subsystem, but the
> warning is fixed. Additionally, the requirement for huge physically
> contiguous memory will be removed in the following patches.

So the issue will be fixed properly by the following patches?

In that case this patch could be omitted, right?

Thanks,
Miklos


* Re: [PATCH v2 3/6] virtiofs: factor out more common methods for argbuf
  2024-02-28 14:41 ` [PATCH v2 3/6] virtiofs: factor out more common methods for argbuf Hou Tao
@ 2024-03-01 14:24   ` Miklos Szeredi
  2024-03-09  4:27     ` Hou Tao
  0 siblings, 1 reply; 20+ messages in thread
From: Miklos Szeredi @ 2024-03-01 14:24 UTC (permalink / raw)
  To: Hou Tao
  Cc: linux-fsdevel, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

On Wed, 28 Feb 2024 at 15:41, Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Factor out more common methods for bounce buffer of fuse args:
>
> 1) virtio_fs_argbuf_setup_sg: set-up sgs for bounce buffer
> 2) virtio_fs_argbuf_copy_from_in_arg: copy each in-arg to bounce buffer
> 3) virtio_fs_argbuf_out_args_offset: calc the start offset of out-arg
> 4) virtio_fs_argbuf_copy_to_out_arg: copy bounce buffer to each out-arg
>
> These methods will be used to implement bounce buffer backed by
> scattered pages which are allocated separatedly.

Why is req->argbuf not changed to being typed?

Thanks,
Miklos


* Re: [PATCH v2 4/6] virtiofs: support bounce buffer backed by scattered pages
  2024-02-29 15:01   ` Brian Foster
@ 2024-03-09  4:14     ` Hou Tao
  2024-03-13 12:14       ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Hou Tao @ 2024-03-09  4:14 UTC (permalink / raw)
  To: Brian Foster
  Cc: linux-fsdevel, Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi,
	Bernd Schubert, Michael S . Tsirkin, Matthew Wilcox,
	Benjamin Coddington, linux-kernel, virtualization, houtao1

Hi,

On 2/29/2024 11:01 PM, Brian Foster wrote:
> On Wed, Feb 28, 2024 at 10:41:24PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> When reading a file kept in virtiofs from the kernel (e.g., insmod a
>> kernel module), if the cache of virtiofs is disabled, the read buffer
>> will be passed to virtiofs through out_args[0].value instead of pages.
>> Because virtiofs can't get the pages for the read buffer,
>> virtio_fs_argbuf_new() will create a bounce buffer for the read buffer
>> by using kmalloc() and copy the read buffer into the bounce buffer. If
>> the read buffer is large (e.g., 1MB), the allocation will incur
>> significant stress on the memory subsystem.
>>
>> So instead of allocating the bounce buffer by using kmalloc(), allocate
>> a bounce buffer which is backed by scattered pages. The original idea
>> was to use vmap(), but the use of GFP_ATOMIC is not possible for
>> vmap(). To simplify the copy operations in the bounce buffer, use a
>> bio_vec flex array to represent the argbuf. Also add an is_flat field
>> in struct virtio_fs_argbuf to distinguish between kmalloc-ed and
>> scattered bounce buffers.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  fs/fuse/virtio_fs.c | 163 ++++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 149 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
>> index f10fff7f23a0f..ffea684bd100d 100644
>> --- a/fs/fuse/virtio_fs.c
>> +++ b/fs/fuse/virtio_fs.c
> ...
>> @@ -408,42 +425,143 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
>>  	}
>>  }
>>  
> ...  
>>  static void virtio_fs_argbuf_copy_from_in_arg(struct virtio_fs_argbuf *argbuf,
>>  					      unsigned int offset,
>>  					      const void *src, unsigned int len)
>>  {
>> -	memcpy(argbuf->buf + offset, src, len);
>> +	struct iov_iter iter;
>> +	unsigned int copied;
>> +
>> +	if (argbuf->is_flat) {
>> +		memcpy(argbuf->f.buf + offset, src, len);
>> +		return;
>> +	}
>> +
>> +	iov_iter_bvec(&iter, ITER_DEST, argbuf->s.bvec,
>> +		      argbuf->s.nr, argbuf->s.size);
>> +	iov_iter_advance(&iter, offset);
> Hi Hou,
>
> Just a random comment, but it seems a little inefficient to reinit and
> readvance the iter like this on every copy/call. It looks like offset is
> already incremented in the callers of the argbuf copy helpers. Perhaps
> iov_iter could be lifted into the callers and passed down, or even just
> include it in the argbuf structure and init it at alloc time?

Sorry for the late reply. I have been busy with an off-site workshop these days.

I tried a similar idea before, in which the iov_iter was saved directly
in the argbuf struct, but it didn't work out well. The reason is that
an iov_iter is needed for copying both in_args and out_args, but the
direction differs. A bi-directional iov_iter is currently not
supported, so the code has to initialize the iov_iter twice: once for
copying from in_args and once for copying to out_args.

For dio read initiated from the kernel, both in_numargs and
out_numargs are 1, so there will be only one iov_iter_advance() in
virtio_fs_argbuf_copy_to_out_arg() and the offset is 64, so I think the
overhead will be fine. For dio write initiated from the kernel,
in_numargs is 2 and out_numargs is 1, so there will be two invocations
of iov_iter_advance(): the first with offset=64, and the second with
offset=round_up_page_size(64 + write_size), so the latter may introduce
extra overhead. But compared with the overhead of the data copy, I
still think the cost of calling iov_iter_advance() is fine.
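
(As a concrete example of the formula above: for a 1MB dio write with
4KB pages, the second advance goes to round_up(64 + 1MB, 4KB) =
1MB + 4KB, i.e. it walks past roughly 257 bvec entries before the copy
starts.)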

> Brian
>
>> +
>> +	copied = _copy_to_iter(src, len, &iter);
>> +	WARN_ON_ONCE(copied != len);
>>  }
>>  
>>  static unsigned int
>> @@ -451,15 +569,32 @@ virtio_fs_argbuf_out_args_offset(struct virtio_fs_argbuf *argbuf,
>>  				 const struct fuse_args *args)
>>  {
>>  	unsigned int num_in = args->in_numargs - args->in_pages;
>> +	unsigned int offset = fuse_len_args(num_in,
>> +					    (struct fuse_arg *)args->in_args);
>>  
>> -	return fuse_len_args(num_in, (struct fuse_arg *)args->in_args);
>> +	if (argbuf->is_flat)
>> +		return offset;
>> +	return round_up(offset, PAGE_SIZE);
>>  }
>>  
>>  static void virtio_fs_argbuf_copy_to_out_arg(struct virtio_fs_argbuf *argbuf,
>>  					     unsigned int offset, void *dst,
>>  					     unsigned int len)
>>  {
>> -	memcpy(dst, argbuf->buf + offset, len);
>> +	struct iov_iter iter;
>> +	unsigned int copied;
>> +
>> +	if (argbuf->is_flat) {
>> +		memcpy(dst, argbuf->f.buf + offset, len);
>> +		return;
>> +	}
>> +
>> +	iov_iter_bvec(&iter, ITER_SOURCE, argbuf->s.bvec,
>> +		      argbuf->s.nr, argbuf->s.size);
>> +	iov_iter_advance(&iter, offset);
>> +
>> +	copied = _copy_from_iter(dst, len, &iter);
>> +	WARN_ON_ONCE(copied != len);
>>  }
>>  
>>  /*
>> @@ -1154,7 +1289,7 @@ static unsigned int sg_init_fuse_args(struct scatterlist *sg,
>>  	len = fuse_len_args(numargs - argpages, args);
>>  	if (len)
>>  		total_sgs += virtio_fs_argbuf_setup_sg(req->argbuf, *len_used,
>> -						       len, &sg[total_sgs]);
>> +						       &len, &sg[total_sgs]);
>>  
>>  	if (argpages)
>>  		total_sgs += sg_init_fuse_pages(&sg[total_sgs],
>> @@ -1199,7 +1334,7 @@ static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
>>  	}
>>  
>>  	/* Use a bounce buffer since stack args cannot be mapped */
>> -	req->argbuf = virtio_fs_argbuf_new(args, GFP_ATOMIC);
>> +	req->argbuf = virtio_fs_argbuf_new(args, GFP_ATOMIC, true);
>>  	if (!req->argbuf) {
>>  		ret = -ENOMEM;
>>  		goto out;
>> -- 
>> 2.29.2
>>
>>



* Re: [PATCH v2 1/6] fuse: limit the length of ITER_KVEC dio by max_pages
  2024-03-01 13:42   ` Miklos Szeredi
@ 2024-03-09  4:26     ` Hou Tao
  2024-03-13 23:02       ` Bernd Schubert
  0 siblings, 1 reply; 20+ messages in thread
From: Hou Tao @ 2024-03-09  4:26 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

Hi,

On 3/1/2024 9:42 PM, Miklos Szeredi wrote:
> On Wed, 28 Feb 2024 at 15:40, Hou Tao <houtao@huaweicloud.com> wrote:
>
>> So instead of limiting the values of both max_read and max_write in
>> the kernel, cap the maximal length of kvec iter IO by using max_pages
>> in fuse_direct_io(), just as is done for ubuf/iovec iter IO. The max
>> value for max_pages is currently 256, so on a host with a 4KB page
>> size, the maximal size passed to kmalloc() in copy_args_to_argbuf() is
>> about 1MB+40B. The allocation of 2MB of physically contiguous memory
>> will still incur significant stress on the memory subsystem, but the
>> warning is fixed. Additionally, the requirement for huge physically
>> contiguous memory will be removed in the following patches.
> So the issue will be fixed properly by the following patches?
>
> In that case this patch could be omitted, right?

Sorry for the late reply. I have been busy with an off-site workshop these days.

No, this patch is still necessary: it limits the number of scatterlist
entries used for the fuse request and reply in virtio-fs. If the length
of out_args[0].size is not limited, the number of scatterlist entries
used to map the fuse request may be greater than the queue size of the
virtqueue, and the fuse request may hang forever.
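
(For example, a 10MB out_args[0] mapped page by page would need about
10MB / 4KB = 2560 scatterlist entries, while a virtqueue typically has
far fewer descriptors, e.g. 128 or 256 depending on the device
configuration.)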

>
> Thanks,
> Miklos



* Re: [PATCH v2 3/6] virtiofs: factor out more common methods for argbuf
  2024-03-01 14:24   ` Miklos Szeredi
@ 2024-03-09  4:27     ` Hou Tao
  0 siblings, 0 replies; 20+ messages in thread
From: Hou Tao @ 2024-03-09  4:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, Vivek Goyal, Stefan Hajnoczi, Bernd Schubert,
	Michael S . Tsirkin, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

Hi,

On 3/1/2024 10:24 PM, Miklos Szeredi wrote:
> On Wed, 28 Feb 2024 at 15:41, Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Factor out more common methods for bounce buffer of fuse args:
>>
>> 1) virtio_fs_argbuf_setup_sg: set-up sgs for bounce buffer
>> 2) virtio_fs_argbuf_copy_from_in_arg: copy each in-arg to bounce buffer
>> 3) virtio_fs_argbuf_out_args_offset: calc the start offset of out-arg
>> 4) virtio_fs_argbuf_copy_to_out_arg: copy bounce buffer to each out-arg
>>
>> These methods will be used to implement bounce buffer backed by
>> scattered pages which are allocated separatedly.
> Why is req->argbuf not changed to being typed?

Will update it in the next revision. Thanks for the suggestion.
>
> Thanks,
> Miklos



* Re: [PATCH v2 4/6] virtiofs: support bounce buffer backed by scattered pages
  2024-03-09  4:14     ` Hou Tao
@ 2024-03-13 12:14       ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2024-03-13 12:14 UTC (permalink / raw)
  To: Hou Tao
  Cc: linux-fsdevel, Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi,
	Bernd Schubert, Michael S . Tsirkin, Matthew Wilcox,
	Benjamin Coddington, linux-kernel, virtualization, houtao1

On Sat, Mar 09, 2024 at 12:14:23PM +0800, Hou Tao wrote:
> Hi,
> 
> On 2/29/2024 11:01 PM, Brian Foster wrote:
> > On Wed, Feb 28, 2024 at 10:41:24PM +0800, Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> When reading a file kept in virtiofs from kernel (e.g., insmod a kernel
> >> module), if the cache of virtiofs is disabled, the read buffer will be
> >> passed to virtiofs through out_args[0].value instead of pages. Because
> >> virtiofs can't get the pages for the read buffer, virtio_fs_argbuf_new()
> >> will create a bounce buffer for the read buffer by using kmalloc() and
> >> copy the read buffer into bounce buffer. If the read buffer is large
> >> (e.g., 1MB), the allocation will incur significant stress on the memory
> >> subsystem.
> >>
> >> So instead of allocating bounce buffer by using kmalloc(), allocate a
> >> bounce buffer which is backed by scattered pages. The original idea is
> >> to use vmap(), but the use of GFP_ATOMIC is not possible for vmap(). To
> >> simplify the copy operations in the bounce buffer, use a bio_vec flex
> >> array to represent the argbuf. Also add an is_flat field in struct
> >> virtio_fs_argbuf to distinguish between kmalloc-ed and scattered bounce
> >> buffer.
> >>
> >> Signed-off-by: Hou Tao <houtao1@huawei.com>
> >> ---
> >>  fs/fuse/virtio_fs.c | 163 ++++++++++++++++++++++++++++++++++++++++----
> >>  1 file changed, 149 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> >> index f10fff7f23a0f..ffea684bd100d 100644
> >> --- a/fs/fuse/virtio_fs.c
> >> +++ b/fs/fuse/virtio_fs.c
> > ...
> >> @@ -408,42 +425,143 @@ static void virtio_fs_request_dispatch_work(struct work_struct *work)
> >>  	}
> >>  }
> >>  
> > ...  
> >>  static void virtio_fs_argbuf_copy_from_in_arg(struct virtio_fs_argbuf *argbuf,
> >>  					      unsigned int offset,
> >>  					      const void *src, unsigned int len)
> >>  {
> >> -	memcpy(argbuf->buf + offset, src, len);
> >> +	struct iov_iter iter;
> >> +	unsigned int copied;
> >> +
> >> +	if (argbuf->is_flat) {
> >> +		memcpy(argbuf->f.buf + offset, src, len);
> >> +		return;
> >> +	}
> >> +
> >> +	iov_iter_bvec(&iter, ITER_DEST, argbuf->s.bvec,
> >> +		      argbuf->s.nr, argbuf->s.size);
> >> +	iov_iter_advance(&iter, offset);
> > Hi Hou,
> >
> > Just a random comment, but it seems a little inefficient to reinit and
> > readvance the iter like this on every copy/call. It looks like offset is
> > already incremented in the callers of the argbuf copy helpers. Perhaps
> > the iov_iter could be lifted into the callers and passed down, or even
> > just included in the argbuf structure and initialized at alloc time?
> 
> Sorry for the late reply. Being busy with off-site workshop these days.
> 

No worries.

> I have tried a similar idea before in which the iov_iter was saved
> directly in the argbuf struct, but it didn't work out well. The reason
> is that an iov_iter is needed for copying both in_args and out_args,
> but the direction differs, and a bi-directional iov_iter is not
> supported. So the code has to initialize the iov_iter twice: once for
> copying from in_args and again for copying to out_args.
> 

Ok, seems reasonable enough.

> For a dio read initiated from the kernel, both its in_numargs and
> out_numargs are 1, so there will be only one iov_iter_advance() in
> virtio_fs_argbuf_copy_to_out_arg() with offset=64, so I think the
> overhead will be fine. For a dio write initiated from the kernel, its
> in_numargs is 2 and out_numargs is 1, so there will be two invocations
> of iov_iter_advance(): the first one with offset=64, and the other one
> with offset=round_up_page_size(64 + write_size), so the latter may
> introduce extra overhead. But compared with the overhead of the data
> copy, I still think the overhead of calling iov_iter_advance() is fine.
> 

I'm not claiming the overhead is some practical problem here, but rather
that we shouldn't need to be concerned with it in the first place with
some cleaner code. It's been a bit since I first looked at this. I was
originally wondering about defining the iov_iter in the caller and
passing it down as a param, but on another look, do the lowest-level
helpers really need to exist?

If you don't anticipate further users, IMO something like the diff below
is a bit cleaner (compile tested only, but no reinits and less code
overall). But that's just my $.02, feel free to use it or not.

Brian

--- 8< ---

diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 9ee71051c89f..9de477e9ccd5 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -544,26 +544,6 @@ static unsigned int virtio_fs_argbuf_setup_sg(struct virtio_fs_argbuf *argbuf,
 	return cur - sg;
 }
 
-static void virtio_fs_argbuf_copy_from_in_arg(struct virtio_fs_argbuf *argbuf,
-					      unsigned int offset,
-					      const void *src, unsigned int len)
-{
-	struct iov_iter iter;
-	unsigned int copied;
-
-	if (argbuf->is_flat) {
-		memcpy(argbuf->f.buf + offset, src, len);
-		return;
-	}
-
-	iov_iter_bvec(&iter, ITER_DEST, argbuf->s.bvec,
-		      argbuf->s.nr, argbuf->s.size);
-	iov_iter_advance(&iter, offset);
-
-	copied = _copy_to_iter(src, len, &iter);
-	WARN_ON_ONCE(copied != len);
-}
-
 static unsigned int
 virtio_fs_argbuf_out_args_offset(struct virtio_fs_argbuf *argbuf,
 				 const struct fuse_args *args)
@@ -577,26 +557,6 @@ virtio_fs_argbuf_out_args_offset(struct virtio_fs_argbuf *argbuf,
 	return round_up(offset, PAGE_SIZE);
 }
 
-static void virtio_fs_argbuf_copy_to_out_arg(struct virtio_fs_argbuf *argbuf,
-					     unsigned int offset, void *dst,
-					     unsigned int len)
-{
-	struct iov_iter iter;
-	unsigned int copied;
-
-	if (argbuf->is_flat) {
-		memcpy(dst, argbuf->f.buf + offset, len);
-		return;
-	}
-
-	iov_iter_bvec(&iter, ITER_SOURCE, argbuf->s.bvec,
-		      argbuf->s.nr, argbuf->s.size);
-	iov_iter_advance(&iter, offset);
-
-	copied = _copy_from_iter(dst, len, &iter);
-	WARN_ON_ONCE(copied != len);
-}
-
 /*
  * Returns 1 if queue is full and sender should wait a bit before sending
  * next request, 0 otherwise.
@@ -684,23 +644,41 @@ static void virtio_fs_hiprio_dispatch_work(struct work_struct *work)
 static void copy_args_to_argbuf(struct fuse_req *req)
 {
 	struct fuse_args *args = req->args;
+	struct virtio_fs_argbuf *argbuf = req->argbuf;
+	struct iov_iter iter;
+	unsigned int copied;
 	unsigned int offset = 0;
 	unsigned int num_in;
 	unsigned int i;
 
+	if (!argbuf->is_flat) {
+		iov_iter_bvec(&iter, ITER_DEST, argbuf->s.bvec, argbuf->s.nr,
+			argbuf->s.size);
+	}
+
 	num_in = args->in_numargs - args->in_pages;
 	for (i = 0; i < num_in; i++) {
-		virtio_fs_argbuf_copy_from_in_arg(req->argbuf, offset,
-						  args->in_args[i].value,
-						  args->in_args[i].size);
-		offset += args->in_args[i].size;
+		const void *src = args->in_args[i].value;
+		unsigned int len = args->in_args[i].size;
+
+		if (argbuf->is_flat) {
+			memcpy(argbuf->f.buf + offset, src, len);
+			offset += len;
+			continue;
+		}
+
+		/* _copy_to_iter() advances the iter, no manual advance needed */
+		copied = _copy_to_iter(src, len, &iter);
+		WARN_ON_ONCE(copied != len);
 	}
 }
 
 /* Copy args out of req->argbuf */
 static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
 {
-	struct virtio_fs_argbuf *argbuf;
+	struct virtio_fs_argbuf *argbuf = req->argbuf;
+	struct iov_iter iter;
+	unsigned int copied;
 	unsigned int remaining;
 	unsigned int offset;
 	unsigned int num_out;
@@ -711,10 +689,17 @@ static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
 	if (!num_out)
 		goto out;
 
-	argbuf = req->argbuf;
+	if (!argbuf->is_flat)
+		iov_iter_bvec(&iter, ITER_SOURCE, argbuf->s.bvec,
+		      argbuf->s.nr, argbuf->s.size);
+
 	offset = virtio_fs_argbuf_out_args_offset(argbuf, args);
+	/* position the iter at the start of the out args once */
+	if (!argbuf->is_flat)
+		iov_iter_advance(&iter, offset);
 	for (i = 0; i < num_out; i++) {
 		unsigned int argsize = args->out_args[i].size;
+		void *dst = args->out_args[i].value;
 
 		if (args->out_argvar &&
 		    i == args->out_numargs - 1 &&
@@ -722,10 +707,14 @@ static void copy_args_from_argbuf(struct fuse_args *args, struct fuse_req *req)
 			argsize = remaining;
 		}
 
-		virtio_fs_argbuf_copy_to_out_arg(argbuf, offset,
-						 args->out_args[i].value,
-						 argsize);
-		offset += argsize;
+		if (argbuf->is_flat) {
+			memcpy(dst, argbuf->f.buf + offset, argsize);
+			offset += argsize;
+		} else {
+			/* _copy_from_iter() advances the iter */
+			copied = _copy_from_iter(dst, argsize, &iter);
+			WARN_ON_ONCE(copied != argsize);
+		}
 
 		if (i != args->out_numargs - 1)
 			remaining -= argsize;


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/6] fuse: limit the length of ITER_KVEC dio by max_pages
  2024-03-09  4:26     ` Hou Tao
@ 2024-03-13 23:02       ` Bernd Schubert
  0 siblings, 0 replies; 20+ messages in thread
From: Bernd Schubert @ 2024-03-13 23:02 UTC (permalink / raw)
  To: Hou Tao, Miklos Szeredi
  Cc: linux-fsdevel, Vivek Goyal, Stefan Hajnoczi, Michael S . Tsirkin,
	Matthew Wilcox, Benjamin Coddington, linux-kernel,
	virtualization, houtao1



On 3/9/24 05:26, Hou Tao wrote:
> Hi,
> 
> On 3/1/2024 9:42 PM, Miklos Szeredi wrote:
>> On Wed, 28 Feb 2024 at 15:40, Hou Tao <houtao@huaweicloud.com> wrote:
>>
>>> So instead of limiting both the values of max_read and max_write in the
>>> kernel, cap the maximal length of kvec iter IO by using max_pages in
>>> fuse_direct_io(), just as is done for ubuf/iovec iter IO. Now the max
>>> value for max_pages is 256, so on a host with a 4KB page size, the maximal
>>> size passed to kmalloc() in copy_args_to_argbuf() is about 1MB+40B. The
>>> allocation of 2MB of physically contiguous memory will still incur
>>> significant stress on the memory subsystem, but the warning is fixed.
>>> Additionally, the requirement for huge physically contiguous memory will
>>> be removed in the following patch.
>> So the issue will be fixed properly by the following patches?
>>
>> In that case this patch could be omitted, right?
> 
> Sorry for the late reply. I have been busy with an off-site workshop these days.
> 
> No, this patch is still necessary: it limits the number of scatterlist
> entries used for the fuse request and reply in virtio-fs. If the length
> of out_args[0].size is not limited, the number of scatterlist entries
> needed to map the fuse request may exceed the queue size of the
> virtqueue, and the fuse request may hang forever.

I'm currently also totally busy and didn't carefully check, but isn't
there something missing that limits fc->max_write/fc->max_read?
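
Something along these lines is what I mean (untested sketch, just to
illustrate the idea; whether and where to clamp is exactly the open
question):

/* hypothetical: keep max_read/max_write within what max_pages allows,
 * so a kvec dio can never ask for more than the virtqueue can map
 */
fc->max_read = min_t(unsigned int, fc->max_read,
		     fc->max_pages << PAGE_SHIFT);
fc->max_write = min_t(unsigned int, fc->max_write,
		      fc->max_pages << PAGE_SHIFT);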


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio
  2024-02-28 14:41 [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Hou Tao
                   ` (5 preceding siblings ...)
  2024-02-28 14:41 ` [PATCH v2 6/6] virtiofs: use GFP_NOFS when enqueuing request through kworker Hou Tao
@ 2024-04-08  7:45 ` Michael S. Tsirkin
  2024-04-09  1:48   ` Hou Tao
  6 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2024-04-08  7:45 UTC (permalink / raw)
  To: Hou Tao
  Cc: linux-fsdevel, Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi,
	Bernd Schubert, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

On Wed, Feb 28, 2024 at 10:41:20PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
> 
> The patch set aims to fix the warning related to an abnormal size
> parameter of kmalloc() in virtiofs. The warning occurred when attempting
> to insert a 10MB sized kernel module kept in a virtiofs with cache
> disabled. As analyzed in patch #1, the root cause is that the length of
> the read buffer is no limited, and the read buffer is passed directly to
> virtiofs through out_args[0].value. Therefore patch #1 limits the
> length of the read buffer passed to virtiofs by using max_pages. However
> it is not enough, because now the maximal value of max_pages is 256.
> Consequently, when reading a 10MB-sized kernel module, the length of the
> bounce buffer in virtiofs will be 40 + (256 * 4096), and kmalloc will
> try to allocate 2MB from memory subsystem. The request for 2MB of
> physically contiguous memory significantly stress the memory subsystem
> and may fail indefinitely on hosts with fragmented memory. To address
> this, patch #2~#5 use scattered pages in a bio_vec to replace the
> kmalloc-allocated bounce buffer when the length of the bounce buffer for
> KVEC_ITER dio is larger than PAGE_SIZE. The final issue with the
> allocation of the bounce buffer and sg array in virtiofs is that
> GFP_ATOMIC is used even when the allocation occurs in a kworker context.
> Therefore the last patch uses GFP_NOFS for the allocation of both sg
> array and bounce buffer when initiated by the kworker. For more details,
> please check the individual patches.
> 
> As usual, comments are always welcome.
> 
> Change Log:

Bernd, should I just merge the patchset as is?
It seems to fix a real problem and no one has the
time to work on a better fix... WDYT?


> v2:
>   * limit the length of ITER_KVEC dio by max_pages instead of the
>     newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC
>     dio being consistent with other rw operations.
>   * replace kmalloc-allocated bounce buffer by using a bounce buffer
>     backed by scattered pages when the length of the bounce buffer for
>     KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with
>     fragmented memory, the KVEC_ITER dio can be handled normally by
>     virtiofs. (Bernd Schubert)
>   * merge the GFP_NOFS patch [1] into this patch-set and use
>     memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
>     (Benjamin Coddington)
> 
> v1: https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-houtao@huaweicloud.com/
> 
> [1]: https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-houtao@huaweicloud.com/
> 
> Hou Tao (6):
>   fuse: limit the length of ITER_KVEC dio by max_pages
>   virtiofs: move alloc/free of argbuf into separated helpers
>   virtiofs: factor out more common methods for argbuf
>   virtiofs: support bounce buffer backed by scattered pages
>   virtiofs: use scattered bounce buffer for ITER_KVEC dio
>   virtiofs: use GFP_NOFS when enqueuing request through kworker
> 
>  fs/fuse/file.c      |  12 +-
>  fs/fuse/virtio_fs.c | 336 +++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 296 insertions(+), 52 deletions(-)
> 
> -- 
> 2.29.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio
  2024-04-08  7:45 ` [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Michael S. Tsirkin
@ 2024-04-09  1:48   ` Hou Tao
  2024-04-22 20:06     ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Hou Tao @ 2024-04-09  1:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-fsdevel, Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi,
	Bernd Schubert, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

Hi,

On 4/8/2024 3:45 PM, Michael S. Tsirkin wrote:
> On Wed, Feb 28, 2024 at 10:41:20PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Hi,
>>
>> The patch set aims to fix the warning related to an abnormal size
>> parameter of kmalloc() in virtiofs. The warning occurred when attempting
>> to insert a 10MB sized kernel module kept in a virtiofs with cache
>> disabled. As analyzed in patch #1, the root cause is that the length of
>> the read buffer is no limited, and the read buffer is passed directly to
>> virtiofs through out_args[0].value. Therefore patch #1 limits the
>> length of the read buffer passed to virtiofs by using max_pages. However
>> it is not enough, because now the maximal value of max_pages is 256.
>> Consequently, when reading a 10MB-sized kernel module, the length of the
>> bounce buffer in virtiofs will be 40 + (256 * 4096), and kmalloc will
>> try to allocate 2MB from memory subsystem. The request for 2MB of
>> physically contiguous memory significantly stress the memory subsystem
>> and may fail indefinitely on hosts with fragmented memory. To address
>> this, patch #2~#5 use scattered pages in a bio_vec to replace the
>> kmalloc-allocated bounce buffer when the length of the bounce buffer for
>> KVEC_ITER dio is larger than PAGE_SIZE. The final issue with the
>> allocation of the bounce buffer and sg array in virtiofs is that
>> GFP_ATOMIC is used even when the allocation occurs in a kworker context.
>> Therefore the last patch uses GFP_NOFS for the allocation of both sg
>> array and bounce buffer when initiated by the kworker. For more details,
>> please check the individual patches.
>>
>> As usual, comments are always welcome.
>>
>> Change Log:
> Bernd, should I just merge the patchset as is?
> It seems to fix a real problem and no one has the
> time to work on a better fix... WDYT?

Sorry for the long delay. I have just started to prepare v3. In v3, I
plan to avoid the unnecessary memory copy between fuse args and bio_vec.
Will post it before next week.
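
The rough idea (untested sketch of the plan, not final code; the helper
name is made up) is to map the pages backing the kvec buffer into the
sg list directly instead of copying through a bounce buffer:

static unsigned int sg_init_from_kvec(struct scatterlist *sg, void *buf,
				      unsigned int len)
{
	unsigned int nr = 0;

	while (len) {
		/* stay within the current page of the buffer */
		unsigned int this = min_t(unsigned int, len,
					  PAGE_SIZE - offset_in_page(buf));
		struct page *page = is_vmalloc_addr(buf) ?
				    vmalloc_to_page(buf) : virt_to_page(buf);

		sg_set_page(&sg[nr++], page, this, offset_in_page(buf));
		buf += this;
		len -= this;
	}
	return nr;
}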
>
>
>> v2:
>>   * limit the length of ITER_KVEC dio by max_pages instead of the
>>     newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC
>>     dio being consistent with other rw operations.
>>   * replace kmalloc-allocated bounce buffer by using a bounce buffer
>>     backed by scattered pages when the length of the bounce buffer for
>>     KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with
>>     fragmented memory, the KVEC_ITER dio can be handled normally by
>>     virtiofs. (Bernd Schubert)
>>   * merge the GFP_NOFS patch [1] into this patch-set and use
>>     memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
>>     (Benjamin Coddington)
>>
>> v1: https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-houtao@huaweicloud.com/
>>
>> [1]: https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-houtao@huaweicloud.com/
>>
>> Hou Tao (6):
>>   fuse: limit the length of ITER_KVEC dio by max_pages
>>   virtiofs: move alloc/free of argbuf into separated helpers
>>   virtiofs: factor out more common methods for argbuf
>>   virtiofs: support bounce buffer backed by scattered pages
>>   virtiofs: use scattered bounce buffer for ITER_KVEC dio
>>   virtiofs: use GFP_NOFS when enqueuing request through kworker
>>
>>  fs/fuse/file.c      |  12 +-
>>  fs/fuse/virtio_fs.c | 336 +++++++++++++++++++++++++++++++++++++-------
>>  2 files changed, 296 insertions(+), 52 deletions(-)
>>
>> -- 
>> 2.29.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio
  2024-04-09  1:48   ` Hou Tao
@ 2024-04-22 20:06     ` Michael S. Tsirkin
  2024-04-22 21:11       ` Bernd Schubert
  2024-04-23 13:25       ` Hou Tao
  0 siblings, 2 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2024-04-22 20:06 UTC (permalink / raw)
  To: Hou Tao
  Cc: linux-fsdevel, Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi,
	Bernd Schubert, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1

On Tue, Apr 09, 2024 at 09:48:08AM +0800, Hou Tao wrote:
> Hi,
> 
> On 4/8/2024 3:45 PM, Michael S. Tsirkin wrote:
> > On Wed, Feb 28, 2024 at 10:41:20PM +0800, Hou Tao wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> Hi,
> >>
> >> The patch set aims to fix the warning related to an abnormal size
> >> parameter of kmalloc() in virtiofs. The warning occurred when attempting
> >> to insert a 10MB sized kernel module kept in a virtiofs with cache
> >> disabled. As analyzed in patch #1, the root cause is that the length of
> >> the read buffer is no limited, and the read buffer is passed directly to
> >> virtiofs through out_args[0].value. Therefore patch #1 limits the
> >> length of the read buffer passed to virtiofs by using max_pages. However
> >> it is not enough, because now the maximal value of max_pages is 256.
> >> Consequently, when reading a 10MB-sized kernel module, the length of the
> >> bounce buffer in virtiofs will be 40 + (256 * 4096), and kmalloc will
> >> try to allocate 2MB from memory subsystem. The request for 2MB of
> >> physically contiguous memory significantly stress the memory subsystem
> >> and may fail indefinitely on hosts with fragmented memory. To address
> >> this, patch #2~#5 use scattered pages in a bio_vec to replace the
> >> kmalloc-allocated bounce buffer when the length of the bounce buffer for
> >> KVEC_ITER dio is larger than PAGE_SIZE. The final issue with the
> >> allocation of the bounce buffer and sg array in virtiofs is that
> >> GFP_ATOMIC is used even when the allocation occurs in a kworker context.
> >> Therefore the last patch uses GFP_NOFS for the allocation of both sg
> >> array and bounce buffer when initiated by the kworker. For more details,
> >> please check the individual patches.
> >>
> >> As usual, comments are always welcome.
> >>
> >> Change Log:
> > Bernd, should I just merge the patchset as is?
> > It seems to fix a real problem and no one has the
> > time to work on a better fix... WDYT?
> 
> Sorry for the long delay. I have just started to prepare v3. In v3, I
> plan to avoid the unnecessary memory copy between fuse args and bio_vec.
> Will post it before next week.

Didn't happen before this week apparently.

> >
> >
> >> v2:
> >>   * limit the length of ITER_KVEC dio by max_pages instead of the
> >>     newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC
> >>     dio being consistent with other rw operations.
> >>   * replace kmalloc-allocated bounce buffer by using a bounce buffer
> >>     backed by scattered pages when the length of the bounce buffer for
> >>     KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with
> >>     fragmented memory, the KVEC_ITER dio can be handled normally by
> >>     virtiofs. (Bernd Schubert)
> >>   * merge the GFP_NOFS patch [1] into this patch-set and use
> >>     memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
> >>     (Benjamin Coddington)
> >>
> >> v1: https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-houtao@huaweicloud.com/
> >>
> >> [1]: https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-houtao@huaweicloud.com/
> >>
> >> Hou Tao (6):
> >>   fuse: limit the length of ITER_KVEC dio by max_pages
> >>   virtiofs: move alloc/free of argbuf into separated helpers
> >>   virtiofs: factor out more common methods for argbuf
> >>   virtiofs: support bounce buffer backed by scattered pages
> >>   virtiofs: use scattered bounce buffer for ITER_KVEC dio
> >>   virtiofs: use GFP_NOFS when enqueuing request through kworker
> >>
> >>  fs/fuse/file.c      |  12 +-
> >>  fs/fuse/virtio_fs.c | 336 +++++++++++++++++++++++++++++++++++++-------
> >>  2 files changed, 296 insertions(+), 52 deletions(-)
> >>
> >> -- 
> >> 2.29.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio
  2024-04-22 20:06     ` Michael S. Tsirkin
@ 2024-04-22 21:11       ` Bernd Schubert
  2024-04-23 13:25       ` Hou Tao
  1 sibling, 0 replies; 20+ messages in thread
From: Bernd Schubert @ 2024-04-22 21:11 UTC (permalink / raw)
  To: Michael S. Tsirkin, Hou Tao
  Cc: linux-fsdevel, Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi,
	Matthew Wilcox, Benjamin Coddington, linux-kernel,
	virtualization, houtao1



On 4/22/24 22:06, Michael S. Tsirkin wrote:
> On Tue, Apr 09, 2024 at 09:48:08AM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 4/8/2024 3:45 PM, Michael S. Tsirkin wrote:
>>> On Wed, Feb 28, 2024 at 10:41:20PM +0800, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> Hi,
>>>>
>>>> The patch set aims to fix the warning related to an abnormal size
>>>> parameter of kmalloc() in virtiofs. The warning occurred when attempting
>>>> to insert a 10MB sized kernel module kept in a virtiofs with cache
>>>> disabled. As analyzed in patch #1, the root cause is that the length of
>>>> the read buffer is no limited, and the read buffer is passed directly to
>>>> virtiofs through out_args[0].value. Therefore patch #1 limits the
>>>> length of the read buffer passed to virtiofs by using max_pages. However
>>>> it is not enough, because now the maximal value of max_pages is 256.
>>>> Consequently, when reading a 10MB-sized kernel module, the length of the
>>>> bounce buffer in virtiofs will be 40 + (256 * 4096), and kmalloc will
>>>> try to allocate 2MB from memory subsystem. The request for 2MB of
>>>> physically contiguous memory significantly stress the memory subsystem
>>>> and may fail indefinitely on hosts with fragmented memory. To address
>>>> this, patch #2~#5 use scattered pages in a bio_vec to replace the
>>>> kmalloc-allocated bounce buffer when the length of the bounce buffer for
>>>> KVEC_ITER dio is larger than PAGE_SIZE. The final issue with the
>>>> allocation of the bounce buffer and sg array in virtiofs is that
>>>> GFP_ATOMIC is used even when the allocation occurs in a kworker context.
>>>> Therefore the last patch uses GFP_NOFS for the allocation of both sg
>>>> array and bounce buffer when initiated by the kworker. For more details,
>>>> please check the individual patches.
>>>>
>>>> As usual, comments are always welcome.
>>>>
>>>> Change Log:
>>> Bernd, should I just merge the patchset as is?
>>> It seems to fix a real problem and no one has the
>>> time to work on a better fix... WDYT?
>>
>> Sorry for the long delay. I have just started to prepare v3. In v3, I
>> plan to avoid the unnecessary memory copy between fuse args and bio_vec.
>> Will post it before next week.
> 
> Didn't happen before this week apparently.

Hi Michael,

sorry for my late reply, I have been totally busy for the last weeks as
well. Also, it is not my call to merge it - I'm not the official fuse
maintainer...
From my point of view, patch 1 is just missing setting the actual limit;
with that it would be a clear and easily back-portable bug fix.
No promises, but I will try it out if I find a bit of time tomorrow.


Bernd

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio
  2024-04-22 20:06     ` Michael S. Tsirkin
  2024-04-22 21:11       ` Bernd Schubert
@ 2024-04-23 13:25       ` Hou Tao
  1 sibling, 0 replies; 20+ messages in thread
From: Hou Tao @ 2024-04-23 13:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-fsdevel, Miklos Szeredi, Vivek Goyal, Stefan Hajnoczi,
	Bernd Schubert, Matthew Wilcox, Benjamin Coddington,
	linux-kernel, virtualization, houtao1



On 4/23/2024 4:06 AM, Michael S. Tsirkin wrote:
> On Tue, Apr 09, 2024 at 09:48:08AM +0800, Hou Tao wrote:
>> Hi,
>>
>> On 4/8/2024 3:45 PM, Michael S. Tsirkin wrote:
>>> On Wed, Feb 28, 2024 at 10:41:20PM +0800, Hou Tao wrote:
>>>> From: Hou Tao <houtao1@huawei.com>
>>>>
>>>> Hi,
>>>>
>>>> The patch set aims to fix the warning related to an abnormal size
>>>> parameter of kmalloc() in virtiofs. The warning occurred when attempting
>>>> to insert a 10MB sized kernel module kept in a virtiofs with cache
>>>> disabled. As analyzed in patch #1, the root cause is that the length of
>>>> the read buffer is no limited, and the read buffer is passed directly to
>>>> virtiofs through out_args[0].value. Therefore patch #1 limits the
>>>> length of the read buffer passed to virtiofs by using max_pages. However
>>>> it is not enough, because now the maximal value of max_pages is 256.
>>>> Consequently, when reading a 10MB-sized kernel module, the length of the
>>>> bounce buffer in virtiofs will be 40 + (256 * 4096), and kmalloc will
>>>> try to allocate 2MB from memory subsystem. The request for 2MB of
>>>> physically contiguous memory significantly stress the memory subsystem
>>>> and may fail indefinitely on hosts with fragmented memory. To address
>>>> this, patch #2~#5 use scattered pages in a bio_vec to replace the
>>>> kmalloc-allocated bounce buffer when the length of the bounce buffer for
>>>> KVEC_ITER dio is larger than PAGE_SIZE. The final issue with the
>>>> allocation of the bounce buffer and sg array in virtiofs is that
>>>> GFP_ATOMIC is used even when the allocation occurs in a kworker context.
>>>> Therefore the last patch uses GFP_NOFS for the allocation of both sg
>>>> array and bounce buffer when initiated by the kworker. For more details,
>>>> please check the individual patches.
>>>>
>>>> As usual, comments are always welcome.
>>>>
>>>> Change Log:
>>> Bernd, should I just merge the patchset as is?
>>> It seems to fix a real problem and no one has the
>>> time to work on a better fix... WDYT?
>> Sorry for the long delay. I have just started to prepare v3. In v3, I
>> plan to avoid the unnecessary memory copy between fuse args and bio_vec.
>> Will post it before next week.
> Didn't happen before this week apparently.

Sorry for failing to make it this week; I have been busy these two
weeks. I hope to send v3 out before the end of April.
>
>>>
>>>> v2:
>>>>   * limit the length of ITER_KVEC dio by max_pages instead of the
>>>>     newly-introduced max_nopage_rw. Using max_pages make the ITER_KVEC
>>>>     dio being consistent with other rw operations.
>>>>   * replace kmalloc-allocated bounce buffer by using a bounce buffer
>>>>     backed by scattered pages when the length of the bounce buffer for
>>>>     KVEC_ITER dio is larger than PAG_SIZE, so even on hosts with
>>>>     fragmented memory, the KVEC_ITER dio can be handled normally by
>>>>     virtiofs. (Bernd Schubert)
>>>>   * merge the GFP_NOFS patch [1] into this patch-set and use
>>>>     memalloc_nofs_{save|restore}+GFP_KERNEL instead of GFP_NOFS
>>>>     (Benjamin Coddington)
>>>>
>>>> v1: https://lore.kernel.org/linux-fsdevel/20240103105929.1902658-1-houtao@huaweicloud.com/
>>>>
>>>> [1]: https://lore.kernel.org/linux-fsdevel/20240105105305.4052672-1-houtao@huaweicloud.com/
>>>>
>>>> Hou Tao (6):
>>>>   fuse: limit the length of ITER_KVEC dio by max_pages
>>>>   virtiofs: move alloc/free of argbuf into separated helpers
>>>>   virtiofs: factor out more common methods for argbuf
>>>>   virtiofs: support bounce buffer backed by scattered pages
>>>>   virtiofs: use scattered bounce buffer for ITER_KVEC dio
>>>>   virtiofs: use GFP_NOFS when enqueuing request through kworker
>>>>
>>>>  fs/fuse/file.c      |  12 +-
>>>>  fs/fuse/virtio_fs.c | 336 +++++++++++++++++++++++++++++++++++++-------
>>>>  2 files changed, 296 insertions(+), 52 deletions(-)
>>>>
>>>> -- 
>>>> 2.29.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-04-23 13:25 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-28 14:41 [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Hou Tao
2024-02-28 14:41 ` [PATCH v2 1/6] fuse: limit the length of ITER_KVEC dio by max_pages Hou Tao
2024-03-01 13:42   ` Miklos Szeredi
2024-03-09  4:26     ` Hou Tao
2024-03-13 23:02       ` Bernd Schubert
2024-02-28 14:41 ` [PATCH v2 2/6] virtiofs: move alloc/free of argbuf into separated helpers Hou Tao
2024-02-28 14:41 ` [PATCH v2 3/6] virtiofs: factor out more common methods for argbuf Hou Tao
2024-03-01 14:24   ` Miklos Szeredi
2024-03-09  4:27     ` Hou Tao
2024-02-28 14:41 ` [PATCH v2 4/6] virtiofs: support bounce buffer backed by scattered pages Hou Tao
2024-02-29 15:01   ` Brian Foster
2024-03-09  4:14     ` Hou Tao
2024-03-13 12:14       ` Brian Foster
2024-02-28 14:41 ` [PATCH v2 5/6] virtiofs: use scattered bounce buffer for ITER_KVEC dio Hou Tao
2024-02-28 14:41 ` [PATCH v2 6/6] virtiofs: use GFP_NOFS when enqueuing request through kworker Hou Tao
2024-04-08  7:45 ` [PATCH v2 0/6] virtiofs: fix the warning for ITER_KVEC dio Michael S. Tsirkin
2024-04-09  1:48   ` Hou Tao
2024-04-22 20:06     ` Michael S. Tsirkin
2024-04-22 21:11       ` Bernd Schubert
2024-04-23 13:25       ` Hou Tao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).