From: Hou Tao <houtao@huaweicloud.com>
To: linux-fsdevel@vger.kernel.org
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Vivek Goyal <vgoyal@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Bernd Schubert <bernd.schubert@fastmail.fm>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Benjamin Coddington <bcodding@redhat.com>,
	linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
	houtao1@huawei.com
Subject: [PATCH v2 1/6] fuse: limit the length of ITER_KVEC dio by max_pages
Date: Wed, 28 Feb 2024 22:41:21 +0800	[thread overview]
Message-ID: <20240228144126.2864064-2-houtao@huaweicloud.com> (raw)
In-Reply-To: <20240228144126.2864064-1-houtao@huaweicloud.com>

From: Hou Tao <houtao1@huawei.com>

When inserting a 10MB kernel module stored on a virtio-fs mount with the
cache disabled, the following warning is reported:

  ------------[ cut here ]------------
  WARNING: CPU: 2 PID: 439 at mm/page_alloc.c:4544 ......
  Modules linked in:
  CPU: 2 PID: 439 Comm: insmod Not tainted 6.7.0-rc7+ #33
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ......
  RIP: 0010:__alloc_pages+0x2c4/0x360
  ......
  Call Trace:
   <TASK>
   ? __warn+0x8f/0x150
   ? __alloc_pages+0x2c4/0x360
   __kmalloc_large_node+0x86/0x160
   __kmalloc+0xcd/0x140
   virtio_fs_enqueue_req+0x240/0x6d0
   virtio_fs_wake_pending_and_unlock+0x7f/0x190
   queue_request_and_unlock+0x58/0x70
   fuse_simple_request+0x18b/0x2e0
   fuse_direct_io+0x58a/0x850
   fuse_file_read_iter+0xdb/0x130
   __kernel_read+0xf3/0x260
   kernel_read+0x45/0x60
   kernel_read_file+0x1ad/0x2b0
   init_module_from_file+0x6a/0xe0
   idempotent_init_module+0x179/0x230
   __x64_sys_finit_module+0x5d/0xb0
   do_syscall_64+0x36/0xb0
   entry_SYSCALL_64_after_hwframe+0x6e/0x76
   ......
   </TASK>
  ---[ end trace 0000000000000000 ]---

The warning is triggered as follows:

1) A 10MB kernel module stored on virtio-fs is inserted. The
finit_module() syscall handles the insertion and first invokes
kernel_read_file() to read the content of the module.

2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
passes it to kernel_read(). kernel_read() constructs a kvec iter by
using iov_iter_kvec() and passes it to fuse_file_read_iter().

3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
fuse_direct_io(). Currently, the maximal read size for a kvec iter is
limited only by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
the size of the 10MB buffer in out_args[0] of a fuse request and
passes the fuse request to virtio_fs_wake_pending_and_unlock().

4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
queue the request. Because the arguments of a fuse request may live on
the stack, virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce
buffer for all fuse args, copies the args into the bounce buffer and
passes the physical address of the bounce buffer to virtiofsd. The total
length of the fuse args for this request is about 10MB, so
copy_args_to_argbuf() invokes kmalloc() with a size of about 10MB,
which triggers the warning in __alloc_pages():

	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
		return NULL;

5) virtio_fs_enqueue_req() will retry the memory allocation in a
kworker, but that doesn't help: kmalloc() always returns NULL for a
size this large, so finit_module() hangs forever.
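
For illustration only (not part of the patch), the order check in step 4
can be reproduced in userspace. This is a minimal sketch: the sketch_*
names are hypothetical, and the 4KB page size and the MAX_PAGE_ORDER
value of 10 are assumptions matching a default x86-64 configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for x86-64 with 4KB pages; not taken from the patch. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_MAX_PAGE_ORDER	10

/* Smallest order such that (1 << order) pages cover `size` bytes. */
static unsigned int sketch_get_order(size_t size)
{
	size_t pages;
	unsigned int order = 0;

	assert(size > 0);
	/* Number of pages needed, minus one. */
	pages = (size - 1) >> SKETCH_PAGE_SHIFT;
	/* Count the bits needed to represent that value. */
	while (pages) {
		order++;
		pages >>= 1;
	}
	return order;
}

/* Nonzero when a contiguous allocation of `size` bytes exceeds the limit. */
static int sketch_exceeds_max_order(size_t size)
{
	return sketch_get_order(size) > SKETCH_MAX_PAGE_ORDER;
}
```

Under these assumptions a 10MB request needs order 12 (16MB of
contiguous pages), which is above the limit, while a 4MB request needs
order 10 and would still pass the check.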

A feasible solution is to limit the value of max_read for virtio-fs, so
that the length passed to kmalloc() is bounded. However, that would also
reduce the maximal read size for normal fuse reads. Writes to virtio-fs
initiated from the kernel have a similar problem, and there is currently
no way to limit fc->max_write in the kernel.

So instead of limiting both max_read and max_write in the kernel, cap
the maximal length of kvec iter IO by max_pages in fuse_direct_io(),
just as is already done for ubuf/iovec iter IO. The current upper bound
for max_pages is 256, so on a host with 4KB page size the maximal size
passed to kmalloc() in copy_args_to_argbuf() is about 1MB+40B. Since
that is slightly above 1MB, the page allocator rounds it up to the next
order, and the resulting allocation of 2MB of physically contiguous
memory will still put significant stress on the memory subsystem, but
the warning is fixed. Additionally, the requirement for huge physically
contiguous memory will be removed in the following patch.
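
For illustration only (not part of the patch), the effect of the cap can
be sketched in userspace. The sketch_* names are hypothetical, the 4KB
page size is an assumption, and the request count assumes every request
except possibly the last carries the full capped size.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT	12	/* assumed 4KB pages */

/*
 * Mirror of the clamping idea: kvec iters are additionally limited to
 * max_pages worth of bytes, other iter types keep the original nmax.
 */
static size_t sketch_max_dio_size(size_t nmax, unsigned int max_pages,
				  int is_kvec)
{
	size_t cap = (size_t)max_pages << SKETCH_PAGE_SHIFT;

	if (is_kvec && cap < nmax)
		nmax = cap;
	return nmax;
}

/* Number of requests needed to transfer `count` bytes, nmax must be > 0. */
static size_t sketch_nr_requests(size_t count, size_t nmax)
{
	return (count + nmax - 1) / nmax;
}
```

With max_read at UINT_MAX and max_pages at 256, the kvec limit becomes
1MB, so under these assumptions the 10MB module read would be split into
about ten requests instead of a single 10MB one.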

Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/fuse/file.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 148a71b8b4d0e..f90ea25e366f0 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1423,6 +1423,16 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 	return ret < 0 ? ret : 0;
 }
 
+static size_t fuse_max_dio_rw_size(const struct fuse_conn *fc,
+				   const struct iov_iter *iter, int write)
+{
+	unsigned int nmax = write ? fc->max_write : fc->max_read;
+
+	if (iov_iter_is_kvec(iter))
+		nmax = min(nmax, fc->max_pages << PAGE_SHIFT);
+	return nmax;
+}
+
 ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 		       loff_t *ppos, int flags)
 {
@@ -1433,7 +1443,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 	struct inode *inode = mapping->host;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fm->fc;
-	size_t nmax = write ? fc->max_write : fc->max_read;
+	size_t nmax = fuse_max_dio_rw_size(fc, iter, write);
 	loff_t pos = *ppos;
 	size_t count = iov_iter_count(iter);
 	pgoff_t idx_from = pos >> PAGE_SHIFT;
-- 
2.29.2

