From: Christian Schoenebeck <linux_oss@crudebyte.com>
To: v9fs-developer@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
Dominique Martinet <asmadeus@codewreck.org>,
Eric Van Hensbergen <ericvh@gmail.com>,
Latchesar Ionkov <lucho@ionkov.net>,
Nikolay Kichukov <nikolay@oldum.net>
Subject: [PATCH v5 00/11] remove msize limit in virtio transport
Date: Tue, 12 Jul 2022 16:35:54 +0200 [thread overview]
Message-ID: <cover.1657636554.git.linux_oss@crudebyte.com> (raw)
This series aims to get rid of the current 500k 'msize' limitation in
the 9p virtio transport, which is currently a bottleneck for the
performance of 9p mounts.
To avoid confusion: this series does remove the msize limit for the virtio
transport; on the 9p client level though, the anticipated milestone for this
series is now a max. 'msize' of 4 MB. See patch 7 for the reason why.
This is a follow-up of the following series and discussion:
https://lore.kernel.org/all/cover.1640870037.git.linux_oss@crudebyte.com/
Latest version of this series:
https://github.com/cschoenebeck/linux/commits/9p-virtio-drop-msize-cap
OVERVIEW OF PATCHES:
* Patches 1..6 remove the msize limitation from the 'virtio' transport
(i.e. the 9p 'virtio' transport itself actually supports >4MB now, tested
successfully with an experimental QEMU version and some dirty 9p Linux
client hacks up to msize=128MB).
* Patch 7 limits msize for all transports to 4 MB for now as >4MB would need
more work on 9p client level (see commit log of patch 7 for details).
* Patches 8..11 tremendously reduce unnecessarily large 9p message sizes and
therefore provide a performance gain as well. So far, almost all 9p messages
simply allocated message buffers exactly msize large, even for messages
that actually just needed a few bytes. So these patches make sense by
themselves, independent of this overall series; they matter even more for
this series though, because the larger the msize, the more this issue
would otherwise hurt.
PREREQUISITES:
If you are testing with QEMU then please either use QEMU 6.2 or higher, or
at least apply the following patch on QEMU side:
https://lore.kernel.org/qemu-devel/E1mT2Js-0000DW-OH@lizzy.crudebyte.com/
That QEMU patch is required if you are using a user space app that
automatically retrieves an optimum I/O block size by obeying stat's
st_blksize, which 'cat' for instance does, e.g.:
time cat test_rnd.dat > /dev/null
Otherwise please use a user space app for performance testing that allows
you to force a large block size, like 'dd' for instance; that avoids the
QEMU issue, so in that case you don't need to patch QEMU.
KNOWN LIMITATION:
With this series applied I can run
QEMU host <-> 9P virtio <-> Linux guest
with up to slightly below 4 MB msize [4186112 = (1024-2) * 4096]. If I try
to run it with exactly 4 MB (4194304) it currently hits a limitation on
QEMU side:
qemu-system-x86_64: virtio: too many write descriptors in indirect table
That's because QEMU currently has a hard coded limit of max. 1024 virtio
descriptors per vring slot (i.e. per virtio message), see item 1. under
'STILL TO DO' below.
STILL TO DO:
1. Negotiating virtio "Queue Indirect Size" (MANDATORY):
The QEMU issue described above must be addressed by negotiating the
maximum length of virtio indirect descriptor tables on virtio device
initialization. This would not only avoid the QEMU error above, but would
also allow msize of >4MB in future. Before that change can be done on
Linux and QEMU sides though, it first requires a change to the virtio
specs. Work on the virtio specs is in progress:
https://github.com/oasis-tcs/virtio-spec/issues/122
This is not really an issue for testing this series. Just stick to max.
msize=4186112 as described above and you will be fine. However, for the
final PR this should obviously be addressed in a clean way.
2. Reduce readdir buffer sizes (optional - maybe later):
This series already reduced the message buffers for most 9p message
types. This does not include Treaddir though yet, which is still simply
using msize. It would make sense to benchmark first whether this is
actually an issue that hurts. If it does, then one might use already
existing vfs knowledge to estimate the Treaddir size, or start with
some reasonable hard coded small Treaddir size first and then increase
it just on the 2nd Treaddir request if there are more directory entries
to fetch.
3. Add more buffer caches (optional - maybe later):
p9_fcall_init() uses kmem_cache_alloc() instead of kmalloc() for very
large buffers to reduce latency waiting for memory allocation to
complete. Currently it does that only if the requested buffer size is
exactly msize large. As patch 10 already divided the 9p message types
into a few message size categories, maybe it would make sense to use e.g.
4 separate caches for those size categories (e.g. 4k, 8k, msize/2,
msize). Might be worth a benchmark test.
Testing and feedback appreciated!
v4 -> v5:
* Exclude RDMA transport from buffer size reduction. [patch 11]
Christian Schoenebeck (11):
9p/trans_virtio: separate allocation of scatter gather list
9p/trans_virtio: turn amount of sg lists into runtime info
9p/trans_virtio: introduce struct virtqueue_sg
net/9p: add trans_maxsize to struct p9_client
9p/trans_virtio: support larger msize values
9p/trans_virtio: resize sg lists to whatever is possible
net/9p: limit 'msize' to KMALLOC_MAX_SIZE for all transports
net/9p: split message size argument into 't_size' and 'r_size' pair
9p: add P9_ERRMAX for 9p2000 and 9p2000.u
net/9p: add p9_msg_buf_size()
net/9p: allocate appropriate reduced message buffers
include/net/9p/9p.h | 3 +
include/net/9p/client.h | 2 +
net/9p/client.c | 68 +++++++--
net/9p/protocol.c | 154 ++++++++++++++++++++
net/9p/protocol.h | 2 +
net/9p/trans_virtio.c | 304 +++++++++++++++++++++++++++++++++++-----
6 files changed, 484 insertions(+), 49 deletions(-)
--
2.30.2