From: Pavel Begunkov <asml.silence@gmail.com>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
Cc: Jakub Kicinski <kuba@kernel.org>,
Jonathan Lemon <jonathan.lemon@gmail.com>,
"David S . Miller" <davem@davemloft.net>,
Willem de Bruijn <willemb@google.com>,
Eric Dumazet <edumazet@google.com>,
David Ahern <dsahern@kernel.org>, Jens Axboe <axboe@kernel.dk>,
Pavel Begunkov <asml.silence@gmail.com>
Subject: [RFC v2 00/19] io_uring zerocopy tx
Date: Tue, 21 Dec 2021 15:35:22 +0000
Message-ID: <cover.1640029579.git.asml.silence@gmail.com>
Update on io_uring zerocopy tx, still RFC. For v1 and design notes see
https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/
Absolute numbers (against dummy) got higher since v1: about +10-12% requests/s
for the peak performance case. 5/19 brought a couple of percent, but most of it
came from 8/19 and 9/19 (+8-11% in numbers, 5-7% in profiles), which avoid the
partial copy for zc. That will also be needed in the future for p2p. Is there
any reason not to do the same for paged non-zc? Small (under 100-150B) packets?
Most of the checks are removed from non-zc paths. The implementation is a bit
trickier in __ip_append_data(), but considering the already existing assumptions
around the "from" argument it should be fine.
Benchmarks for dummy netdev, UDP/IPv4, payload size=4096:
  -n<N> is how many requests we submit per syscall. From the io_uring
        perspective -n1 is wasteful and far from optimal, but it is included
        for comparison.
  -z0   disables zerocopy, leaving just normal io_uring send requests.
  -f    forces a flush of "buffer free" notifications for every request.
                      | K reqs/s | speedup
msg_zerocopy (non-zc) | 1120     | 1.12
msg_zerocopy (zc)     | 997      | 1
io_uring -n1 -z0      | 1469     | 1.47
io_uring -n8 -z0      | 1780     | 1.78
io_uring -n1 -f       | 1688     | 1.69
io_uring -n1          | 1774     | 1.77
io_uring -n8 -f       | 2075     | 2.08
io_uring -n8          | 2265     | 2.27
note: comparing zc vs non-zc might not be too interesting. The relative
performance difference can be shifted in favour of zerocopy by cutting constant
per-request overhead, and there are easy ways of doing that, e.g. by compiling
out unused features. This is even more true for the table below, as there was
additional noise taking a good quarter of CPU cycles.
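To make the speedup column of the dummy netdev table easier to read, here is a minimal Python sketch that recomputes it. It assumes (the text does not say so explicitly) that the baseline is the msg_zerocopy (zc) row, which is the one listed with a speedup of 1. The numbers are copied from the table; the names RESULTS, BASELINE and speedup() are illustrative only.

```python
# K reqs/s and the listed speedup, copied from the dummy netdev table.
RESULTS = {
    "msg_zerocopy (non-zc)": (1120, 1.12),
    "msg_zerocopy (zc)": (997, 1.00),
    "io_uring -n1 -z0": (1469, 1.47),
    "io_uring -n8 -z0": (1780, 1.78),
    "io_uring -n1 -f": (1688, 1.69),
    "io_uring -n1": (1774, 1.77),
    "io_uring -n8 -f": (2075, 2.08),
    "io_uring -n8": (2265, 2.27),
}

# Assumption: the baseline is the row whose listed speedup is 1.
BASELINE = RESULTS["msg_zerocopy (zc)"][0]


def speedup(kreqs, baseline=BASELINE):
    """Throughput relative to the baseline row."""
    return kreqs / baseline


if __name__ == "__main__":
    for name, (kreqs, listed) in RESULTS.items():
        print(f"{name:<22} | {kreqs:>5} | {speedup(kreqs):.3f} (listed {listed})")
```

Under that assumption the recomputed ratios agree with the listed column to within 0.01.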
Some data for UDP/IPv6 between a pair of NICs. 9/19 wasn't there at the time of
testing. All tests are CPU bound and so, as expected, reqs/s for zerocopy doesn't
vary much between different payload sizes. The io_uring to msg_zerocopy ratio is
not too representative, for reasons similar to those described above.
payload | test                 | K reqs/s
_________________________________________
8192    | io_uring -n8 (dummy) | 599
        | io_uring -n1 -z0     | 264
        | io_uring -n8 -z0     | 302
        | msg_zerocopy         | 248
        | msg_zerocopy -z      | 183
        | io_uring -n1 -f      | 306
        | io_uring -n1         | 318
        | io_uring -n8 -f      | 373
        | io_uring -n8         | 401
4096    | io_uring -n8 (dummy) | 601
        | io_uring -n1 -z0     | 303
        | io_uring -n8 -z0     | 366
        | msg_zerocopy         | 278
        | msg_zerocopy -z      | 187
        | io_uring -n1 -f      | 317
        | io_uring -n1         | 325
        | io_uring -n8 -f      | 387
        | io_uring -n8         | 405
1024    | io_uring -n8 (dummy) | 601
        | io_uring -n1 -z0     | 329
        | io_uring -n8 -z0     | 407
        | msg_zerocopy         | 301
        | msg_zerocopy -z      | 186
        | io_uring -n1 -f      | 317
        | io_uring -n1         | 327
        | io_uring -n8 -f      | 390
        | io_uring -n8         | 403
512     | io_uring -n8 (dummy) | 601
        | io_uring -n1 -z0     | 340
        | io_uring -n8 -z0     | 417
        | msg_zerocopy         | 310
        | msg_zerocopy -z      | 186
        | io_uring -n1 -f      | 317
        | io_uring -n1         | 328
        | io_uring -n8 -f      | 392
        | io_uring -n8         | 406
128     | io_uring -n8 (dummy) | 602
        | io_uring -n1 -z0     | 341
        | io_uring -n8 -z0     | 428
        | msg_zerocopy         | 317
        | msg_zerocopy -z      | 188
        | io_uring -n1 -f      | 318
        | io_uring -n1         | 331
        | io_uring -n8 -f      | 391
        | io_uring -n8         | 408
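The statement that zerocopy reqs/s barely varies with payload size can be checked against the UDP/IPv6 table with a short sketch. The metric below, (max - min) / min across payload sizes, is just one illustrative way to express "doesn't vary much" and is not something the measurements themselves define; the numbers are copied from the table, and relative_spread() is a hypothetical helper name.

```python
# K reqs/s per payload size, in the order 8192, 4096, 1024, 512, 128,
# copied from the UDP/IPv6 table.
KREQS = {
    "io_uring -n1 -z0": [264, 303, 329, 340, 341],
    "io_uring -n8 -z0": [302, 366, 407, 417, 428],
    "io_uring -n1": [318, 325, 327, 328, 331],
    "io_uring -n8": [401, 405, 403, 406, 408],
}


def relative_spread(values):
    """(max - min) / min: how far throughput moves across payload sizes."""
    return (max(values) - min(values)) / min(values)


if __name__ == "__main__":
    for test, values in KREQS.items():
        print(f"{test:<18} spread across payloads: {relative_spread(values):6.1%}")
```

With these numbers the zerocopy rows move by under 5% across payload sizes, while the -z0 rows move by roughly 29% (-n1) and 42% (-n8).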
https://github.com/isilence/linux/tree/zc_v2
https://github.com/isilence/liburing/tree/zc_v2
The benchmark is <liburing>/test/send-zc:
send-zc [-f] [-n<N>] [-z0] -s<payload size> -D<dst ip> (-6|-4) [-t<sec>] udp
As a server you can use msg_zerocopy from the kernel's selftests, or a copy of
it at <liburing>/test/msg_zerocopy. No server is needed for dummy testing.
dummy setup:
sudo ip li add dummy0 type dummy && sudo ip li set dummy0 up mtu 65536
# make traffic for the specified IP to go through dummy0
sudo ip route add <ip_address> dev dummy0
v2: remove additional overhead for non-zc from skb_release_data() (Jonathan)
    avoid msg propagation, hide extra bits of non-zc overhead
    task_work based "buffer free" notifications
    improve io_uring's notification refcounting
    added 5/19 (no pfmemalloc tracking)
    added 8/19 and 9/19, preventing small copies with zc
    misc small changes
Pavel Begunkov (19):
  skbuff: add SKBFL_DONT_ORPHAN flag
  skbuff: pass a struct ubuf_info in msghdr
  net: add zerocopy_sg_from_iter for bvec
  net: optimise page get/free for bvec zc
  net: don't track pfmemalloc for zc registered mem
  ipv4/udp: add support msgdr::msg_ubuf
  ipv6/udp: add support msgdr::msg_ubuf
  ipv4: avoid partial copy for zc
  ipv6: avoid partial copy for zc
  io_uring: add send notifiers registration
  io_uring: infrastructure for send zc notifications
  io_uring: wire send zc request type
  io_uring: add an option to flush zc notifications
  io_uring: opcode independent fixed buf import
  io_uring: sendzc with fixed buffers
  io_uring: cache struct ubuf_info
  io_uring: unclog ctx refs waiting with zc notifiers
  io_uring: task_work for notification delivery
  io_uring: optimise task referencing by notifiers
fs/io_uring.c | 440 +++++++++++++++++++++++++++++++++-
include/linux/skbuff.h | 46 ++--
include/linux/socket.h | 1 +
include/uapi/linux/io_uring.h | 14 ++
net/compat.c | 1 +
net/core/datagram.c | 58 +++++
net/core/skbuff.c | 16 +-
net/ipv4/ip_output.c | 55 +++--
net/ipv6/ip6_output.c | 54 ++++-
net/socket.c | 3 +
10 files changed, 633 insertions(+), 55 deletions(-)
--
2.34.1