From: Pavel Begunkov <asml.silence@gmail.com>
To: io-uring@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: Jakub Kicinski <kuba@kernel.org>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Willem de Bruijn <willemb@google.com>,
	Eric Dumazet <edumazet@google.com>,
	David Ahern <dsahern@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Pavel Begunkov <asml.silence@gmail.com>
Subject: [RFC v2 00/19] io_uring zerocopy tx
Date: Tue, 21 Dec 2021 15:35:22 +0000
Message-ID: <cover.1640029579.git.asml.silence@gmail.com>

Update on io_uring zerocopy tx, still RFC. For v1 and design notes see

https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/

Absolute numbers (against dummy) are higher than in v1: about +10-12% requests/s
for the peak performance case. 5/19 brought a couple of percent, but most of the
gain came with 8/19 and 9/19 (+8-11% in numbers, 5-7% in profiles). Avoiding
partial copies will also be needed in the future for p2p. Is there any reason
not to do the same for paged non-zc? For small (under 100-150B) packets?

Most of the checks are removed from the non-zc paths. The implementation in
__ip_append_data() is a bit trickier, but considering the already existing
assumptions around the "from" argument it should be fine.

Benchmarks for dummy netdev, UDP/IPv4, payload size=4096:
 -n<N> is how many requests we submit per syscall. From the io_uring perspective
       -n1 is wasteful and far from optimal, but it is included for comparison.
 -z0   disables zerocopy, leaving just normal io_uring send requests
 -f    flushes "buffer free" notifications for every request

                        | K reqs/s | speedup
msg_zerocopy (non-zc)   | 1120     | 1.12
msg_zerocopy (zc)       | 997      | 1
io_uring -n1 -z0        | 1469     | 1.47
io_uring -n8 -z0        | 1780     | 1.78
io_uring -n1 -f         | 1688     | 1.69
io_uring -n1            | 1774     | 1.77
io_uring -n8 -f         | 2075     | 2.08
io_uring -n8            | 2265     | 2.27
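
For reference, the speedup column above appears to be each row's K reqs/s
divided by the msg_zerocopy (zc) row, truncated to two decimals. That reading is
an assumption inferred from the numbers, and the names below are made up for
the illustration. A minimal sketch:

```python
# K reqs/s copied from the dummy netdev table above.
RESULTS = {
    "msg_zerocopy (non-zc)": 1120,
    "msg_zerocopy (zc)": 997,
    "io_uring -n1 -z0": 1469,
    "io_uring -n8 -z0": 1780,
    "io_uring -n1 -f": 1688,
    "io_uring -n1": 1774,
    "io_uring -n8 -f": 2075,
    "io_uring -n8": 2265,
}
BASELINE = "msg_zerocopy (zc)"  # assumed reference row (speedup == 1)


def speedup_x100(kreqs, base):
    """Speedup in hundredths, truncated (integer math avoids float rounding)."""
    return kreqs * 100 // base


def speedup_table(results, baseline):
    """Map every test name to its truncated speedup over the baseline row."""
    base = results[baseline]
    return {name: speedup_x100(v, base) / 100 for name, v in results.items()}


for name, s in speedup_table(RESULTS, BASELINE).items():
    print(f"{name:<24}| {RESULTS[name]:<9}| {s:g}")
```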

Note: comparing zc vs non-zc may not be too interesting. The relative
performance difference can be shifted in favour of zerocopy by cutting constant
per-request overhead, and there are easy ways of doing that, e.g. by compiling
out unused features. This is even more true for the table below, as there was
additional noise taking a good quarter of CPU cycles.
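
To illustrate the point about constant overhead with a toy model: treat the
time per request as a constant part plus a path-specific part. Removing some of
the constant part from both paths leaves the absolute gap unchanged but makes
the ratio larger. The 0.1 us cut is a made-up value for illustration, not a
measurement; only the K reqs/s inputs come from the dummy netdev table, and the
function names are hypothetical.

```python
def usec_per_req(kreqs_per_sec):
    """Convert K requests/s into microseconds spent per request."""
    return 1000.0 / kreqs_per_sec


def ratio_after_cut(nonzc_kreqs, zc_kreqs, cut_usec):
    """zc-over-non-zc throughput ratio once `cut_usec` of constant
    per-request overhead is removed from both paths."""
    t_nonzc = usec_per_req(nonzc_kreqs) - cut_usec
    t_zc = usec_per_req(zc_kreqs) - cut_usec
    if t_nonzc <= 0 or t_zc <= 0:
        raise ValueError("cut exceeds the measured per-request time")
    return t_nonzc / t_zc


# io_uring -n8 -z0 (1780 K reqs/s) vs io_uring -n8 (2265 K reqs/s).
for cut in (0.0, 0.1):  # 0.1 us is hypothetical
    print(f"cut={cut:.1f}us zc/non-zc={ratio_after_cut(1780, 2265, cut):.2f}")
```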

Some data for UDP/IPv6 between a pair of NICs. 9/19 wasn't there at the time of
testing. All tests are CPU bound, so as expected reqs/s for zerocopy doesn't
vary much between payload sizes. The io_uring to msg_zerocopy ratio is not too
representative, for reasons similar to those described above.

payload | test                   | K reqs/s
___________________________________________ 
 8192   | io_uring -n8 (dummy)   | 599
        | io_uring -n1 -z0       | 264
        | io_uring -n8 -z0       | 302
        | msg_zerocopy           | 248
        | msg_zerocopy -z        | 183
        | io_uring -n1 -f        | 306
        | io_uring -n1           | 318
        | io_uring -n8 -f        | 373
        | io_uring -n8           | 401

 4096   | io_uring -n8 (dummy)   | 601
        | io_uring -n1 -z0       | 303
        | io_uring -n8 -z0       | 366
        | msg_zerocopy           | 278
        | msg_zerocopy -z        | 187
        | io_uring -n1 -f        | 317
        | io_uring -n1           | 325
        | io_uring -n8 -f        | 387
        | io_uring -n8           | 405

 1024   | io_uring -n8 (dummy)   | 601
        | io_uring -n1 -z0       | 329
        | io_uring -n8 -z0       | 407
        | msg_zerocopy           | 301
        | msg_zerocopy -z        | 186
        | io_uring -n1 -f        | 317
        | io_uring -n1           | 327
        | io_uring -n8 -f        | 390
        | io_uring -n8           | 403

 512    | io_uring -n8 (dummy)   | 601
        | io_uring -n1 -z0       | 340
        | io_uring -n8 -z0       | 417
        | msg_zerocopy           | 310
        | msg_zerocopy -z        | 186
        | io_uring -n1 -f        | 317
        | io_uring -n1           | 328
        | io_uring -n8 -f        | 392
        | io_uring -n8           | 406

 128    | io_uring -n8 (dummy)   | 602
        | io_uring -n1 -z0       | 341
        | io_uring -n8 -z0       | 428
        | msg_zerocopy           | 317
        | msg_zerocopy -z        | 188
        | io_uring -n1 -f        | 318
        | io_uring -n1           | 331
        | io_uring -n8 -f        | 391
        | io_uring -n8           | 408
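
The claim that zerocopy throughput barely depends on payload size can be
checked against the table above. A small sketch using the io_uring -n8 rows
(zerocopy) and the io_uring -n8 -z0 rows (copy); the helper name is made up for
the illustration:

```python
# K reqs/s by payload size, copied from the NIC-to-NIC table above.
ZC_N8 = {8192: 401, 4096: 405, 1024: 403, 512: 406, 128: 408}
NONZC_N8 = {8192: 302, 4096: 366, 1024: 407, 512: 417, 128: 428}


def spread_percent(series):
    """Max-to-min variation across payload sizes, in percent of the minimum."""
    lo, hi = min(series.values()), max(series.values())
    return (hi - lo) * 100.0 / lo


print(f"zc spread:     {spread_percent(ZC_N8):.1f}%")
print(f"non-zc spread: {spread_percent(NONZC_N8):.1f}%")
for size in sorted(ZC_N8, reverse=True):
    print(f"{size:>5}B  zc/non-zc = {ZC_N8[size] / NONZC_N8[size]:.2f}")
```

With these inputs the zerocopy rows vary by under 2% across payload sizes,
while the copy rows vary by about 40%; at 1024B and below the copy numbers are
slightly higher than the zerocopy ones.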

https://github.com/isilence/linux/tree/zc_v2
https://github.com/isilence/liburing/tree/zc_v2

The benchmark is <liburing>/test/send-zc:

send-zc [-f] [-n<N>] [-z0] -s<payload size> -D<dst ip> (-6|-4) [-t<sec>] udp

As a server you can use msg_zerocopy from the kernel's selftests, or the copy of
it at <liburing>/test/msg_zerocopy. No server is needed for dummy testing.

dummy setup:
sudo ip li add dummy0 type dummy && sudo ip li set dummy0 up mtu 65536
# route traffic for the specified IP through dummy0
sudo ip route add <ip_address> dev dummy0

v2: remove additional overhead for non-zc from skb_release_data() (Jonathan)
    avoid msg propagation, hide extra bits of non-zc overhead
    task_work based "buffer free" notifications
    improve io_uring's notification refcounting
    added 5/19 (no pfmemalloc tracking)
    added 8/19 and 9/19 preventing small copies with zc
    misc small changes

Pavel Begunkov (19):
  skbuff: add SKBFL_DONT_ORPHAN flag
  skbuff: pass a struct ubuf_info in msghdr
  net: add zerocopy_sg_from_iter for bvec
  net: optimise page get/free for bvec zc
  net: don't track pfmemalloc for zc registered mem
  ipv4/udp: add support msgdr::msg_ubuf
  ipv6/udp: add support msgdr::msg_ubuf
  ipv4: avoid partial copy for zc
  ipv6: avoid partial copy for zc
  io_uring: add send notifiers registration
  io_uring: infrastructure for send zc notifications
  io_uring: wire send zc request type
  io_uring: add an option to flush zc notifications
  io_uring: opcode independent fixed buf import
  io_uring: sendzc with fixed buffers
  io_uring: cache struct ubuf_info
  io_uring: unclog ctx refs waiting with zc notifiers
  io_uring: task_work for notification delivery
  io_uring: optimise task referencing by notifiers

 fs/io_uring.c                 | 440 +++++++++++++++++++++++++++++++++-
 include/linux/skbuff.h        |  46 ++--
 include/linux/socket.h        |   1 +
 include/uapi/linux/io_uring.h |  14 ++
 net/compat.c                  |   1 +
 net/core/datagram.c           |  58 +++++
 net/core/skbuff.c             |  16 +-
 net/ipv4/ip_output.c          |  55 +++--
 net/ipv6/ip6_output.c         |  54 ++++-
 net/socket.c                  |   3 +
 10 files changed, 633 insertions(+), 55 deletions(-)

-- 
2.34.1

