* [RFC v2 00/19] io_uring zerocopy tx
@ 2021-12-21 15:35 Pavel Begunkov
From: Pavel Begunkov @ 2021-12-21 15:35 UTC (permalink / raw)
  To: io-uring, netdev, linux-kernel
  Cc: Jakub Kicinski, Jonathan Lemon, David S . Miller,
	Willem de Bruijn, Eric Dumazet, David Ahern, Jens Axboe,
	Pavel Begunkov

Update on io_uring zerocopy tx, still RFC. For v1 and design notes see

https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/

Absolute numbers (against dummy) have improved since v1: about +10-12%
requests/s for the peak performance case. 5/19 brought a couple of percent, but
most of the gain came with 8/19 and 9/19 (+8-11% in numbers, 5-7% in profiles).
Avoiding the partial copy will also be needed in the future for p2p. Any reason
not to do the same for paged non-zc? Small (under 100-150B) packets?

Most of the checks are removed from non-zc paths. The implementation is a bit
trickier in __ip_append_data(), but considering the already existing assumptions
around the "from" argument it should be fine.

Benchmarks for dummy netdev, UDP/IPv4, payload size=4096:
 -n<N> is how many requests we submit per syscall. From the io_uring perspective,
       -n1 is wasteful and far from optimal, but it is included for comparison.
 -z0   disables zerocopy and uses normal io_uring send requests
 -f    flushes "buffer free" notifications for every request

                        | K reqs/s | speedup
msg_zerocopy (non-zc)   | 1120     | 1.12
msg_zerocopy (zc)       | 997      | 1
io_uring -n1 -z0        | 1469     | 1.47
io_uring -n8 -z0        | 1780     | 1.78
io_uring -n1 -f         | 1688     | 1.69
io_uring -n1            | 1774     | 1.77
io_uring -n8 -f         | 2075     | 2.08
io_uring -n8            | 2265     | 2.27
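
The speedup column above is consistent with each row being divided by the
msg_zerocopy (zc) result and truncated to two decimals. That reading is an
assumption inferred from the numbers, not something stated in the letter. A
minimal sketch that recomputes the column under that assumption (the names
KREQS, BASELINE and speedup_x100 are illustrative only):

```python
# K reqs/s copied from the dummy-netdev table above.
KREQS = {
    "msg_zerocopy (non-zc)": 1120,
    "msg_zerocopy (zc)": 997,
    "io_uring -n1 -z0": 1469,
    "io_uring -n8 -z0": 1780,
    "io_uring -n1 -f": 1688,
    "io_uring -n1": 1774,
    "io_uring -n8 -f": 2075,
    "io_uring -n8": 2265,
}

# Assumption: the baseline is the msg_zerocopy (zc) row.
BASELINE = "msg_zerocopy (zc)"


def speedup_x100(name):
    """Speedup over the baseline in hundredths, truncated (integer division)."""
    return KREQS[name] * 100 // KREQS[BASELINE]


for name in KREQS:
    hundredths = speedup_x100(name)
    print(f"{name:24s}| {KREQS[name]:<9d}| {hundredths // 100}.{hundredths % 100:02d}")
```

Integer division is used so that the truncation does not depend on floating
point rounding.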

Note: comparing zc vs non-zc might not be too interesting. The relative
performance difference can be shifted in favour of zerocopy by cutting constant
per-request overhead, and there are easy ways of doing that, e.g. by compiling
out unused features. This is even more true for the table below, as there was
additional noise taking a good quarter of CPU cycles.
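
To illustrate the point about constant per-request overhead, here is a rough
sketch using the -n8 rows of the dummy table. It assumes the test is CPU bound
on a single core, so the average cost of a request is 1 / throughput. The
removed overhead values (50, 100, 200 ns) are hypothetical and were not
measured.

```python
# K reqs/s copied from the dummy-netdev table above.
NON_ZC_KREQS = 1780  # io_uring -n8 -z0
ZC_KREQS = 2265      # io_uring -n8


def ns_per_request(kreqs_per_sec):
    """Average time per request in ns, assuming a CPU-bound single-core test."""
    return 1e9 / (kreqs_per_sec * 1e3)


def zc_advantage(removed_overhead_ns):
    """zc / non-zc throughput ratio after cutting the same constant cost from both."""
    non_zc = ns_per_request(NON_ZC_KREQS) - removed_overhead_ns
    zc = ns_per_request(ZC_KREQS) - removed_overhead_ns
    return non_zc / zc


# Hypothetical amounts of constant overhead removed from every request.
for removed in (0, 50, 100, 200):
    print(f"remove {removed:3d} ns/request -> zc is {zc_advantage(removed):.2f}x non-zc")
```

The ratio grows as more constant cost is removed, because the same absolute
saving is a larger share of the cheaper (zerocopy) request.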

Some data for UDP/IPv6 between a pair of NICs. 9/19 wasn't there at the time of
testing. All tests are CPU bound, so, as expected, reqs/s for zerocopy doesn't
vary much between payload sizes. The io_uring to msg_zerocopy ratio is not too
representative, for reasons similar to those described above.

payload | test                   | K reqs/s
___________________________________________ 
 8192   | io_uring -n8 (dummy)   | 599
        | io_uring -n1 -z0       | 264
        | io_uring -n8 -z0       | 302
        | msg_zerocopy           | 248
        | msg_zerocopy -z        | 183
        | io_uring -n1 -f        | 306
        | io_uring -n1           | 318
        | io_uring -n8 -f        | 373
        | io_uring -n8           | 401

 4096   | io_uring -n8 (dummy)   | 601
        | io_uring -n1 -z0       | 303
        | io_uring -n8 -z0       | 366
        | msg_zerocopy           | 278
        | msg_zerocopy -z        | 187
        | io_uring -n1 -f        | 317
        | io_uring -n1           | 325
        | io_uring -n8 -f        | 387
        | io_uring -n8           | 405

 1024   | io_uring -n8 (dummy)   | 601
        | io_uring -n1 -z0       | 329
        | io_uring -n8 -z0       | 407
        | msg_zerocopy           | 301
        | msg_zerocopy -z        | 186
        | io_uring -n1 -f        | 317
        | io_uring -n1           | 327
        | io_uring -n8 -f        | 390
        | io_uring -n8           | 403

 512    | io_uring -n8 (dummy)   | 601
        | io_uring -n1 -z0       | 340
        | io_uring -n8 -z0       | 417
        | msg_zerocopy           | 310
        | msg_zerocopy -z        | 186
        | io_uring -n1 -f        | 317
        | io_uring -n1           | 328
        | io_uring -n8 -f        | 392
        | io_uring -n8           | 406

 128    | io_uring -n8 (dummy)   | 602
        | io_uring -n1 -z0       | 341
        | io_uring -n8 -z0       | 428
        | msg_zerocopy           | 317
        | msg_zerocopy -z        | 188
        | io_uring -n1 -f        | 318
        | io_uring -n1           | 331
        | io_uring -n8 -f        | 391
        | io_uring -n8           | 408
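
The claim that zerocopy reqs/s barely varies with payload size can be checked
against the table above. The sketch below takes four of the rows and prints
the spread between the best and worst result across the five payload sizes.
The metric (max minus min, relative to min) and the names RESULTS and
spread_percent are choices made for this illustration.

```python
# K reqs/s for payloads of 8192, 4096, 1024, 512 and 128 bytes, from the table above.
RESULTS = {
    "io_uring -n8 -z0": [302, 366, 407, 417, 428],  # non-zerocopy
    "io_uring -n8":     [401, 405, 403, 406, 408],  # zerocopy
    "msg_zerocopy":     [248, 278, 301, 310, 317],  # non-zerocopy
    "msg_zerocopy -z":  [183, 187, 186, 186, 188],  # zerocopy
}


def spread_percent(values):
    """Gap between the best and worst result, relative to the worst, in percent."""
    return (max(values) - min(values)) * 100.0 / min(values)


for name, values in RESULTS.items():
    print(f"{name:18s} spread {spread_percent(values):5.1f}%")
```

The zerocopy rows stay within a few percent across payload sizes, while the
copying rows slow down noticeably as the payload grows.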

https://github.com/isilence/linux/tree/zc_v2
https://github.com/isilence/liburing/tree/zc_v2

The benchmark is <liburing>/test/send-zc:

send-zc [-f] [-n<N>] [-z0] -s<payload size> -D<dst ip> (-6|-4) [-t<sec>] udp

As a server you can use msg_zerocopy from the kernel's selftests, or the copy
of it at <liburing>/test/msg_zerocopy. No server is needed for dummy testing.

dummy setup:
sudo ip li add dummy0 type dummy && sudo ip li set dummy0 up mtu 65536
# make traffic for the specified IP to go through dummy0
sudo ip route add <ip_address> dev dummy0

v2: remove additional overhead for non-zc from skb_release_data() (Jonathan)
    avoid msg propagation, hide extra bits of non-zc overhead
    task_work based "buffer free" notifications
    improve io_uring's notification refcounting
    added 5/19 (no pfmemalloc tracking)
    added 8/19 and 9/19 preventing small copies with zc
    misc small changes

Pavel Begunkov (19):
  skbuff: add SKBFL_DONT_ORPHAN flag
  skbuff: pass a struct ubuf_info in msghdr
  net: add zerocopy_sg_from_iter for bvec
  net: optimise page get/free for bvec zc
  net: don't track pfmemalloc for zc registered mem
  ipv4/udp: add support msgdr::msg_ubuf
  ipv6/udp: add support msgdr::msg_ubuf
  ipv4: avoid partial copy for zc
  ipv6: avoid partial copy for zc
  io_uring: add send notifiers registration
  io_uring: infrastructure for send zc notifications
  io_uring: wire send zc request type
  io_uring: add an option to flush zc notifications
  io_uring: opcode independent fixed buf import
  io_uring: sendzc with fixed buffers
  io_uring: cache struct ubuf_info
  io_uring: unclog ctx refs waiting with zc notifiers
  io_uring: task_work for notification delivery
  io_uring: optimise task referencing by notifiers

 fs/io_uring.c                 | 440 +++++++++++++++++++++++++++++++++-
 include/linux/skbuff.h        |  46 ++--
 include/linux/socket.h        |   1 +
 include/uapi/linux/io_uring.h |  14 ++
 net/compat.c                  |   1 +
 net/core/datagram.c           |  58 +++++
 net/core/skbuff.c             |  16 +-
 net/ipv4/ip_output.c          |  55 +++--
 net/ipv6/ip6_output.c         |  54 ++++-
 net/socket.c                  |   3 +
 10 files changed, 633 insertions(+), 55 deletions(-)

-- 
2.34.1


