io-uring.vger.kernel.org archive mirror
From: David Ahern <dsahern@kernel.org>
To: Pavel Begunkov <asml.silence@gmail.com>,
	io-uring@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	Willem de Bruijn <willemb@google.com>,
	Jens Axboe <axboe@kernel.dk>,
	kernel-team@fb.com
Subject: Re: [PATCH net-next v4 00/27] io_uring zerocopy send
Date: Sun, 17 Jul 2022 20:19:20 -0600
Message-ID: <812c3233-1b64-8a0d-f820-26b98ff6642d@kernel.org>
In-Reply-To: <bc48e2bb-37ee-5b7c-5a97-01e026de2ba4@gmail.com>

On 7/14/22 12:55 PM, Pavel Begunkov wrote:
>>>>> You dropped comments about TCP testing; any progress there? If not,
>>>>> can you relay any issues you are hitting?
>>>>
>>>> Not really a problem, but for me it's bottlenecked at NIC bandwidth
>>>> (~3GB/s) for both zc and non-zc and comes nowhere near saturating a CPU.
>>>> Was actually benchmarked by my colleague quite a while ago, but can't
>>>> find numbers. Probably need to at least add localhost numbers or grab
>>>> a better server.
>>>
>>> Testing localhost TCP with a hack (see below), it doesn't include
>>> refcounting optimisations I was testing UDP with and that will be
>>> sent afterwards. Numbers are in MB/s
>>>
>>> IO size | non-zc    | zc
>>> 1200    | 4174      | 4148
>>> 4096    | 7597      | 11228
>>
>> I am surprised by the low numbers; you should be able to saturate a 100G
>> link with TCP and ZC TX API.
> 
> It was a quick test on my laptop: not a super fast CPU, a preemptible
> kernel, etc. Considering that it processes receives in the same send
> syscall, which roughly doubles the overhead, 87Gb/s looks ok. It's not
> like MSG_ZEROCOPY would look much different. Moreover, all sends here
> are executed sequentially in io_uring, so there is no extra parallelism.
> As for 1200, I think 4GB/s is reasonable; it's just that the kernel
> overhead per byte is too high, and it should be the same with plain
> send(2).

?
It's a stream socket so those sends are coalesced into MTU sized packets.
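
As a quick sanity check on the figures quoted above, the short Python
sketch below converts the 4096-byte zc row (11228 MB/s) into Gb/s. The
divisor is an assumption: 1024 reproduces the 87Gb/s figure quoted in the
thread, while 1000 gives a slightly higher number. The function name is
illustrative only.

```python
def mbytes_to_gbits(mb_per_s, divisor=1024):
    """Convert MB/s to Gb/s: 8 bits per byte, then one unit step of `divisor` (assumed)."""
    return mb_per_s * 8 / divisor


print(mbytes_to_gbits(11228))        # 87.71875
print(mbytes_to_gbits(11228, 1000))  # 89.824
```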

> 
>>> Because it's localhost, we also spend cycles here for the recv side.
>>> Using a real NIC with 1200 bytes, zc is ~5-10% worse than non-zc; maybe
>>> the omitted optimisations will help somewhat. I don't consider it to be
>>> a blocker, but it would be interesting to poke into later. One thing
>>> helping non-zc is that it squeezes a number of requests into a single
>>> page, whereas zerocopy adds a new frag for every request.
>>>
>>> Can't say anything new for larger payloads, I'm still NIC-bound, but
>>> looking at CPU utilisation, zc doesn't drain as many cycles as non-zc.
>>> Also, I don't remember if mentioned before, but another catch is that
>>> with TCP it expects users to not be flushing notifications too much,
>>> because it forces it to allocate a new skb and lose a good chunk of
>>> benefits from using TCP.
>>
>> I had issues with TCP sockets and io_uring at the end of 2020:
>> https://www.spinics.net/lists/io-uring/msg05125.html
>>
>> have not tried anything recent (from 2022).
> 
> Haven't seen it back then. In general io_uring doesn't stop submitting
> requests if one request fails, at least because we're trying to execute
> requests asynchronously. And in general, requests can get executed
> out of order, so submitting a bunch of requests to a single
> TCP sock without any ordering on io_uring side is most likely a bug.

The TCP socket buffer fills, resulting in a partial send (i.e., for a
given sqe submission only part of the write/send succeeded). io_uring was
not handling that case.

I'll try to find some time to resurrect the iperf3 patch and try a
top-of-tree kernel.
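
The partial-send case described above can be reproduced with plain
sockets, independent of io_uring. The Python sketch below is only an
analogue: it uses a non-blocking socketpair and send(), not io_uring SQEs,
and the helper names (send_all, demo) are made up for illustration. It
shows a caller that treats a short send as "resubmit the remainder"
rather than as completion.

```python
import select
import socket
import threading


def send_all(sock, data):
    """Send all of `data` on a non-blocking socket; return the byte count of each send()."""
    view = memoryview(data)
    offset = 0
    counts = []
    while offset < len(view):
        try:
            sent = sock.send(view[offset:])  # may accept only part of the buffer
        except BlockingIOError:
            select.select([], [sock], [])    # socket buffer full: wait until writable
            continue
        counts.append(sent)
        offset += sent                       # resume from the first unsent byte
    return counts


def demo(total=1 << 20):
    tx, rx = socket.socketpair()
    tx.setblocking(False)
    payload = (bytes(range(256)) * (total // 256 + 1))[:total]
    received = bytearray()

    def drain():
        # Reader side: keep receiving until the whole payload has arrived.
        while len(received) < len(payload):
            chunk = rx.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    counts = send_all(tx, payload)
    reader.join()
    tx.close()
    rx.close()
    return payload, bytes(received), counts


if __name__ == "__main__":
    payload, received, counts = demo()
    print(len(counts), "send() calls, data intact:", payload == received)
```

If a second buffer were queued on the same socket before the remainder of
a short send had been resubmitted, the byte stream would be interleaved
incorrectly, which is the failure mode at issue here.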

> 
> You can link io_uring requests, i.e. IOSQE_IO_LINK, guaranteeing
> execution ordering. And if you meant links in the message, I agree
> that it was not the best decision to consider len < sqe->len not
> an error and not breaking links, but it was later added that
> MSG_WAITALL would also change the success condition to
> len==sqe->len. But all that is relevant only if you were using linking.
> 
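
The link semantics described in the quoted text can be modelled with a
toy simulation. The Python sketch below is not io_uring code and makes
simplifying assumptions: each request is reduced to a (requested length,
result) pair, a failed request cancels the rest of its chain with
-ECANCELED, and the waitall flag stands in for MSG_WAITALL changing the
success condition to len == sqe->len. The function name run_chain is
hypothetical.

```python
import errno


def run_chain(requests, waitall=False):
    """Model a linked chain; `requests` is a list of (requested_len, result) pairs."""
    results = []
    broken = False
    for want, res in requests:
        if broken:
            # Requests after a broken link are cancelled, not executed.
            results.append(-errno.ECANCELED)
            continue
        results.append(res)
        # Without waitall a short send counts as success and the chain continues.
        if res < 0 or (waitall and res < want):
            broken = True
    return results


chain = [(4096, 4096), (4096, 1000), (4096, 4096)]
print(run_chain(chain))                # third request still runs after the short send
print(run_chain(chain, waitall=True))  # short send breaks the link; third is cancelled
```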

