All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: David Ahern <dsahern@kernel.org>,
	io-uring@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	Willem de Bruijn <willemb@google.com>,
	Jens Axboe <axboe@kernel.dk>,
	kernel-team@fb.com
Subject: Re: [PATCH net-next v4 00/27] io_uring zerocopy send
Date: Wed, 20 Jul 2022 14:32:12 +0100	[thread overview]
Message-ID: <6ff5f766-61ef-ae40-aea3-a00c651f94a0@gmail.com> (raw)
In-Reply-To: <812c3233-1b64-8a0d-f820-26b98ff6642d@kernel.org>

On 7/18/22 03:19, David Ahern wrote:
> On 7/14/22 12:55 PM, Pavel Begunkov wrote:
>>>>>> You dropped comments about TCP testing; any progress there? If not,
>>>>>> can
>>>>>> you relay any issues you are hitting?
>>>>>
>>>>> Not really a problem, but for me it's bottle necked at NIC bandwidth
>>>>> (~3GB/s) for both zc and non-zc and doesn't even nearly saturate a CPU.
>>>>> Was actually benchmarked by my colleague quite a while ago, but can't
>>>>> find numbers. Probably need to at least add localhost numbers or grab
>>>>> a better server.
>>>>
>>>> Testing localhost TCP with a hack (see below), it doesn't include
>>>> refcounting optimisations I was testing UDP with and that will be
>>>> sent afterwards. Numbers are in MB/s
>>>>
>>>> IO size | non-zc    | zc
>>>> 1200    | 4174      | 4148
>>>> 4096    | 7597      | 11228
>>>
>>> I am surprised by the low numbers; you should be able to saturate a 100G
>>> link with TCP and ZC TX API.
>>
>> It was a quick test with my laptop, not a super fast CPU, preemptible
>> kernel, etc., and considering that the fact that it processes receives
>> from in the same send syscall roughly doubles the overhead, 87Gb/s
>> looks ok. It's not like MSG_ZEROCOPY would look much different, even
>> more to that all sends here will be executed sequentially in io_uring,
>> so no extra parallelism or so. As for 1200, I think 4GB/s is reasonable,
>> it's just the kernel overhead per byte is too high, should be same with
>> just send(2).
> 
> ?
> It's a stream socket so those sends are coalesced into MTU sized packets.

That leaves syscall and io_uring overhead, locking the socket, etc.,
which still requires more cycles than just copying 1200 bytes. And
the used CPU is not blazingly fast, could be that a better CPU/setup
will saturate 100G

>>>> Because it's localhost, we also spend cycles here for the recv side.
>>>> Using a real NIC 1200 bytes, zc is worse than non-zc ~5-10%, maybe the
>>>> omitted optimisations will somewhat help. I don't consider it to be a
>>>> blocker. but would be interesting to poke into later. One thing helping
>>>> non-zc is that it squeezes a number of requests into a single page
>>>> whenever zerocopy adds a new frag for every request.
>>>>
>>>> Can't say anything new for larger payloads, I'm still NIC-bound but
>>>> looking at CPU utilisation zc doesn't drain as much cycles as non-zc.
>>>> Also, I don't remember if mentioned before, but another catch is that
>>>> with TCP it expects users to not be flushing notifications too much,
>>>> because it forces it to allocate a new skb and lose a good chunk of
>>>> benefits from using TCP.
>>>
>>> I had issues with TCP sockets and io_uring at the end of 2020:
>>> https://www.spinics.net/lists/io-uring/msg05125.html
>>>
>>> have not tried anything recent (from 2022).
>>
>> Haven't seen it back then. In general io_uring doesn't stop submitting
>> requests if one request fails, at least because we're trying to execute
>> requests asynchronously. And in general, requests can get executed
>> out of order, so most probably submitting a bunch of requests to a single
>> TCP sock without any ordering on io_uring side is likely a bug.
> 
> TCP socket buffer fills resulting in a partial send (i.e, for a given
> sqe submission only part of the write/send succeeded). io_uring was not
> handling that case.

Shouldn't have been different from send(2) with MSG_NOWAIT, can be short
and the user should handle it. Also I believe Jens pushed just recently
in-kernel retries on the io_uring side for TCP in such cases.

> I'll try to find some time to resurrect the iperf3 patch and try top of
> tree kernel.

Awesome


>> You can link io_uring requests, i.e. IOSQE_IO_LINK, guaranteeing
>> execution ordering. And if you meant links in the message, I agree
>> that it was not the best decision to consider len < sqe->len not
>> an error and not breaking links, but it was later added that
>> MSG_WAITALL would also change the success condition to
>> len==sqe->len. But all that is relevant if you was using linking.

-- 
Pavel Begunkov

  reply	other threads:[~2022-07-20 13:33 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-07 11:49 [PATCH net-next v4 00/27] io_uring zerocopy send Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 01/27] ipv4: avoid partial copy for zc Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 02/27] ipv6: " Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 03/27] skbuff: don't mix ubuf_info from different sources Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 04/27] skbuff: add SKBFL_DONT_ORPHAN flag Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 05/27] skbuff: carry external ubuf_info in msghdr Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 06/27] net: Allow custom iter handler " Pavel Begunkov
2022-07-11 12:20   ` Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 07/27] net: introduce managed frags infrastructure Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 08/27] net: introduce __skb_fill_page_desc_noacc Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 09/27] ipv4/udp: support externally provided ubufs Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 10/27] ipv6/udp: " Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 11/27] tcp: " Pavel Begunkov
2022-07-08  4:06   ` David Ahern
2022-07-08 14:03     ` Pavel Begunkov
2022-07-13 23:38       ` David Ahern
2022-07-07 11:49 ` [PATCH net-next v4 12/27] io_uring: initialise msghdr::msg_ubuf Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 13/27] io_uring: export io_put_task() Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 14/27] io_uring: add zc notification infrastructure Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 15/27] io_uring: cache struct io_notif Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 16/27] io_uring: complete notifiers in tw Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 17/27] io_uring: add rsrc referencing for notifiers Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 18/27] io_uring: add notification slot registration Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 19/27] io_uring: wire send zc request type Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 20/27] io_uring: account locked pages for non-fixed zc Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 21/27] io_uring: allow to pass addr into sendzc Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 22/27] io_uring: sendzc with fixed buffers Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 23/27] io_uring: flush notifiers after sendzc Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 24/27] io_uring: rename IORING_OP_FILES_UPDATE Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 25/27] io_uring: add zc notification flush requests Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 26/27] io_uring: enable managed frags with register buffers Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 27/27] selftests/io_uring: test zerocopy send Pavel Begunkov
2022-07-08  4:10 ` [PATCH net-next v4 00/27] io_uring " David Ahern
2022-07-08 14:26   ` Pavel Begunkov
2022-07-11 12:56     ` Pavel Begunkov
2022-07-13 23:45       ` David Ahern
2022-07-14 18:55         ` Pavel Begunkov
2022-07-18  2:19           ` David Ahern
2022-07-20 13:32             ` Pavel Begunkov [this message]
2022-07-24 18:28             ` David Ahern
2022-07-27 10:51               ` Pavel Begunkov
2022-07-29 22:30                 ` David Ahern
2022-09-26 20:08               ` Pavel Begunkov
2022-09-28 19:31                 ` David Ahern
2022-09-28 20:11                   ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6ff5f766-61ef-ae40-aea3-a00c651f94a0@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=io-uring@vger.kernel.org \
    --cc=jonathan.lemon@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.