From: Pavel Begunkov <asml.silence@gmail.com>
To: Hao Xu <haoxu@linux.alibaba.com>,
	io-uring@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: Jakub Kicinski <kuba@kernel.org>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Willem de Bruijn <willemb@google.com>,
	Eric Dumazet <edumazet@google.com>,
	David Ahern <dsahern@kernel.org>, Jens Axboe <axboe@kernel.dk>
Subject: Re: [RFC v2 02/19] skbuff: pass a struct ubuf_info in msghdr
Date: Wed, 12 Jan 2022 16:53:06 +0000	[thread overview]
Message-ID: <9701c478-6815-74c7-eedd-9a2f02458b35@gmail.com> (raw)
In-Reply-To: <cf5eb67e-05dc-3b8d-3e61-ddf9a9706265@linux.alibaba.com>

On 1/12/22 03:39, Hao Xu wrote:
> On 2022/1/11 at 11:50 PM, Pavel Begunkov wrote:
>> On 1/11/22 13:51, Hao Xu wrote:
>>> On 2021/12/21 at 11:35 PM, Pavel Begunkov wrote:
>>>> Instead of the net stack managing ubuf_info, allow it to be passed in
>>>> from outside via struct msghdr (an in-kernel structure), so io_uring
>>>> can make use of it.
>>>>
>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>> ---
>>> Hi Pavel,
>>> I have some confusion here, since I lack networking
>>> knowledge.
>>> The first one is: why do we make ubuf_info visible
>>> to io_uring? Why not just follow the old MSG_ZEROCOPY
>>> logic?
>>
>> I assume you mean leaving the allocation and so on up to the socket,
>> while the patchset lets io_uring manage and control ubufs. In short:
>> performance and our convenience.
>>
>> TL;DR:
>> First, we want a nice and uniform API with io_uring, i.e. posting
>> a CQE instead of polling an error queue etc., and for that the network
>> stack needs to know about the io_uring ctx in some way. As an alternative
>> it could theoretically be registered in the socket, but that would quickly
>> turn into a huge mess, considering that it's a many-to-many relation b/w
>> io_uring and sockets. The fact that io_uring holds refs to files only
>> complicates it further.
> Makes sense to me, thanks.
>>
>> It would also limit the API. For instance, we wouldn't be able to use a
>> single ubuf with several different sockets.
> Is there any use case for multiple sockets with a single
> notification?

Don't know, scatter send maybe? It's just one of those moments when
a design that feels right (to me) yields more flexibility than was
initially planned, which is definitely a good thing.


>> Another problem is performance: registration or some other trick
>> would need some additional sync. It'd also need sync on use; say it's
>> just one rcu_read_lock, but the problem is that it all adds up in
>> complexity and prevents some other optimisations. E.g. we amortise the
>> atomics for getting refs during skb setup to ~0, based on guarantees
>> io_uring provides, and that's not all. SKBFL_MANAGED_FRAGS can only work
>> with pages controlled by the issuer, and so it needs some context, as
>> currently provided by the ubuf. io_uring also caches ubufs, which relies
>> on io_uring locking, so it removes the kmalloc/kfree for almost zero
>> overhead.
>>
>>
>>> The second one: my understanding of the buffer
>>> lifecycle is that the kernel informs
>>> userspace, via a CQE generated by the ubuf_info
>>> callback, that all the buffers attached to the
>>> same notifier are free to reuse once all the data
>>> is sent. Then why is the flush in 13/19 needed,
>>> given that it happens at submission time?
>>
>> Probably I wasn't clear enough. A user has to flush a notifier; only
>> then is it expected to post a CQE, after all buffers attached to it
>> are freed. io_uring holds one ubuf ref, which is released on flush.
> I see. I saw another ref inc in skb_zcopy_set(), which I previously
> misunderstood, thinking there was only one refcount. Thanks!
>> I also need to add a way to flush without a send.
>>
>> I'll spend some time documenting this for the next iteration.

-- 
Pavel Begunkov

Thread overview: 25+ messages
2021-12-21 15:35 [RFC v2 00/19] io_uring zerocopy tx Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 01/19] skbuff: add SKBFL_DONT_ORPHAN flag Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 02/19] skbuff: pass a struct ubuf_info in msghdr Pavel Begunkov
2022-01-11 13:51   ` Hao Xu
2022-01-11 15:50     ` Pavel Begunkov
2022-01-12  3:39       ` Hao Xu
2022-01-12 16:53         ` Pavel Begunkov [this message]
2021-12-21 15:35 ` [RFC v2 03/19] net: add zerocopy_sg_from_iter for bvec Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 04/19] net: optimise page get/free for bvec zc Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 05/19] net: don't track pfmemalloc for zc registered mem Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 06/19] ipv4/udp: add support msgdr::msg_ubuf Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 07/19] ipv6/udp: " Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 08/19] ipv4: avoid partial copy for zc Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 09/19] ipv6: " Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 10/19] io_uring: add send notifiers registration Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 11/19] io_uring: infrastructure for send zc notifications Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 12/19] io_uring: wire send zc request type Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 13/19] io_uring: add an option to flush zc notifications Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 14/19] io_uring: opcode independent fixed buf import Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 15/19] io_uring: sendzc with fixed buffers Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 16/19] io_uring: cache struct ubuf_info Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 17/19] io_uring: unclog ctx refs waiting with zc notifiers Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 18/19] io_uring: task_work for notification delivery Pavel Begunkov
2021-12-21 15:35 ` [RFC v2 19/19] io_uring: optimise task referencing by notifiers Pavel Begunkov
2021-12-21 15:43 ` [RFC v2 00/19] io_uring zerocopy tx Pavel Begunkov
