Subject: Re: [RFC v2 02/19] skbuff: pass a struct ubuf_info in msghdr
From: Hao Xu
To: Pavel Begunkov, io-uring@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jakub Kicinski, Jonathan Lemon, "David S. Miller", Willem de Bruijn, Eric Dumazet, David Ahern, Jens Axboe
Date: Wed, 12 Jan 2022 11:39:00 +0800
References: <7dae2f61ee9a1ad38822870764fcafad43a3fe4e.1640029579.git.asml.silence@gmail.com> <4bc0e57b-ee3b-ae77-5d5d-213a48bdf4b0@gmail.com>
In-Reply-To: <4bc0e57b-ee3b-ae77-5d5d-213a48bdf4b0@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/1/11 11:50 PM, Pavel Begunkov wrote:
> On 1/11/22 13:51, Hao Xu wrote:
>> On 2021/12/21 11:35 PM, Pavel Begunkov wrote:
>>> Instead of the net stack managing ubuf_info, allow it to be passed in
>>> from outside in a struct msghdr (in-kernel structure), so io_uring can
>>> make use of it.
>>>
>>> Signed-off-by: Pavel Begunkov
>>> ---
>> Hi Pavel,
>> I have a few questions here, since I lack networking
>> knowledge.
>> The first one is why we make ubuf_info visible
>> to io_uring. Why not just follow the old MSG_ZEROCOPY
>> logic?
>
> I assume you mean leaving allocation and so on to the socket, while the
> patchset lets io_uring manage and control ubufs. In short,
> performance and our convenience.
>
> TL;DR;
> First, we want a nice and uniform API with io_uring, i.e. posting
> a CQE instead of polling an error queue etc., and for that the network
> stack will need to know about the io_uring ctx in some way. As an
> alternative it may theoretically be registered in the socket, but it'll
> quickly turn into a huge mess, considering that it's a many-to-many
> relation between io_uring and sockets. The fact that io_uring holds refs
> to files will only complicate it.
Makes sense to me, thanks.
>
> It will also limit the API. For instance, we won't be able to use a
> single ubuf with several different sockets.
Are there any use cases for multiple sockets sharing a single
notification?
>
> Another problem is performance: registration or some other tricks
> would need some additional sync. It'd also need sync on use; say it's
> just one rcu_read, but the problem is that it only adds to the
> complexity and prevents some other optimisations. E.g. we amortise the
> atomics for getting refs on skb setup to ~0 based on guarantees io_uring
> provides, and that is not the only case. SKBFL_MANAGED_FRAGS can only
> work with pages being controlled by the issuer, and so it needs some
> context, as currently provided by ubuf. io_uring also caches ubufs,
> which relies on io_uring locking, so it removes kmalloc/free for almost
> zero overhead.
>
>
>> The second one: my understanding of the buffer
>> lifecycle is that the kernel side informs
>> userspace, via a CQE generated by the ubuf_info
>> callback, that all the buffers attached to the
>> same notifier are now free to use once all the data
>> is sent. Then why is the flush in 13/19 needed, given
>> that it happens at submission time?
>
> Probably I wasn't clear enough. A user has to flush a notifier; only
> then is it expected to post a CQE, after all buffers attached to it
> are freed. io_uring holds one ubuf ref, which will be released on flush.
I see. I saw another ref inc in skb_zcopy_set(), which I previously
misunderstood, and so I thought there was only one refcount. Thanks!
> I also need to add a way to flush without send.
>
> Will spend some time documenting for next iteration.
>
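
The notifier lifetime discussed above can be sketched as a toy userspace
model. This is only an illustration under stated assumptions, not the
patchset's code: struct notif and every notif_*() helper are hypothetical
names, and a plain int stands in for the real reference counter. The owner
(io_uring in this thread) starts with one reference, each attached buffer
takes another (compare the ref inc in skb_zcopy_set()), and the completion
is posted only when the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only: hypothetical names, not kernel or io_uring API. */
struct notif {
	int refs;        /* outstanding references to the notifier */
	bool flushed;    /* has the owner dropped its own reference? */
	int cqes_posted; /* completions posted so far (expected to end at 1) */
};

/* The owner creates the notifier holding a single reference. */
static void notif_init(struct notif *n)
{
	n->refs = 1;
	n->flushed = false;
	n->cqes_posted = 0;
}

/* Drop one reference; whoever drops the last one posts the completion. */
static void notif_put(struct notif *n)
{
	assert(n->refs > 0);
	if (--n->refs == 0) {
		n->cqes_posted++;
		printf("CQE posted\n");
	}
}

/* Attaching the notifier to an in-flight buffer takes an extra reference. */
static void notif_attach(struct notif *n)
{
	assert(!n->flushed); /* in this model, no attaches after a flush */
	n->refs++;
}

/* The stack is done with one attached buffer. */
static void notif_buffer_done(struct notif *n)
{
	notif_put(n);
}

/* Flush: the owner gives up its reference, at most once. */
static void notif_flush(struct notif *n)
{
	assert(!n->flushed);
	n->flushed = true;
	notif_put(n);
}
```

In this model, calling notif_attach() twice and notif_buffer_done() twice
leaves one reference outstanding and no completion posted; only the
following notif_flush() brings the count to zero. If the flush comes first,
the completion is posted by the last notif_buffer_done() instead.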