linux-kernel.vger.kernel.org archive mirror
From: Paolo Abeni <pabeni@redhat.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Stanislav Fomichev <sdf@google.com>
Cc: "Mina Almasry" <almasrymina@google.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"David Ahern" <dsahern@kernel.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>,
	"Willem de Bruijn" <willemb@google.com>,
	"Kaiyuan Zhang" <kaiyuanz@google.com>
Subject: Re: [RFC PATCH v3 10/12] tcp: RX path for devmem TCP
Date: Thu, 09 Nov 2023 12:05:37 +0100
Message-ID: <3a1b5412bee202affc6a7cc74cd939e182b9a18e.camel@redhat.com>
In-Reply-To: <CAF=yD-JZ88j+44MYgX-=oYJngz4Z0zw6Y0V3nHXisZJtNu7q6A@mail.gmail.com>

On Mon, 2023-11-06 at 14:55 -0800, Willem de Bruijn wrote:
> On Mon, Nov 6, 2023 at 2:34 PM Stanislav Fomichev <sdf@google.com> wrote:
> > 
> > On 11/06, Willem de Bruijn wrote:
> > > > > IMHO, we need a better UAPI to receive the tokens and give them back to
> > > > > the kernel. CMSG + setsockopt(SO_DEVMEM_DONTNEED) get the job done,
> > > > > but look dated and hacky :-(
> > > > > 
> > > > > We should either do some kind of user/kernel shared memory queue to
> > > > > receive/return the tokens (similar to what Jonathan was doing in his
> > > > > proposal?)
> > > > 
> > > > I'll take a look at Jonathan's proposal, sorry, I'm not immediately
> > > > familiar but I wanted to respond :-) But is the suggestion here to
> > > > build a new kernel-user communication channel primitive for the
> > > > purpose of passing the information in the devmem cmsg? IMHO that seems
> > > > like an overkill. Why add 100-200 lines of code to the kernel to add
> > > > something that can already be done with existing primitives? I don't
> > > > see anything concretely wrong with cmsg & setsockopt approach, and if
> > > > we switch to something I'd prefer to switch to an existing primitive
> > > > for simplicity?
> > > > 
> > > > The only other existing primitive to pass data outside of the linear
> > > > buffer is the MSG_ERRQUEUE that is used for zerocopy. Is that
> > > > preferred? Any other suggestions or existing primitives I'm not aware
> > > > of?
> > > > 
> > > > > or bite the bullet and switch to io_uring.
> > > > > 
> > > > 
> > > > IMO io_uring & socket support are orthogonal, and one doesn't preclude
> > > > the other. As you know we like to use sockets and I believe there are
> > > > issues with io_uring adoption at Google that I'm not familiar with
> > > > (and could be wrong). I'm interested in exploring io_uring support as
> > > > a follow up but I think David Wei will be interested in io_uring
> > > > support as well anyway.
> > > 
> > > I also disagree that we need to replace a standard socket interface
> > > with something "faster", in quotes.
> > > 
> > > This interface is not the bottleneck to the target workload.
> > > 
> > > Replacing the synchronous sockets interface with something more
> > > performant for workloads where it is, is an orthogonal challenge.
> > > However we do that, I think that traditional sockets should continue
> > > to be supported.
> > > 
> > > The feature may already even work with io_uring, as both recvmsg with
> > > cmsg and setsockopt have io_uring support now.
> > 
> > I'm not really concerned with faster. I would prefer something cleaner :-)
> > 
> > Or maybe we should just have it documented, with some kind of path
> > towards a beautiful world where we can create dynamic queues.
> 
> I suppose we just disagree on the elegance of the API.
> 
> The concise notification API returns tokens as a range for
> compression, encoded as two 32-bit unsigned integers: start + length.
> It allows for even further batching by returning multiple such ranges
> in a single call.
> 
> This is analogous to the MSG_ZEROCOPY notification mechanism from
> kernel to user.
> 
> The synchronous socket syscall interface can be replaced by something
> asynchronous like io_uring. This already works today? Whatever
> asynchronous ring-based API would be selected, io_uring or otherwise,
> I think the concise notification encoding would remain as is.
> 
> Since this is an operation on a socket, I find a setsockopt the
> fitting interface.
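The start + length encoding described above can be sketched in C roughly as follows. This is a minimal illustration, not code from this series: the struct `token_range` and the function `coalesce_tokens()` are hypothetical names, and the sketch assumes the application already holds the tokens it wants to release as a sorted, duplicate-free array. Contiguous token IDs collapse into a single range, and the resulting array is the kind of batch a single setsockopt() call could carry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range record mirroring the "start + length" encoding:
 * two 32-bit unsigned integers per range. */
struct token_range {
	uint32_t token_start;
	uint32_t token_count;
};

/*
 * Coalesce a sorted, duplicate-free array of n token IDs into ranges.
 * out must have room for n entries (the worst case, when no two tokens
 * are contiguous). Returns the number of ranges written.
 */
size_t coalesce_tokens(const uint32_t *tokens, size_t n,
		       struct token_range *out)
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++) {
		if (nr > 0 &&
		    out[nr - 1].token_start + out[nr - 1].token_count == tokens[i]) {
			/* Contiguous with the previous range: extend it. */
			out[nr - 1].token_count++;
			continue;
		}
		/* Gap in the sequence: start a new range. */
		out[nr].token_start = tokens[i];
		out[nr].token_count = 1;
		nr++;
	}
	return nr;
}
```

For example, the tokens {1, 2, 3, 7, 8, 10} compress into three ranges, (1, 3), (7, 2) and (10, 1), so six tokens are returned using three pairs of integers.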

FWIW, I think sockopt + cmsg is the right API. It deserves explicit
documentation, both in the kernel and in the man-pages.
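The receive half of the sockopt + cmsg approach could look roughly like the sketch below, which walks the ancillary data returned by recvmsg() with the standard CMSG_FIRSTHDR()/CMSG_NXTHDR() macros and collects one token per devmem fragment. The cmsg level/type values (`DEVMEM_CMSG_LEVEL`, `DEVMEM_CMSG_TYPE`) and the `struct devmem_frag` payload layout are hypothetical stand-ins, not the uapi proposed in this series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical values: stand-ins for the devmem cmsg level/type; the
 * real constants would come from the patched uapi headers. */
#define DEVMEM_CMSG_LEVEL SOL_SOCKET
#define DEVMEM_CMSG_TYPE  0x7fff

/* Hypothetical per-fragment payload: where the data landed in the
 * dma-buf, how long it is, and the token to hand back later. */
struct devmem_frag {
	uint64_t frag_offset;
	uint32_t frag_size;
	uint32_t frag_token;
};

/*
 * Walk the control buffer filled in by recvmsg() and store the token of
 * every devmem fragment in tokens[], up to max entries.
 * Returns the number of tokens stored.
 */
size_t collect_tokens(struct msghdr *msg, uint32_t *tokens, size_t max)
{
	size_t n = 0;
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(msg); cm; cm = CMSG_NXTHDR(msg, cm)) {
		struct devmem_frag frag;

		/* Skip unrelated or truncated control messages. */
		if (cm->cmsg_level != DEVMEM_CMSG_LEVEL ||
		    cm->cmsg_type != DEVMEM_CMSG_TYPE ||
		    cm->cmsg_len < CMSG_LEN(sizeof(frag)))
			continue;
		if (n == max)
			break;
		/* memcpy avoids unaligned access to the cmsg payload. */
		memcpy(&frag, CMSG_DATA(cm), sizeof(frag));
		tokens[n++] = frag.frag_token;
	}
	return n;
}
```

The collected tokens would then be returned to the kernel with setsockopt(SO_DEVMEM_DONTNEED), as proposed in patch 11/12; the exact optval layout is defined by that patch and is not reproduced here.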

Cheers,

Paolo



Thread overview: 126+ messages
2023-11-06  2:43 [RFC PATCH v3 00/12] Device Memory TCP Mina Almasry
2023-11-06  2:44 ` [RFC PATCH v3 01/12] net: page_pool: factor out releasing DMA from releasing the page Mina Almasry
2023-11-06  2:44 ` [RFC PATCH v3 02/12] net: page_pool: create hooks for custom page providers Mina Almasry
2023-11-07  7:44   ` Yunsheng Lin
2023-11-09 11:09   ` Paolo Abeni
2023-11-10 23:19   ` Jakub Kicinski
2023-11-13  3:28     ` Mina Almasry
2023-11-13 22:10       ` Jakub Kicinski
2023-11-06  2:44 ` [RFC PATCH v3 03/12] net: netdev netlink api to bind dma-buf to a net device Mina Almasry
2023-11-10 23:16   ` Jakub Kicinski
2023-11-06  2:44 ` [RFC PATCH v3 04/12] netdev: support binding dma-buf to netdevice Mina Almasry
2023-11-07  7:46   ` Yunsheng Lin
2023-11-07 21:59     ` Mina Almasry
2023-11-08  3:40       ` Yunsheng Lin
2023-11-09  2:22         ` Mina Almasry
2023-11-09  9:29           ` Yunsheng Lin
2023-11-08 23:47   ` David Wei
2023-11-09  2:25     ` Mina Almasry
2023-11-09  8:29   ` Paolo Abeni
2023-11-10  2:59     ` Mina Almasry
2023-11-10  7:38       ` Yunsheng Lin
2023-11-10  9:45         ` Mina Almasry
2023-11-10 23:19   ` Jakub Kicinski
2023-11-11  2:19     ` Mina Almasry
2023-11-06  2:44 ` [RFC PATCH v3 05/12] netdev: netdevice devmem allocator Mina Almasry
2023-11-06 23:44   ` David Ahern
2023-11-07 22:10     ` Mina Almasry
2023-11-07 22:55       ` David Ahern
2023-11-07 23:03         ` Mina Almasry
2023-11-09  1:15           ` David Wei
2023-11-10 14:26           ` Pavel Begunkov
2023-11-11 17:19             ` David Ahern
2023-11-14 16:09               ` Pavel Begunkov
2023-11-09  1:00         ` David Wei
2023-11-08  3:48       ` Yunsheng Lin
2023-11-09  1:41         ` Mina Almasry
2023-11-07  7:45   ` Yunsheng Lin
2023-11-09  8:44   ` Paolo Abeni
2023-11-06  2:44 ` [RFC PATCH v3 06/12] memory-provider: dmabuf devmem memory provider Mina Almasry
2023-11-06 21:02   ` Stanislav Fomichev
2023-11-06 23:49   ` David Ahern
2023-11-08  0:02     ` Mina Almasry
2023-11-08  0:10       ` David Ahern
2023-11-10 23:16   ` Jakub Kicinski
2023-11-13  4:54     ` Mina Almasry
2023-11-06  2:44 ` [RFC PATCH v3 07/12] page-pool: device memory support Mina Almasry
2023-11-07  8:00   ` Yunsheng Lin
2023-11-07 21:56     ` Mina Almasry
2023-11-08 10:56       ` Yunsheng Lin
2023-11-09  3:20         ` Mina Almasry
2023-11-09  9:30           ` Yunsheng Lin
2023-11-09 12:20             ` Mina Almasry
2023-11-09 13:23               ` Yunsheng Lin
2023-11-09  9:01   ` Paolo Abeni
2023-11-06  2:44 ` [RFC PATCH v3 08/12] net: support non paged skb frags Mina Almasry
2023-11-07  9:00   ` Yunsheng Lin
2023-11-07 21:19     ` Mina Almasry
2023-11-08 11:25       ` Yunsheng Lin
2023-11-09  9:14   ` Paolo Abeni
2023-11-10  4:06     ` Mina Almasry
2023-11-10 23:19   ` Jakub Kicinski
2023-11-13  6:05     ` Mina Almasry
2023-11-13 22:17       ` Jakub Kicinski
2023-11-06  2:44 ` [RFC PATCH v3 09/12] net: add support for skbs with unreadable frags Mina Almasry
2023-11-06 18:47   ` Stanislav Fomichev
2023-11-06 19:34     ` David Ahern
2023-11-06 20:31       ` Mina Almasry
2023-11-06 21:59         ` Stanislav Fomichev
2023-11-06 22:18           ` Mina Almasry
2023-11-06 22:59             ` Stanislav Fomichev
2023-11-06 23:27               ` Mina Almasry
2023-11-06 23:55                 ` Stanislav Fomichev
2023-11-07  0:07                   ` Willem de Bruijn
2023-11-07  0:14                     ` Stanislav Fomichev
2023-11-07  0:59                       ` Stanislav Fomichev
2023-11-07  2:23                         ` Willem de Bruijn
2023-11-07 17:44                           ` Stanislav Fomichev
2023-11-07 17:57                             ` Willem de Bruijn
2023-11-07 18:14                               ` Stanislav Fomichev
2023-11-07  0:20                     ` Mina Almasry
2023-11-07  1:06                       ` Stanislav Fomichev
2023-11-07 19:53                         ` Mina Almasry
2023-11-07 21:05                           ` Stanislav Fomichev
2023-11-07 21:17                             ` Eric Dumazet
2023-11-07 22:23                               ` Stanislav Fomichev
2023-11-10 23:17                                 ` Jakub Kicinski
2023-11-10 23:19                           ` Jakub Kicinski
2023-11-07  1:09                       ` David Ahern
2023-11-06 23:37             ` David Ahern
2023-11-07  0:03               ` Mina Almasry
2023-11-06 20:56   ` Stanislav Fomichev
2023-11-07  0:16   ` David Ahern
2023-11-07  0:23     ` Mina Almasry
2023-11-08 14:43   ` David Laight
2023-11-06  2:44 ` [RFC PATCH v3 10/12] tcp: RX path for devmem TCP Mina Almasry
2023-11-06 18:44   ` Stanislav Fomichev
2023-11-06 19:29     ` Mina Almasry
2023-11-06 21:14       ` Willem de Bruijn
2023-11-06 22:34         ` Stanislav Fomichev
2023-11-06 22:55           ` Willem de Bruijn
2023-11-06 23:32             ` Stanislav Fomichev
2023-11-06 23:55               ` David Ahern
2023-11-07  0:02                 ` Willem de Bruijn
2023-11-07 23:55                   ` Mina Almasry
2023-11-08  0:01                     ` David Ahern
2023-11-09  2:39                       ` Mina Almasry
2023-11-09 16:07                         ` Edward Cree
2023-12-08 20:12                           ` Pavel Begunkov
2023-11-09 11:05             ` Paolo Abeni [this message]
2023-11-10 23:16               ` Jakub Kicinski
2023-12-08 20:28             ` Pavel Begunkov
2023-12-08 20:09           ` Pavel Begunkov
2023-11-06 21:17       ` Stanislav Fomichev
2023-11-08 15:36         ` Edward Cree
2023-11-09 10:52   ` Paolo Abeni
2023-11-10 23:19   ` Jakub Kicinski
2023-11-06  2:44 ` [RFC PATCH v3 11/12] net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages Mina Almasry
2023-11-06  2:44 ` [RFC PATCH v3 12/12] selftests: add ncdevmem, netcat for devmem TCP Mina Almasry
2023-11-09 11:03   ` Paolo Abeni
2023-11-10 23:13   ` Jakub Kicinski
2023-11-11  2:27     ` Mina Almasry
2023-11-11  2:35       ` Jakub Kicinski
2023-11-13  4:08         ` Mina Almasry
2023-11-13 22:20           ` Jakub Kicinski
2023-11-10 23:17   ` Jakub Kicinski
2023-11-07 15:18 ` [RFC PATCH v3 00/12] Device Memory TCP David Ahern
