From: Jakub Kicinski
Subject: Re: [PATCH bpf-next v2 0/5] xsk: fix bug when trying to use both copy and zero-copy mode
Date: Tue, 2 Oct 2018 09:58:35 -0700
Message-ID: <20181002095835.6cd49d57@cakuba.netronome.com>
References: <1538398297-14862-1-git-send-email-magnus.karlsson@intel.com> <20181001133146.1b8f3810@cakuba.netronome.com>
Cc: "Karlsson, Magnus", Björn Töpel, ast@kernel.org, Daniel Borkmann, Network Development, Jesper Dangaard Brouer
To: Magnus Karlsson

On Tue, 2 Oct 2018 14:49:13 +0200, Magnus Karlsson wrote:
> On Mon, Oct 1, 2018 at 10:34 PM Jakub Kicinski wrote:
> > On Mon, 1 Oct 2018 14:51:32 +0200, Magnus Karlsson wrote:
> > > Jakub, please take a look at your patches. The last one I had to
> > > change slightly to make it fit with the new interface
> > > xdp_get_umem_from_qid(). An added bonus with this function is that
> > > we, in the future, can also use it from the driver to get a umem,
> > > thus simplifying driver implementations (and later removing the
> > > umem from the NDO completely). Björn will mail patches, at a later
> > > point in time, using this in the i40e and ixgbe drivers, that
> > > remove a good chunk of code from the ZC implementations.
> >
> > Nice, drivers which don't follow the prepare/commit model of handling
> > reconfigurations will benefit!
> >
> > > I also made your code aware of Tx queues.
> > > If we create a socket that only has a Tx queue, then the queue id
> > > will refer to a Tx queue id only and could be larger than the
> > > number of available Rx queues. Please take a look at it.
> >
> > The semantics of the Tx queue id are slightly unclear. To me XDP is
> > associated with Rx, so the qid in driver context can only refer to an
> > Rx queue and its associated XDP Tx queue. It does not mean the Tx
> > queue the stack uses, as it does for the copy fallback. If one
> > doesn't have an Rx queue $id, there will be no associated XDP Tx
> > queue $id (in all drivers but Intel, and virtio, which use per-CPU
> > Tx queues, making the Tx queue even more meaningless).
> >
> > It's yet to be seen how others implement AF_XDP. My general feeling
> > is that we should only talk about Rx queues in the context of driver
> > XDP.
>
> This is the way I see it. From a uapi point of view, we can create a
> socket that can do only Rx, only Tx, or both. We then bind this socket
> to a specific queue id on a device. If a packet is received on this
> queue id, it is sent (by the default xdpsock sample program) to the
> socket. If a packet is sent on this socket, it goes out on this same
> queue id. If you have not registered an Rx ring (in user space) for
> this socket, you cannot receive anything on this socket. Conversely,
> if you have no Tx ring, you will not be able to send anything.
>
> But if we look at this from the driver perspective and the
> XDP_SETUP_XSK_UMEM NDO, today it does not know whether Rx and Tx rings
> have been set up in the socket. It will always initialize the HW Rx
> and Tx queues of the supplied queue id. So with today's NDO interface
> you will always get an Rx/Tx queue pair. In order to realize the uapi
> above in an efficient manner, and to support devices with more Tx
> queues than Rx queues, we need to change the NDO.
>
> Just as a note, in the applications I am used to working on, radio
> base stations and other telecom apps, it is the common case to have
> many more Tx queues than Rx queues, just to be able to use the
> scheduling, shaping, and other QoS features that are important on
> egress in those systems. Hence the interest in supporting Tx-only
> queues. But maybe this is just a weird case, I do not know.

It's a good case, and it should be supported. I'm just wondering whether
the API we have today is going to be the right one.

So for i40e you actually allocate a Tx ring per Rx ring? In ixgbe, IIUC,
there is an XDP Tx ring per core, so regardless of how many Tx queues
one requests there will actually be num_cpu_ids XDP Tx queues... so even
the check against Rx isn't meaningful there.

Hm.. Okay, I think what you've done is the safest bet; we can always
relax the check later on. LGTM, sorry for the noise! :)