From mboxrd@z Thu Jan 1 00:00:00 1970
From: William Tu
Subject: Re: [RFC PATCH v2 00/14] Introducing AF_XDP support
Date: Tue, 10 Apr 2018 07:14:18 -0700
Message-ID:
References: <20180327165919.17933-1-bjorn.topel@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Cc: "Karlsson, Magnus" , Alexander Duyck , Alexander Duyck ,
    John Fastabend , Alexei Starovoitov , Jesper Dangaard Brouer ,
    Willem de Bruijn , Daniel Borkmann ,
    Linux Kernel Network Developers , Björn Töpel ,
    michael.lundkvist@ericsson.com, "Brandeburg, Jesse" ,
    Anjali Singhai Jain , "Zhang, Qi Z" , ravineet.singh@ericsson.com
To: Björn Töpel
Return-path:
Received: from mail-qt0-f196.google.com ([209.85.216.196]:41147
    "EHLO mail-qt0-f196.google.com" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S1752700AbeDJOPA (ORCPT );
    Tue, 10 Apr 2018 10:15:00 -0400
Received: by mail-qt0-f196.google.com with SMTP id d3so13389327qth.8 for ;
    Tue, 10 Apr 2018 07:15:00 -0700 (PDT)
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

On Mon, Apr 9, 2018 at 11:47 PM, Björn Töpel wrote:
> 2018-04-09 23:51 GMT+02:00 William Tu :
>> On Tue, Mar 27, 2018 at 9:59 AM, Björn Töpel wrote:
>>> From: Björn Töpel
>>>
>>> This RFC introduces a new address family called AF_XDP that is
>>> optimized for high performance packet processing and, in upcoming
>>> patch sets, zero-copy semantics. In this v2 version, we have removed
>>> all zero-copy related code in order to make it smaller, simpler and
>>> hopefully more review friendly. This RFC only supports copy-mode for
>>> the generic XDP path (XDP_SKB) for both RX and TX and copy-mode for RX
>>> using the XDP_DRV path. Zero-copy support requires XDP and driver
>>> changes that Jesper Dangaard Brouer is working on. Some of his work is
>>> already on the mailing list for review.
>>> We will publish our zero-copy
>>> support for RX and TX on top of his patch sets at a later point in
>>> time.
>>>
>>> An AF_XDP socket (XSK) is created with the normal socket()
>>> syscall. Associated with each XSK are two queues: the RX queue and the
>>> TX queue. A socket can receive packets on the RX queue and it can send
>>> packets on the TX queue. These queues are registered and sized with
>>> the setsockopts XDP_RX_QUEUE and XDP_TX_QUEUE, respectively. It is
>>> mandatory to have at least one of these queues for each socket. In
>>> contrast to AF_PACKET V2/V3 these descriptor queues are separated from
>>> packet buffers. An RX or TX descriptor points to a data buffer in a
>>> memory area called a UMEM. RX and TX can share the same UMEM so that a
>>> packet does not have to be copied between RX and TX. Moreover, if a
>>> packet needs to be kept for a while due to a possible retransmit, the
>>> descriptor that points to that packet can be changed to point to
>>> another and reused right away. This again avoids copying data.
>>>
>>> This new dedicated packet buffer area is called a UMEM. It consists of
>>> a number of equally sized frames and each frame has a unique frame
>>> id. A descriptor in one of the queues references a frame by
>>> referencing its frame id. The user space allocates memory for this
>>> UMEM using whatever means it feels is most appropriate (malloc, mmap,
>>> huge pages, etc). This memory area is then registered with the kernel
>>> using the new setsockopt XDP_UMEM_REG. The UMEM also has two queues:
>>> the FILL queue and the COMPLETION queue. The fill queue is used by the
>>> application to send down frame ids for the kernel to fill in with RX
>>> packet data. References to these frames will then appear in the RX
>>> queue of the XSK once they have been received. The completion queue,
>>> on the other hand, contains frame ids that the kernel has transmitted
>>> completely and can now be used again by user space, for either TX or
>>> RX.
>>> Thus, the frame ids appearing in the completion queue are ids that
>>> were previously transmitted using the TX queue. In summary, the RX and
>>> FILL queues are used for the RX path and the TX and COMPLETION queues
>>> are used for the TX path.
>>>
>> Can we register a UMEM to multiple devices' queues?
>>
>
> No, one UMEM, one netdev queue in this RFC. That being said, there's
> nothing stopping a user from creating an additional UMEM, say UMEM',
> pointing to the same memory as UMEM, but bound to another
> netdev/queue. Note that the user space application has to make sure
> that the buffer handling is sane (user/kernel frame ownership).
>
> We used to allow sharing a UMEM between unrelated sockets, but after
> the introduction of the UMEM queues (fill/completion) that's no longer
> the case. For the zero-copy scenario, having to manage multiple
> DMA mappings per UMEM was a bit of a mess, so we went for the simpler
> (current) solution with one UMEM per netdev/queue.
>
>> So far the l2fwd sample code is sending/receiving from the same
>> queue. I'm thinking about forwarding packets from one device to another.
>> Now I'm copying packets from one device's RX desc to another device's TX
>> completion queue. But this introduces one extra copy.
>>
>
> So you've set up two identical UMEMs? Then you can just forward the
> incoming Rx descriptor to the other netdev's Tx queue. Note that you
> only need to copy the descriptor, not the actual frame data.
>

Thanks! I will give it a try. I guess you're saying I can do the following:

    int sfd1; // socket for device1
    int sfd2; // socket for device2
    ...
    // create 2 umem, one per socket
    umem1 = calloc(1, sizeof(*umem1));
    umem2 = calloc(1, sizeof(*umem2));

    // allocate 1 shared buffer, 1 xdp_umem_reg
    posix_memalign(&bufs, ...)
    mr.addr = (__u64)bufs; // shared for umem1,2
    ...

    // umem reg the same mr on both sockets
    setsockopt(sfd1, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
    setsockopt(sfd2, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));

    // set up fill, completion, mmap for sfd1 and sfd2
    ...

Since both devices can put frame data in 'bufs', I only need to copy
the descs between umem1 and umem2. Am I understanding this correctly?

Regards,
William
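
The descriptor-only forwarding discussed above can be sketched in plain C. This is a userspace model under stated assumptions, not the AF_XDP API: `struct desc`, `struct ring`, `ring_push()`, `ring_pop()`, `forward()`, `desc_data()`, `demo()` and the sizes are hypothetical names picked for illustration, and the kernel side is stood in for by a `memcpy()`. The point it shows is that when both sides address the same buffer by frame id, moving a packet from one device's RX ring to the other's TX ring only copies the small descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SIZE 2048u /* assumed frame size */
#define NUM_FRAMES 8u    /* assumed number of frames in the shared buffer */
#define RING_SIZE  4u    /* must be a power of two */

/* Hypothetical descriptor: names a frame in the shared buffer by id. */
struct desc {
    uint32_t idx;    /* frame id */
    uint32_t len;    /* packet length in bytes */
    uint16_t offset; /* start of packet data within the frame */
};

/* Hypothetical single-producer/single-consumer descriptor ring. */
struct ring {
    struct desc slots[RING_SIZE];
    uint32_t head; /* next slot to write */
    uint32_t tail; /* next slot to read */
};

int ring_push(struct ring *r, const struct desc *d)
{
    if (r->head - r->tail == RING_SIZE)
        return -1; /* full */
    r->slots[r->head++ & (RING_SIZE - 1)] = *d;
    return 0;
}

int ring_pop(struct ring *r, struct desc *d)
{
    if (r->head == r->tail)
        return -1; /* empty */
    *d = r->slots[r->tail++ & (RING_SIZE - 1)];
    return 0;
}

/* Move descriptors from one device's RX ring to the other device's TX
 * ring. Only descriptors are copied; frame data is never touched.
 * Returns the number of descriptors forwarded. */
unsigned forward(struct ring *rx, struct ring *tx)
{
    struct desc d;
    unsigned n = 0;

    /* Check for TX space first so a popped descriptor is never dropped. */
    while (tx->head - tx->tail < RING_SIZE && ring_pop(rx, &d) == 0) {
        int rc = ring_push(tx, &d);
        assert(rc == 0);
        (void)rc;
        n++;
    }
    return n;
}

/* Address of the packet a descriptor refers to in the shared buffer. */
uint8_t *desc_data(uint8_t *bufs, const struct desc *d)
{
    assert(d->idx < NUM_FRAMES);
    return bufs + (size_t)d->idx * FRAME_SIZE + d->offset;
}

/* "Device 1" receives into frame 3, the application forwards the
 * descriptor, and "device 2" sees the same bytes. Returns 0 on success. */
int demo(void)
{
    static uint8_t bufs[NUM_FRAMES * FRAME_SIZE]; /* the one shared buffer */
    struct ring rx1 = {0}, tx2 = {0};
    struct desc in = { .idx = 3, .len = 5, .offset = 0 }, out;

    memcpy(desc_data(bufs, &in), "hello", in.len); /* stands in for kernel RX */
    if (ring_push(&rx1, &in) != 0)
        return -1;

    if (forward(&rx1, &tx2) != 1 || ring_pop(&tx2, &out) != 0)
        return -1;

    /* Same frame id on both sides, so no data copy was needed. */
    if (out.idx != in.idx)
        return -1;
    return memcmp(desc_data(bufs, &out), "hello", out.len) == 0 ? 0 : -1;
}
```

In a real setup the application would still have to track frame ownership, as noted earlier in the thread: a frame handed to the second device's TX path must not be given back to the first device's fill queue until it shows up in the second device's completion queue.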