Date: Wed, 13 Feb 2019 12:55:30 +0100
From: Jesper Dangaard Brouer
To: Magnus Karlsson
Cc: Jonathan Lemon, Björn Töpel, ast@kernel.org, Daniel Borkmann,
    Network Development, Jakub Kicinski, "Zhang, Qi Z",
    xiaolong.ye@intel.com, brouer@redhat.com, xdp-newbies@vger.kernel.org
Subject: Re: [PATCH bpf-next v4 0/2] libbpf: adding AF_XDP support
Message-ID: <20190213125530.4a7fb8bc@carbon>

On Wed, 13 Feb 2019 12:32:47 +0100 Magnus Karlsson wrote:

> On Mon, Feb 11, 2019 at 9:44 PM Jonathan Lemon wrote:
> >
> > On 8 Feb 2019, at 5:05, Magnus Karlsson wrote:
> >
> > > This patch proposes to add AF_XDP support to libbpf. The main reason
> > > for this is to facilitate writing applications that use AF_XDP by
> > > offering higher-level APIs that hide many of the details of the
> > > AF_XDP uapi. This is in the same vein as libbpf facilitates XDP
> > > adoption by offering easy-to-use, higher-level interfaces to XDP
> > > functionality. Hopefully this will facilitate adoption of AF_XDP,
> > > make applications using it simpler and smaller, and finally also
> > > make it possible for applications to benefit from optimizations in
> > > the AF_XDP user-space access code. Previously, people just copied
> > > and pasted the code from the sample application into their own
> > > applications, which is not desirable.
> >
> > I like the idea of encapsulating the boilerplate logic in a library.
> >
> > I do think there is an important missing piece though - there should
> > be some code which queries the netdev for how many queues are
> > attached, and creates the appropriate number of umem/AF_XDP sockets.
> >
> > I ran into this issue when testing the current AF_XDP code - on my
> > test boxes, the mlx5 card has 55 channels (aka queues), so when the
> > test program binds only to channel 0, nothing works as expected,
> > since not all traffic is being intercepted. While obvious in
> > hindsight, this took a while to track down.
>
> Yes, agreed. You are not the first one to stumble upon this problem
> :-). Let me think a little bit about how to solve this in a good way.
> We need this to be simple and intuitive, as you say.

I see people hitting this with AF_XDP all the time. I had some backup
slides[2] in our FOSDEM presentation[1] that describe the issue, give
the performance reason behind it, and propose a work-around (see the
sketches at the end of this mail).

[1] https://github.com/xdp-project/xdp-project/tree/master/conference/FOSDEM2019
[2] https://github.com/xdp-project/xdp-project/blob/master/conference/FOSDEM2019/xdp_building_block.org#backup-slides

Alternative work-around:
* Create as many AF_XDP sockets as RXQs
* Have userspace poll()/select() on all sockets

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Backup Slides                                                    :export:

** Slide: Where does AF_XDP performance come from?                 :export:

/Lock-free [[https://lwn.net/Articles/169961/][channel]] directly from driver RX-queue into AF_XDP socket/
- Single-Producer/Single-Consumer (SPSC) descriptor ring queues
- *Single*-/Producer/ (SP) via bind to a specific RX-*/queue id/*
  * NAPI-softirq ensures only 1 CPU processes 1 RX-queue id (per sched)
- *Single*-/Consumer/ (SC) via 1 application
- *Bounded* buffer pool (UMEM) allocated by userspace (registered with
  the kernel)
  * Descriptor(s) in ring(s) point into UMEM
  * /No memory allocation/, but frames must be returned to UMEM in a
    timely manner
- [[http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf][Transport signature]] that Van Jacobson talked about
  * Replaced by an XDP/eBPF program choosing to XDP_REDIRECT

** Slide: Details: Actually *four* SPSC ring queues                :export:

AF_XDP /socket/: has /two rings/: *RX* and *TX*
- Descriptor(s) in the rings point into UMEM

/UMEM/ consists of a number of equally sized chunks
- Has /two rings/: *FILL* ring and *COMPLETION* ring
- FILL ring: the application gives the kernel an area to RX-fill
- COMPLETION ring: the kernel tells the app TX is done for an area
  (which can then be reused)

** Slide: Gotcha by RX-queue id binding                            :export:

AF_XDP is bound to a */single RX-queue id/* (for SPSC performance reasons)
- The NIC by default spreads flows over the RX-queues with RSS-hashing
  * Traffic is likely not hitting the queue you expect
- You *MUST* configure NIC *HW filters* to /steer to the RX-queue id/
  * Out of scope for XDP setup
  * Use ethtool or TC HW offloading for filter setup
- *Alternative* work-around (sketched in the code below)
  * /Create as many AF_XDP sockets as RXQs/
  * /Have userspace poll()/select() on all sockets/
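
To make the work-around concrete, here is a minimal sketch of the
"missing piece" Jonathan mentions: querying the netdev for how many RX
queues it has. It uses the ETHTOOL_GCHANNELS ioctl; get_num_rxqs() is a
hypothetical helper name, not part of the proposed libbpf API.

#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>
#include <net/if.h>

/* Return the number of RX queues (channels) for @ifname, or -1 on error. */
static int get_num_rxqs(const char *ifname)
{
	struct ethtool_channels ch = { .cmd = ETHTOOL_GCHANNELS };
	struct ifreq ifr = { 0 };
	int fd, err;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	ifr.ifr_data = (void *)&ch;

	err = ioctl(fd, SIOCETHTOOL, &ifr);
	close(fd);
	if (err < 0)
		return -1;

	/* Drivers report queues either as combined channels or as
	 * separate RX channels. */
	return ch.combined_count ? ch.combined_count : ch.rx_count;
}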
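
With one AF_XDP socket bound per RX-queue id, the userspace side of the
work-around is then a plain poll() loop over all the socket fds. Another
sketch; xsk_fds[] and process_rx() are hypothetical placeholders for
sockets created and serviced elsewhere.

#include <poll.h>

#define MAX_RXQS 64

extern int xsk_fds[MAX_RXQS];		/* one AF_XDP socket fd per RX-queue id */
extern void process_rx(int queue_id);	/* drain that socket's RX ring */

static void rx_loop(int num_rxqs)
{
	struct pollfd fds[MAX_RXQS];
	int i;

	for (i = 0; i < num_rxqs; i++) {
		fds[i].fd = xsk_fds[i];
		fds[i].events = POLLIN;
	}

	for (;;) {
		/* Block until at least one socket has RX descriptors. */
		if (poll(fds, num_rxqs, -1) <= 0)
			continue;

		for (i = 0; i < num_rxqs; i++) {
			if (fds[i].revents & POLLIN)
				process_rx(i);
		}
	}
}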
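
What such a process_rx() does per socket follows the "four rings" slide:
consume descriptors from the RX ring, then hand the chunk addresses back
to the kernel via the FILL ring. This sketch assumes the xsk.h ring
helpers proposed in this series, using the names as they were later
merged into libbpf (the v4 names and header location may differ);
handle_packet() is a hypothetical placeholder.

#include <bpf/xsk.h>		/* header location has varied over time */
#include <linux/if_xdp.h>	/* struct xdp_desc */

extern void handle_packet(void *pkt, __u32 len);	/* hypothetical */

/* One RX pass over a single socket: consume from RX, refill FILL. */
static void rx_pass(struct xsk_ring_cons *rx, struct xsk_ring_prod *fill,
		    void *umem_area)
{
	__u32 idx_rx = 0, idx_fill = 0;
	unsigned int rcvd, i;

	rcvd = xsk_ring_cons__peek(rx, 64, &idx_rx);
	if (!rcvd)
		return;

	/* Reserve FILL slots so the chunks can go straight back to the
	 * kernel; real code must retry here instead of leaking chunks. */
	if (xsk_ring_prod__reserve(fill, rcvd, &idx_fill) != rcvd)
		return;

	for (i = 0; i < rcvd; i++) {
		const struct xdp_desc *desc = xsk_ring_cons__rx_desc(rx, idx_rx + i);
		void *pkt = (char *)umem_area + desc->addr;

		handle_packet(pkt, desc->len);
		*xsk_ring_prod__fill_addr(fill, idx_fill + i) = desc->addr;
	}

	xsk_ring_prod__submit(fill, rcvd);
	xsk_ring_cons__release(rx, rcvd);
}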