From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jesper Dangaard Brouer
Subject: [bpf-next PATCH 0/4] xdp: introduce bulking for ndo_xdp_xmit API
Date: Wed, 09 May 2018 15:12:54 +0200
Message-ID: <152587152136.20423.14493673928480468024.stgit@firesoul>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Cc: Christoph Hellwig, Björn Töpel, Magnus Karlsson
To: netdev@vger.kernel.org, Daniel Borkmann, Alexei Starovoitov,
	Jesper Dangaard Brouer
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:35828 "EHLO
	mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S935034AbeEINM6; Wed, 9 May 2018 09:12:58 -0400
Sender: netdev-owner@vger.kernel.org

This patchset changes the ndo_xdp_xmit API to take a bulk of xdp frames.

When the kernel is compiled with CONFIG_RETPOLINE, every indirect
function pointer (branch) call hurts performance. For XDP this has a
huge negative performance impact.

This patchset reduces the needed (indirect) calls to ndo_xdp_xmit, and
also prepares for further optimizations. The DMA API's use of indirect
function pointer calls is the primary source of the regression. It is
left for a followup patchset to use bulking calls towards the DMA API
(via the scatter-gather calls).

The other advantage of this API change is that drivers can more easily
amortize the cost of any sync/locking scheme over the bulk of packets.
The assumption of the current API is that the driver implementing the
NDO will also allocate a dedicated XDP TX queue for every CPU in the
system, which is not always possible or practical to configure. E.g.
ixgbe cannot load an XDP program on a machine with more than 96 CPUs,
due to limited hardware TX queues. E.g. virtio_net is hard to
configure, as it requires manually increasing the queues. E.g. the tun
driver chooses to use a per XDP frame producer lock modulo
smp_processor_id over avail queues.
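
As a rough illustration of the amortization argument above, here is a
small userspace sketch (not kernel code). Every name in it (struct
frame, struct dev_ops, toy_xmit, BULK_SIZE and the value 16) is a
hypothetical stand-in chosen for this example and is not taken from the
patches. It only counts how many indirect calls and lock rounds a toy
driver sees when frames are handed over one at a time versus in bulk.

```c
#include <assert.h>
#include <stddef.h>

#define BULK_SIZE 16	/* assumed bulk size, for this sketch only */

/* Hypothetical stand-in for an XDP frame; not the kernel's struct. */
struct frame {
	int len;
};

/* Counters that make the per-call overhead visible. */
struct stats {
	int indirect_calls;	/* calls made through the function pointer */
	int lock_rounds;	/* lock/unlock pairs taken by the toy driver */
	int frames_sent;
};

struct dev;

/* Hypothetical ops table with a single bulk-capable transmit hook. */
struct dev_ops {
	int (*xmit)(struct dev *dev, int n, struct frame **frames);
};

struct dev {
	const struct dev_ops *ops;
	struct stats stats;
};

/* Toy driver: takes its TX "lock" once per call, then sends n frames. */
static int toy_xmit(struct dev *dev, int n, struct frame **frames)
{
	int i;

	dev->stats.lock_rounds++;	/* one lock + unlock per call */
	for (i = 0; i < n; i++) {
		assert(frames[i] != NULL);
		dev->stats.frames_sent++;
	}
	return n;
}

static const struct dev_ops toy_ops = { .xmit = toy_xmit };

/* Per-frame style: one indirect call (and one lock round) per frame. */
static void send_one_by_one(struct dev *dev, struct frame **frames, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		dev->stats.indirect_calls++;
		dev->ops->xmit(dev, 1, &frames[i]);
	}
}

/* Bulk style: one indirect call per chunk of up to BULK_SIZE frames. */
static void send_bulked(struct dev *dev, struct frame **frames, int n)
{
	int off;

	for (off = 0; off < n; off += BULK_SIZE) {
		int cnt = (n - off < BULK_SIZE) ? n - off : BULK_SIZE;

		dev->stats.indirect_calls++;
		dev->ops->xmit(dev, cnt, &frames[off]);
	}
}
```

With 40 frames, send_one_by_one() makes 40 indirect calls and takes the
toy lock 40 times, while send_bulked() makes 3 of each. Both deliver all
40 frames. The real saving depends on the driver and on how many frames
are actually available to bulk.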
---

Jesper Dangaard Brouer (4):
      bpf: devmap introduce dev_map_enqueue
      bpf: devmap prepare xdp frames for bulking
      xdp: add tracepoint for devmap like cpumap have
      xdp: change ndo_xdp_xmit API to support bulking


 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   26 ++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |    2
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   21 +++-
 drivers/net/tun.c                             |   37 ++++---
 drivers/net/virtio_net.c                      |   66 +++++++++---
 include/linux/bpf.h                           |   15 ++-
 include/linux/netdevice.h                     |   14 ++-
 include/net/xdp.h                             |    1
 include/trace/events/xdp.h                    |   50 +++++++++
 kernel/bpf/devmap.c                           |  134 ++++++++++++++++++++++++-
 net/core/filter.c                             |   19 +---
 net/core/xdp.c                                |   14 ++-
 samples/bpf/xdp_monitor_kern.c                |   49 +++++++++
 samples/bpf/xdp_monitor_user.c                |   69 +++++++++++++
 14 files changed, 436 insertions(+), 81 deletions(-)