From: Magnus Karlsson
To: magnus.karlsson@intel.com, bjorn.topel@intel.com, ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org, jakub.kicinski@netronome.com, bjorn.topel@gmail.com, qi.z.zhang@intel.com
Cc: brouer@redhat.com
Subject: [PATCH bpf-next v3 0/3] libbpf: adding AF_XDP support
Date: Tue, 29 Jan 2019 16:12:14 +0100
Message-Id: <1548774737-16579-1-git-send-email-magnus.karlsson@intel.com>

This patch set proposes adding AF_XDP support to libbpf. The main reason
for this is to facilitate writing applications that use AF_XDP by
offering higher-level APIs that hide many of the details of the AF_XDP
uapi. This is in the same vein as how libbpf facilitates XDP adoption by
offering easy-to-use, higher-level interfaces to XDP functionality.
Hopefully this will facilitate adoption of AF_XDP, make applications
using it simpler and smaller, and finally also make it possible for
applications to benefit from optimizations in the AF_XDP user space
access code. Previously, people just copied and pasted the code from
the sample application into their application, which is not desirable.

The proposed interface is composed of two parts:

* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting up umems
  and AF_XDP sockets. This interface also loads a simple XDP program
  that routes all traffic on a queue up to the AF_XDP socket.

The sample program has been updated to use this new interface and in
that process it lost roughly 300 lines of code. I cannot detect any
performance degradation due to the use of this library instead of the
previous functions that were inlined in the sample application. Note,
however, that I measured this on a slower machine and not on the
Broadwell that we normally use.

The rings are now called xsk_ring. A ring is an xsk_ring_prod when a
producer operates on it and an xsk_ring_cons when a consumer operates
on it. This way we can get some compile time error checking that the
rings are used correctly.

Comments and contemplations:

* The current behaviour is that the library loads an XDP program (if
  requested to do so) but the cleanup of this program is left to the
  application.
  It would be possible to implement this cleanup in the library, but
  that would require state to be kept at the netdev level, of which
  there is none at the moment, and synchronization of that state
  between processes, all of which adds complexity. But when we get an
  XDP program per queue id, then it becomes trivial to also remove the
  XDP program when the application exits. This proposal from Jesper,
  Björn and others will also improve the performance of libbpf, since
  most of the XDP program code can be removed when that feature is
  supported.

* In a future release, I am planning on adding a higher level data
  plane interface too. This will be based around recvmsg and sendmsg
  with the use of struct iovec for batching, without the user having to
  know anything about the underlying four rings of an AF_XDP socket.
  There will be one semantic difference from the standard recvmsg,
  though: the kernel will fill in the iovecs instead of the
  application. The rest should be the same as the libc versions so
  that application writers feel at home.

Patch 1: moves the pr_*() functions to a separate header file so that
         the AF_XDP code can also use them.
Patch 2: adds AF_XDP support in libbpf
Patch 3: updates the xdpsock sample application to use the libbpf
         functions.

Changes v2 to v3:
* Added automatic loading of a simple XDP program that routes all
  traffic on a queue up to the AF_XDP socket. This program loading can
  be disabled.
* Updated function names to be consistent with the libbpf naming
  convention
* Moved all code to xsk.[ch]
* Removed all the XDP program loading code from the sample since this
  is now done by libbpf
* The initialization functions now return a handle as suggested by
  Alexei
* const statements added in the API where applicable.

Changes v1 to v2:
* Fixed cleanup of library state on error.
* Moved API to initial version
* Prefixed all public functions by xsk__ instead of xsk_
* Added comment about changed default ring sizes, batch size and umem
  size in the sample application commit message
* The library now only creates an Rx or Tx ring if the respective
  parameter is != NULL

Note that for zero-copy to work on FVL you need the following patch:
"i40e: fix potential RX buffer starvation for AF_XDP". For ixgbe, you
need a similar patch called "ixgbe: fix potential RX buffer starvation
for AF_XDP" to make ZC work with libbpf. Both have been submitted to
net.

I based this patch set on bpf-next commit 3d2af27a84a8 ("Merge branch
'bpf-flow-dissector-tests'")

Thanks: Magnus

Magnus Karlsson (3):
  libbpf: move pr_*() functions to common header file
  libbpf: add support for using AF_XDP sockets
  samples/bpf: convert xdpsock to use libbpf for AF_XDP access

 samples/bpf/Makefile              |   1 -
 samples/bpf/xdpsock.h             |  11 -
 samples/bpf/xdpsock_kern.c        |  56 ---
 samples/bpf/xdpsock_user.c        | 695 +++++++++------------------------
 tools/include/uapi/linux/if_xdp.h |  78 ++++
 tools/lib/bpf/Build               |   2 +-
 tools/lib/bpf/Makefile            |   5 +-
 tools/lib/bpf/README.rst          |  11 +-
 tools/lib/bpf/libbpf.c            |  30 +-
 tools/lib/bpf/libbpf.map          |  12 +
 tools/lib/bpf/libbpf_internal.h   |  41 ++
 tools/lib/bpf/xsk.c               | 794 ++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/xsk.h               | 132 +++++++
 13 files changed, 1260 insertions(+), 608 deletions(-)
 delete mode 100644 samples/bpf/xdpsock.h
 delete mode 100644 samples/bpf/xdpsock_kern.c
 create mode 100644 tools/include/uapi/linux/if_xdp.h
 create mode 100644 tools/lib/bpf/libbpf_internal.h
 create mode 100644 tools/lib/bpf/xsk.c
 create mode 100644 tools/lib/bpf/xsk.h

--
2.7.4
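
For readers who want a feel for the producer/consumer ring typing
mentioned in the cover letter, here is a minimal, single-threaded
sketch of the idea. It is not the code from xsk.h: every name in it
(demo_ring, demo_ring_prod, demo_ring_cons, DEMO_RING_SIZE and the two
helpers) is made up for illustration, and it leaves out the memory
barriers and the kernel-shared mappings that a real AF_XDP ring needs.
It only shows how giving each role its own wrapper type lets the
compiler catch a ring being used from the wrong side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u /* hypothetical size; must be a power of two */

/* Shared ring state. Both indices are free-running counters. */
struct demo_ring {
	uint32_t producer;
	uint32_t consumer;
	uint64_t descs[DEMO_RING_SIZE];
};

/* One distinct wrapper type per role (illustrative names only). */
struct demo_ring_prod { struct demo_ring *ring; };
struct demo_ring_cons { struct demo_ring *ring; };

/* Enqueue one descriptor. Returns 0 on success, -1 if the ring is full. */
static inline int demo_ring_prod_enqueue(struct demo_ring_prod *p,
					 uint64_t desc)
{
	struct demo_ring *r = p->ring;

	assert(r != NULL);
	if (r->producer - r->consumer == DEMO_RING_SIZE)
		return -1;
	r->descs[r->producer & (DEMO_RING_SIZE - 1)] = desc;
	r->producer++; /* a real shared ring needs a write barrier here */
	return 0;
}

/* Dequeue one descriptor. Returns 0 on success, -1 if the ring is empty. */
static inline int demo_ring_cons_dequeue(struct demo_ring_cons *c,
					 uint64_t *desc)
{
	struct demo_ring *r = c->ring;

	assert(r != NULL);
	if (r->producer == r->consumer)
		return -1;
	*desc = r->descs[r->consumer & (DEMO_RING_SIZE - 1)];
	r->consumer++; /* a real shared ring needs a barrier here too */
	return 0;
}
```

With this layout, passing a struct demo_ring_prod * to
demo_ring_cons_dequeue() is a pointer type mismatch that the compiler
flags (as a warning or an error, depending on the compiler and its
flags), which is the kind of compile time checking the separate
producer and consumer types are meant to give.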