* [PATCH bpf-next v3 0/3] libbpf: adding AF_XDP support
@ 2019-01-29 15:12 Magnus Karlsson
  2019-01-29 15:12 ` [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file Magnus Karlsson
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Magnus Karlsson @ 2019-01-29 15:12 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jakub.kicinski, bjorn.topel, qi.z.zhang
  Cc: brouer

This patch set proposes adding AF_XDP support to libbpf. The main
reason for this is to facilitate writing applications that use AF_XDP
by offering higher-level APIs that hide many of the details of the
AF_XDP uapi, in the same vein as libbpf eases XDP adoption by offering
easy-to-use, higher-level interfaces to XDP functionality. Hopefully
this will drive adoption of AF_XDP, make applications that use it
simpler and smaller, and also make it possible for applications to
benefit from optimizations in the AF_XDP user-space access code.
Previously, people simply copied and pasted the code from the sample
application into their own applications, which is not desirable.

The proposed interface is composed of two parts:

* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting up umems
  and AF_XDP sockets. This interface also loads a simple XDP program
  that routes all traffic on a queue up to the AF_XDP socket.

The sample program has been updated to use this new interface, and in
the process it lost roughly 300 lines of code. I cannot detect any
performance degradation from using this library instead of the
functions that were previously inlined in the sample application, but
note that I measured this on a slower machine, not the Broadwell we
normally use.

The rings are now called xsk_ring: when a producer operates on a ring
it is an xsk_ring_prod, and for a consumer it is an xsk_ring_cons.
This way we get some compile-time checking that the rings are used
correctly.

Comments and contemplations:

* The current behaviour is that the library loads an XDP program (if
  requested to do so), but the clean-up of this program is left to the
  application. It would be possible to implement this clean-up in the
  library, but that would require keeping state at the netdev level
  (of which there is none at the moment) and synchronizing it between
  processes, all of which adds complexity. But once we get an XDP
  program per queue id, it becomes trivial to also remove the XDP
  program when the application exits. This proposal from Jesper,
  Björn and others will also improve the performance of libbpf, since
  most of the XDP program code can be removed when that feature is
  supported.

* In a future release, I am planning to add a higher-level data plane
  interface too. It will be based around recvmsg and sendmsg, using
  struct iovec for batching, so that the user does not need to know
  anything about the underlying four rings of an AF_XDP socket. There
  will be one semantic difference from the standard recvmsg, though:
  the kernel will fill in the iovecs instead of the application. The
  rest should behave like the libc versions so that application
  writers feel at home.

Patch 1: moves the pr_*() functions to a separate header file so that
         the AF_XDP code can also use them.
Patch 2: adds AF_XDP support in libbpf
Patch 3: updates the xdpsock sample application to use the libbpf functions.

Changes v2 to v3:
  * Added automatic loading of a simple XDP program that routes all
    traffic on a queue up to the AF_XDP socket. This program loading
    can be disabled.
  * Updated function names to be consistent with the libbpf naming
    convention
  * Moved all code to xsk.[ch]
  * Removed all the XDP program loading code from the sample since
    this is now done by libbpf
  * The initialization functions now return a handle as suggested by
    Alexei
  * const statements added in the API where applicable.
Changes v1 to v2:
  * Fixed cleanup of library state on error.
  * Moved API to initial version
  * Prefixed all public functions by xsk__ instead of xsk_
  * Added comment about changed default ring sizes, batch size and umem
    size in the sample application commit message
  * The library now only creates an Rx or Tx ring if the respective
    parameter is != NULL

Note that for zero-copy to work on FVL you need the following patch:
"i40e: fix potential RX buffer starvation for AF_XDP". For ixgbe,
you need a similar patch called "ixgbe: fix potential RX buffer
starvation for AF_XDP" to make ZC work with libbpf. Both have been
submitted to net.

I based this patch set on bpf-next commit 3d2af27a84a8 ("Merge branch 'bpf-flow-dissector-tests'")

Thanks: Magnus


Magnus Karlsson (3):
  libbpf: move pr_*() functions to common header file
  libbpf: add support for using AF_XDP sockets
  samples/bpf: convert xdpsock to use libbpf for AF_XDP access

 samples/bpf/Makefile              |   1 -
 samples/bpf/xdpsock.h             |  11 -
 samples/bpf/xdpsock_kern.c        |  56 ---
 samples/bpf/xdpsock_user.c        | 695 +++++++++------------------------
 tools/include/uapi/linux/if_xdp.h |  78 ++++
 tools/lib/bpf/Build               |   2 +-
 tools/lib/bpf/Makefile            |   5 +-
 tools/lib/bpf/README.rst          |  11 +-
 tools/lib/bpf/libbpf.c            |  30 +-
 tools/lib/bpf/libbpf.map          |  12 +
 tools/lib/bpf/libbpf_internal.h   |  41 ++
 tools/lib/bpf/xsk.c               | 794 ++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/xsk.h               | 132 +++++++
 13 files changed, 1260 insertions(+), 608 deletions(-)
 delete mode 100644 samples/bpf/xdpsock.h
 delete mode 100644 samples/bpf/xdpsock_kern.c
 create mode 100644 tools/include/uapi/linux/if_xdp.h
 create mode 100644 tools/lib/bpf/libbpf_internal.h
 create mode 100644 tools/lib/bpf/xsk.c
 create mode 100644 tools/lib/bpf/xsk.h

--
2.7.4


* [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file
  2019-01-29 15:12 [PATCH bpf-next v3 0/3] libbpf: adding AF_XDP support Magnus Karlsson
@ 2019-01-29 15:12 ` Magnus Karlsson
  2019-01-31 10:04   ` Daniel Borkmann
                     ` (2 more replies)
  2019-01-29 15:12 ` [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets Magnus Karlsson
  2019-01-29 15:12 ` [PATCH bpf-next v3 3/3] samples/bpf: convert xdpsock to use libbpf for AF_XDP access Magnus Karlsson
  2 siblings, 3 replies; 16+ messages in thread
From: Magnus Karlsson @ 2019-01-29 15:12 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jakub.kicinski, bjorn.topel, qi.z.zhang
  Cc: brouer

Move the pr_*() functions in libbpf.c to a common header file called
libbpf_internal.h, so that the upcoming libbpf AF_XDP helper code in
xsk.c can use these printing functions too.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 tools/lib/bpf/libbpf.c          | 30 +-----------------------------
 tools/lib/bpf/libbpf_internal.h | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 29 deletions(-)
 create mode 100644 tools/lib/bpf/libbpf_internal.h

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2ccde17..1d7fe26 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -39,6 +39,7 @@
 #include <gelf.h>
 
 #include "libbpf.h"
+#include "libbpf_internal.h"
 #include "bpf.h"
 #include "btf.h"
 #include "str_error.h"
@@ -51,34 +52,6 @@
 #define BPF_FS_MAGIC		0xcafe4a11
 #endif
 
-#define __printf(a, b)	__attribute__((format(printf, a, b)))
-
-__printf(1, 2)
-static int __base_pr(const char *format, ...)
-{
-	va_list args;
-	int err;
-
-	va_start(args, format);
-	err = vfprintf(stderr, format, args);
-	va_end(args);
-	return err;
-}
-
-static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
-static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
-static __printf(1, 2) libbpf_print_fn_t __pr_debug;
-
-#define __pr(func, fmt, ...)	\
-do {				\
-	if ((func))		\
-		(func)("libbpf: " fmt, ##__VA_ARGS__); \
-} while (0)
-
-#define pr_warning(fmt, ...)	__pr(__pr_warning, fmt, ##__VA_ARGS__)
-#define pr_info(fmt, ...)	__pr(__pr_info, fmt, ##__VA_ARGS__)
-#define pr_debug(fmt, ...)	__pr(__pr_debug, fmt, ##__VA_ARGS__)
-
 void libbpf_set_print(libbpf_print_fn_t warn,
 		      libbpf_print_fn_t info,
 		      libbpf_print_fn_t debug)
@@ -96,7 +69,6 @@ void libbpf_set_print(libbpf_print_fn_t warn,
 		goto out;		\
 } while(0)
 
-
 /* Copied from tools/perf/util/util.h */
 #ifndef zfree
 # define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
new file mode 100644
index 0000000..951c078
--- /dev/null
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+
+/*
+ * Common internal libbpf functions and definitions.
+ *
+ * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
+ * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
+ * Copyright (C) 2015 Huawei Inc.
+ */
+#ifndef __LIBBPF_LIBBPF_INTERNAL_H
+#define __LIBBPF_LIBBPF_INTERNAL_H
+
+#define __printf(a, b)	__attribute__((format(printf, a, b)))
+
+__printf(1, 2)
+static int __base_pr(const char *format, ...)
+{
+	va_list args;
+	int err;
+
+	va_start(args, format);
+	err = vfprintf(stderr, format, args);
+	va_end(args);
+	return err;
+}
+
+static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
+static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
+static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_debug;
+
+#define __pr(func, fmt, ...)	\
+do {				\
+	if ((func))		\
+		(func)("libbpf: " fmt, ##__VA_ARGS__); \
+} while (0)
+
+#define pr_warning(fmt, ...)	__pr(__pr_warning, fmt, ##__VA_ARGS__)
+#define pr_info(fmt, ...)	__pr(__pr_info, fmt, ##__VA_ARGS__)
+#define pr_debug(fmt, ...)	__pr(__pr_debug, fmt, ##__VA_ARGS__)
+
+#endif /* __LIBBPF_LIBBPF_INTERNAL_H */
-- 
2.7.4



* [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets
  2019-01-29 15:12 [PATCH bpf-next v3 0/3] libbpf: adding AF_XDP support Magnus Karlsson
  2019-01-29 15:12 ` [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file Magnus Karlsson
@ 2019-01-29 15:12 ` Magnus Karlsson
  2019-01-31 10:05   ` Daniel Borkmann
  2019-01-31 10:29   ` Daniel Borkmann
  2019-01-29 15:12 ` [PATCH bpf-next v3 3/3] samples/bpf: convert xdpsock to use libbpf for AF_XDP access Magnus Karlsson
  2 siblings, 2 replies; 16+ messages in thread
From: Magnus Karlsson @ 2019-01-29 15:12 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jakub.kicinski, bjorn.topel, qi.z.zhang
  Cc: brouer

This commit adds AF_XDP support to libbpf. The main reason for this is
to facilitate writing applications that use AF_XDP by offering
higher-level APIs that hide many of the details of the AF_XDP uapi, in
the same vein as libbpf eases XDP adoption by offering easy-to-use,
higher-level interfaces to XDP functionality. Hopefully this will
drive adoption of AF_XDP, make applications that use it simpler and
smaller, and also make it possible for applications to benefit from
optimizations in the AF_XDP user-space access code. Previously, people
simply copied and pasted the code from the sample application into
their own applications, which is not desirable.

The interface is composed of two parts:

* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting
  up umems and AF_XDP sockets, as well as a simple XDP program.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 tools/include/uapi/linux/if_xdp.h |  78 ++++
 tools/lib/bpf/Build               |   2 +-
 tools/lib/bpf/Makefile            |   5 +-
 tools/lib/bpf/README.rst          |  11 +-
 tools/lib/bpf/libbpf.map          |  12 +
 tools/lib/bpf/xsk.c               | 794 ++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/xsk.h               | 132 +++++++
 7 files changed, 1031 insertions(+), 3 deletions(-)
 create mode 100644 tools/include/uapi/linux/if_xdp.h
 create mode 100644 tools/lib/bpf/xsk.c
 create mode 100644 tools/lib/bpf/xsk.h

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
new file mode 100644
index 0000000..caed8b1
--- /dev/null
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * if_xdp: XDP socket user-space interface
+ * Copyright(c) 2018 Intel Corporation.
+ *
+ * Author(s): Björn Töpel <bjorn.topel@intel.com>
+ *	      Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#ifndef _LINUX_IF_XDP_H
+#define _LINUX_IF_XDP_H
+
+#include <linux/types.h>
+
+/* Options for the sxdp_flags field */
+#define XDP_SHARED_UMEM	(1 << 0)
+#define XDP_COPY	(1 << 1) /* Force copy-mode */
+#define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
+
+struct sockaddr_xdp {
+	__u16 sxdp_family;
+	__u16 sxdp_flags;
+	__u32 sxdp_ifindex;
+	__u32 sxdp_queue_id;
+	__u32 sxdp_shared_umem_fd;
+};
+
+struct xdp_ring_offset {
+	__u64 producer;
+	__u64 consumer;
+	__u64 desc;
+};
+
+struct xdp_mmap_offsets {
+	struct xdp_ring_offset rx;
+	struct xdp_ring_offset tx;
+	struct xdp_ring_offset fr; /* Fill */
+	struct xdp_ring_offset cr; /* Completion */
+};
+
+/* XDP socket options */
+#define XDP_MMAP_OFFSETS		1
+#define XDP_RX_RING			2
+#define XDP_TX_RING			3
+#define XDP_UMEM_REG			4
+#define XDP_UMEM_FILL_RING		5
+#define XDP_UMEM_COMPLETION_RING	6
+#define XDP_STATISTICS			7
+
+struct xdp_umem_reg {
+	__u64 addr; /* Start of packet data area */
+	__u64 len; /* Length of packet data area */
+	__u32 chunk_size;
+	__u32 headroom;
+};
+
+struct xdp_statistics {
+	__u64 rx_dropped; /* Dropped for reasons other than invalid desc */
+	__u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
+	__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
+};
+
+/* Pgoff for mmaping the rings */
+#define XDP_PGOFF_RX_RING			  0
+#define XDP_PGOFF_TX_RING		 0x80000000
+#define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
+#define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
+
+/* Rx/Tx descriptor */
+struct xdp_desc {
+	__u64 addr;
+	__u32 len;
+	__u32 options;
+};
+
+/* UMEM descriptor is __u64 */
+
+#endif /* _LINUX_IF_XDP_H */
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index bfd9bfc..ee9d536 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o
+libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o libbpf_probes.o xsk.o
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 8479162..761691b 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -164,6 +164,9 @@ $(BPF_IN): force elfdep bpfdep
 	@(test -f ../../include/uapi/linux/if_link.h -a -f ../../../include/uapi/linux/if_link.h && ( \
 	(diff -B ../../include/uapi/linux/if_link.h ../../../include/uapi/linux/if_link.h >/dev/null) || \
 	echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_link.h' differs from latest version at 'include/uapi/linux/if_link.h'" >&2 )) || true
+	@(test -f ../../include/uapi/linux/if_xdp.h -a -f ../../../include/uapi/linux/if_xdp.h && ( \
+	(diff -B ../../include/uapi/linux/if_xdp.h ../../../include/uapi/linux/if_xdp.h >/dev/null) || \
+	echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'" >&2 )) || true
 	$(Q)$(MAKE) $(build)=libbpf
 
 $(OUTPUT)libbpf.so: $(BPF_IN)
@@ -174,7 +177,7 @@ $(OUTPUT)libbpf.a: $(BPF_IN)
 	$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
 
 $(OUTPUT)test_libbpf: test_libbpf.cpp $(OUTPUT)libbpf.a
-	$(QUIET_LINK)$(CXX) $^ -lelf -o $@
+	$(QUIET_LINK)$(CXX) $(INCLUDES) $^ -lelf -o $@
 
 check: check_abi
 
diff --git a/tools/lib/bpf/README.rst b/tools/lib/bpf/README.rst
index 607aae4..45e3788 100644
--- a/tools/lib/bpf/README.rst
+++ b/tools/lib/bpf/README.rst
@@ -9,7 +9,7 @@ described here. It's recommended to follow these conventions whenever a
 new function or type is added to keep libbpf API clean and consistent.
 
 All types and functions provided by libbpf API should have one of the
-following prefixes: ``bpf_``, ``btf_``, ``libbpf_``.
+following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``xsk_``.
 
 System call wrappers
 --------------------
@@ -62,6 +62,15 @@ Auxiliary functions and types that don't fit well in any of categories
 described above should have ``libbpf_`` prefix, e.g.
 ``libbpf_get_error`` or ``libbpf_prog_type_by_name``.
 
+AF_XDP functions
+-------------------
+
+AF_XDP functions should have an ``xsk_`` prefix, e.g.
+``xsk_umem__get_data`` or ``xsk_umem__create``. The interface consists
+of both low-level ring access functions and high-level configuration
+functions. These can be mixed and matched. Note that these functions
+are not reentrant for performance reasons.
+
 libbpf ABI
 ==========
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 266bc95..d294843 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -121,6 +121,18 @@ LIBBPF_0.0.1 {
 		libbpf_prog_type_by_name;
 		libbpf_set_print;
 		libbpf_strerror;
+		xsk_ring_cons__peek;
+		xsk_ring_cons__release;
+		xsk_ring_prod__reserve;
+		xsk_ring_prod__submit;
+		xsk_umem__get_data;
+		xsk_umem__get_data_raw;
+		xsk_umem__create;
+		xsk_socket__create;
+		xsk_umem__delete;
+		xsk_socket__delete;
+		xsk_umem__fd;
+		xsk_socket__fd;
 	local:
 		*;
 };
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
new file mode 100644
index 0000000..3bcade9
--- /dev/null
+++ b/tools/lib/bpf/xsk.c
@@ -0,0 +1,794 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+/*
+ * AF_XDP user-space access library.
+ *
+ * Copyright(c) 2018 - 2019 Intel Corporation.
+ *
+ * Author(s): Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <arpa/inet.h>
+#include <asm/barrier.h>
+#include <linux/compiler.h>
+#include <linux/filter.h>
+#include <linux/if_ether.h>
+#include <linux/if_link.h>
+#include <linux/if_packet.h>
+#include <linux/if_xdp.h>
+#include <net/if.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+
+#include "bpf.h"
+#include "libbpf.h"
+#include "libbpf_internal.h"
+#include "nlattr.h"
+#include "xsk.h"
+
+#ifndef SOL_XDP
+ #define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+ #define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+ #define PF_XDP AF_XDP
+#endif
+
+struct xsk_umem {
+	struct xsk_ring_prod *fill;
+	struct xsk_ring_cons *comp;
+	char *umem_area;
+	struct xsk_umem_config config;
+	int fd;
+	int refcount;
+};
+
+struct xsk_socket {
+	struct xsk_ring_cons *rx;
+	struct xsk_ring_prod *tx;
+	__u64 outstanding_tx;
+	struct xsk_umem *umem;
+	struct xsk_socket_config config;
+	int fd;
+	int xsks_map;
+	int ifindex;
+	int prog_fd;
+	int qidconf_map_fd;
+	int xsks_map_fd;
+	__u32 queue_id;
+};
+
+struct xsk_nl_info {
+	bool xdp_prog_attached;
+	int fd;
+};
+
+#define MAX_QUEUES 128
+
+/* For 32-bit systems, we need to use mmap2 as the offsets are 64-bit.
+ * Unfortunately, it is not part of glibc.
+ */
+static inline void *xsk_mmap(void *addr, size_t length, int prot, int flags,
+			     int fd, __u64 offset)
+{
+#ifdef __NR_mmap2
+	unsigned int page_shift = __builtin_ffs(getpagesize()) - 1;
+	long ret = syscall(__NR_mmap2, addr, length, prot, flags, fd,
+			   (off_t)(offset >> page_shift));
+
+	return (void *)ret;
+#else
+	return mmap(addr, length, prot, flags, fd, offset);
+#endif
+}
+
+static __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
+{
+	__u32 free_entries = r->cached_cons - r->cached_prod;
+
+	if (free_entries >= nb)
+		return free_entries;
+
+	/* Refresh the local tail pointer.
+	 * cached_cons is r->size bigger than the real consumer pointer so
+	 * that this addition can be avoided in the more frequently
+	 * executed code that computes free_entries in the beginning of
+	 * this function. Without this optimization it would have been
+	 * free_entries = r->cached_prod - r->cached_cons + r->size.
+	 */
+	r->cached_cons = *r->consumer + r->size;
+
+	return r->cached_cons - r->cached_prod;
+}
+
+static __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
+{
+	__u32 entries = r->cached_prod - r->cached_cons;
+
+	if (entries == 0) {
+		r->cached_prod = *r->producer;
+		entries = r->cached_prod - r->cached_cons;
+	}
+
+	return (entries > nb) ? nb : entries;
+}
+
+size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod, size_t nb, __u32 *idx)
+{
+	if (unlikely(xsk_prod_nb_free(prod, nb) < nb))
+		return 0;
+
+	*idx = prod->cached_prod;
+	prod->cached_prod += nb;
+
+	return nb;
+}
+
+void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
+{
+	/* Make sure everything has been written to the ring before signalling
+	 * this to the kernel.
+	 */
+	smp_wmb();
+
+	*prod->producer += nb;
+}
+
+size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons, size_t nb, __u32 *idx)
+{
+	size_t entries = xsk_cons_nb_avail(cons, nb);
+
+	if (likely(entries > 0)) {
+		/* Make sure we do not speculatively read the data before
+		 * we have received the packet buffers from the ring.
+		 */
+		smp_rmb();
+
+		*idx = cons->cached_cons;
+		cons->cached_cons += entries;
+	}
+
+	return entries;
+}
+
+void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb)
+{
+	*cons->consumer += nb;
+}
+
+void *xsk_umem__get_data_raw(void *umem_area, __u64 addr)
+{
+	return &((char *)umem_area)[addr];
+}
+
+void *xsk_umem__get_data(struct xsk_umem *umem, __u64 addr)
+{
+	return &((char *)(umem->umem_area))[addr];
+}
+
+int xsk_umem__fd(const struct xsk_umem *umem)
+{
+	return umem ? umem->fd : -EINVAL;
+}
+
+int xsk_socket__fd(const struct xsk_socket *xsk)
+{
+	return xsk ? xsk->fd : -EINVAL;
+}
+
+static bool xsk_page_aligned(void *buffer)
+{
+	unsigned long addr = (unsigned long)buffer;
+
+	return !(addr & (getpagesize() - 1));
+}
+
+static void xsk_set_umem_config(struct xsk_umem_config *cfg,
+				const struct xsk_umem_config *usr_cfg)
+{
+	if (!usr_cfg) {
+		cfg->fill_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
+		cfg->comp_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
+		cfg->frame_size = XSK_UMEM__DEFAULT_FRAME_SIZE;
+		cfg->frame_headroom = XSK_UMEM__DEFAULT_FRAME_HEADROOM;
+		return;
+	}
+
+	cfg->fill_size = usr_cfg->fill_size;
+	cfg->comp_size = usr_cfg->comp_size;
+	cfg->frame_size = usr_cfg->frame_size;
+	cfg->frame_headroom = usr_cfg->frame_headroom;
+}
+
+static void xsk_set_xdp_socket_config(struct xsk_socket_config *cfg,
+				      const struct xsk_socket_config *usr_cfg)
+{
+	if (!usr_cfg) {
+		cfg->rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
+		cfg->tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
+		cfg->libbpf_flags = 0;
+		cfg->xdp_flags = 0;
+		cfg->bind_flags = 0;
+		return;
+	}
+
+	cfg->rx_size = usr_cfg->rx_size;
+	cfg->tx_size = usr_cfg->tx_size;
+	cfg->libbpf_flags = usr_cfg->libbpf_flags;
+	cfg->xdp_flags = usr_cfg->xdp_flags;
+	cfg->bind_flags = usr_cfg->bind_flags;
+}
+
+int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
+		     struct xsk_ring_prod *fill, struct xsk_ring_cons *comp,
+		     const struct xsk_umem_config *usr_config)
+{
+	struct xdp_mmap_offsets off;
+	struct xdp_umem_reg mr;
+	struct xsk_umem *umem;
+	socklen_t optlen;
+	int err;
+	void *map;
+
+	if (!umem_area || !umem_ptr || !fill || !comp)
+		return -EFAULT;
+	if (!size && !xsk_page_aligned(umem_area))
+		return -EINVAL;
+
+	umem = calloc(1, sizeof(*umem));
+	if (!umem)
+		return -ENOMEM;
+
+	umem->fd = socket(AF_XDP, SOCK_RAW, 0);
+	if (umem->fd < 0) {
+		err = -errno;
+		goto out_umem_alloc;
+	}
+
+	umem->umem_area = umem_area;
+	xsk_set_umem_config(&umem->config, usr_config);
+
+	mr.addr = (uintptr_t)umem_area;
+	mr.len = size;
+	mr.chunk_size = umem->config.frame_size;
+	mr.headroom = umem->config.frame_headroom;
+
+	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
+	if (err) {
+		err = -errno;
+		goto out_socket;
+	}
+	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_FILL_RING,
+			 &umem->config.fill_size,
+			 sizeof(umem->config.fill_size));
+	if (err) {
+		err = -errno;
+		goto out_socket;
+	}
+	err = setsockopt(umem->fd, SOL_XDP, XDP_UMEM_COMPLETION_RING,
+			 &umem->config.comp_size,
+			 sizeof(umem->config.comp_size));
+	if (err) {
+		err = -errno;
+		goto out_socket;
+	}
+
+	optlen = sizeof(off);
+	err = getsockopt(umem->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+	if (err) {
+		err = -errno;
+		goto out_socket;
+	}
+
+	map = xsk_mmap(NULL, off.fr.desc +
+		       umem->config.fill_size * sizeof(__u64),
+		       PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+		       umem->fd, XDP_UMEM_PGOFF_FILL_RING);
+	if (map == MAP_FAILED) {
+		err = -errno;
+		goto out_socket;
+	}
+
+	umem->fill = fill;
+	fill->mask = umem->config.fill_size - 1;
+	fill->size = umem->config.fill_size;
+	fill->producer = map + off.fr.producer;
+	fill->consumer = map + off.fr.consumer;
+	fill->ring = map + off.fr.desc;
+	fill->cached_cons = umem->config.fill_size;
+
+	map = xsk_mmap(NULL,
+		       off.cr.desc + umem->config.comp_size * sizeof(__u64),
+		       PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+		       umem->fd, XDP_UMEM_PGOFF_COMPLETION_RING);
+	if (map == MAP_FAILED) {
+		err = -errno;
+		goto out_mmap;
+	}
+
+	umem->comp = comp;
+	comp->mask = umem->config.comp_size - 1;
+	comp->size = umem->config.comp_size;
+	comp->producer = map + off.cr.producer;
+	comp->consumer = map + off.cr.consumer;
+	comp->ring = map + off.cr.desc;
+
+	*umem_ptr = umem;
+	return 0;
+
+out_mmap:
+	munmap(umem->fill,
+	       off.fr.desc + umem->config.fill_size * sizeof(__u64));
+out_socket:
+	close(umem->fd);
+out_umem_alloc:
+	free(umem);
+	return err;
+}
+
+static int xsk_parse_nl(void *cookie, void *msg, struct nlattr **tb)
+{
+	struct nlattr *tb_parsed[IFLA_XDP_MAX + 1];
+	struct xsk_nl_info *nl_info = cookie;
+	unsigned char mode;
+	int err;
+
+	(void)msg;
+
+	nl_info->xdp_prog_attached = false;
+	if (!tb[IFLA_XDP])
+		return 0;
+
+	err = libbpf_nla_parse_nested(tb_parsed, IFLA_XDP_MAX, tb[IFLA_XDP],
+				      NULL);
+	if (err)
+		return err;
+
+	if (!tb_parsed[IFLA_XDP_ATTACHED] || !tb_parsed[IFLA_XDP_FD])
+		return 0;
+
+	mode = libbpf_nla_getattr_u8(tb_parsed[IFLA_XDP_ATTACHED]);
+	if (mode == XDP_ATTACHED_NONE)
+		return 0;
+
+	nl_info->xdp_prog_attached = true;
+	nl_info->fd = libbpf_nla_getattr_u32(tb_parsed[IFLA_XDP_FD]);
+	return 0;
+}
+
+static bool xsk_xdp_prog_attached(struct xsk_socket *xsk)
+{
+	struct xsk_nl_info nl_info;
+	unsigned int nl_pid;
+	char err_buf[256];
+	int sock, err;
+
+	sock = libbpf_netlink_open(&nl_pid);
+	if (sock < 0)
+		return false;
+
+	nl_info.xdp_prog_attached = false;
+	nl_info.fd = 0;
+
+	err = libbpf_nl_get_link(sock, nl_pid, xsk_parse_nl, &nl_info);
+	if (err) {
+		libbpf_strerror(err, err_buf, sizeof(err_buf));
+		pr_warning("Error:\n%s\n", err_buf);
+		return false;
+	}
+
+	xsk->prog_fd = nl_info.fd;
+	return nl_info.xdp_prog_attached;
+}
+
+static int xsk_load_xdp_prog(struct xsk_socket *xsk)
+{
+	char bpf_log_buf[BPF_LOG_BUF_SIZE];
+	int err, prog_fd;
+
+	/* This is the C-program:
+	 * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
+	 * {
+	 *     int *qidconf, index = ctx->rx_queue_index;
+	 *
+	 *     // A set entry here means that the corresponding queue_id
+	 *     // has an active AF_XDP socket bound to it.
+	 *     qidconf = bpf_map_lookup_elem(&qidconf_map, &index);
+	 *     if (!qidconf)
+	 *         return XDP_ABORTED;
+	 *
+	 *     if (*qidconf)
+	 *         return bpf_redirect_map(&xsks_map, index, 0);
+	 *
+	 *     return XDP_PASS;
+	 * }
+	 */
+	struct bpf_insn prog[] = {
+		/* r1 = *(u32 *)(r1 + 16) */
+		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 16),
+		/* *(u32 *)(r10 - 4) = r1 */
+		BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4),
+		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+		BPF_LD_MAP_FD(BPF_REG_1, xsk->qidconf_map_fd),
+		BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+		BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		/* if r1 == 0 goto +8 */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8),
+		BPF_MOV32_IMM(BPF_REG_0, 2),
+		/* r1 = *(u32 *)(r1 + 0) */
+		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
+		/* if r1 == 0 goto +5 */
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5),
+		/* r2 = *(u32 *)(r10 - 4) */
+		BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
+		BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
+		BPF_MOV32_IMM(BPF_REG_3, 0),
+		BPF_EMIT_CALL(BPF_FUNC_redirect_map),
+		/* The jumps are to this instruction */
+		BPF_EXIT_INSN(),
+	};
+	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
+
+	prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog, insns_cnt,
+				   "LGPL-2.1 or BSD-2-Clause", 0, bpf_log_buf,
+				   BPF_LOG_BUF_SIZE);
+	if (prog_fd < 0) {
+		pr_warning("BPF log buffer:\n%s", bpf_log_buf);
+		return prog_fd;
+	}
+
+	err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags);
+	if (err)
+		return err;
+
+	xsk->prog_fd = prog_fd;
+	return 0;
+}
+
+static int xsk_create_bpf_maps(struct xsk_socket *xsk)
+{
+	int fd;
+
+	fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "qidconf_map",
+				 sizeof(int), sizeof(int), MAX_QUEUES, 0);
+	if (fd < 0)
+		return fd;
+	xsk->qidconf_map_fd = fd;
+
+	fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
+				 sizeof(int), sizeof(int), MAX_QUEUES, 0);
+	if (fd < 0)
+		return fd;
+	xsk->xsks_map_fd = fd;
+
+	return 0;
+}
+
+static int xsk_update_bpf_maps(struct xsk_socket *xsk, int qidconf_value,
+			       int xsks_value)
+{
+	bool qidconf_map_updated = false, xsks_map_updated = false;
+	struct bpf_prog_info prog_info = {};
+	__u32 prog_len = sizeof(prog_info);
+	struct bpf_map_info map_info;
+	__u32 map_len = sizeof(map_info);
+	__u32 *map_ids;
+	int reset_value = 0;
+	__u32 num_maps;
+	unsigned int i;
+	int err;
+
+	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
+	if (err)
+		return err;
+
+	num_maps = prog_info.nr_map_ids;
+
+	map_ids = malloc(prog_info.nr_map_ids * sizeof(*map_ids));
+	if (!map_ids)
+		return -ENOMEM;
+
+	memset(&prog_info, 0, prog_len);
+	prog_info.nr_map_ids = num_maps;
+	prog_info.map_ids = (__u64)(unsigned long)map_ids;
+
+	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
+	if (err)
+		return err;
+
+	for (i = 0; i < prog_info.nr_map_ids; i++) {
+		int fd;
+
+		fd = bpf_map_get_fd_by_id(map_ids[i]);
+		if (fd < 0) {
+			err = -errno;
+			goto out;
+		}
+
+		err = bpf_obj_get_info_by_fd(fd, &map_info, &map_len);
+		if (err)
+			goto out;
+
+		if (!strcmp(map_info.name, "qidconf_map")) {
+			err = bpf_map_update_elem(fd, &xsk->queue_id,
+						  &qidconf_value, 0);
+			if (err)
+				goto out;
+			qidconf_map_updated = true;
+			xsk->qidconf_map_fd = fd;
+		} else if (!strcmp(map_info.name, "xsks_map")) {
+			err = bpf_map_update_elem(fd, &xsk->queue_id,
+						  &xsks_value, 0);
+			if (err)
+				goto out;
+			xsks_map_updated = true;
+			xsk->xsks_map_fd = fd;
+		}
+
+		if (qidconf_map_updated && xsks_map_updated)
+			break;
+	}
+
+	if (!(qidconf_map_updated && xsks_map_updated)) {
+		err = -ENOENT;
+		goto out;
+	}
+
+	return 0;
+
+out:
+	if (qidconf_map_updated)
+		(void)bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id,
+					  &reset_value, 0);
+	if (xsks_map_updated)
+		(void)bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id,
+					  &reset_value, 0);
+	return err;
+}
+
+static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
+{
+	bool prog_loaded = false;
+	int err;
+
+	if (!xsk_xdp_prog_attached(xsk)) {
+		err = xsk_create_bpf_maps(xsk);
+		if (err)
+			goto out_load;
+
+		err = xsk_load_xdp_prog(xsk);
+		if (err)
+			return err;
+		prog_loaded = true;
+	}
+
+	err = xsk_update_bpf_maps(xsk, true, xsk->fd);
+	if (err)
+		goto out_load;
+
+	return 0;
+
+out_load:
+	if (prog_loaded)
+		close(xsk->prog_fd);
+	return err;
+}
+
+int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
+		       __u32 queue_id, struct xsk_umem *umem,
+		       struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
+		       const struct xsk_socket_config *usr_config)
+{
+	struct sockaddr_xdp sxdp = {};
+	struct xdp_mmap_offsets off;
+	struct xsk_socket *xsk;
+	socklen_t optlen;
+	void *map;
+	int err;
+
+	if (!umem || !xsk_ptr || !rx || !tx)
+		return -EFAULT;
+
+	if (umem->refcount) {
+		pr_warning("Error: shared umems not supported by libbpf.\n");
+		return -EBUSY;
+	}
+
+	xsk = calloc(1, sizeof(*xsk));
+	if (!xsk)
+		return -ENOMEM;
+
+	if (umem->refcount++ > 0) {
+		xsk->fd = socket(AF_XDP, SOCK_RAW, 0);
+		if (xsk->fd < 0) {
+			err = -errno;
+			goto out_xsk_alloc;
+		}
+	} else {
+		xsk->fd = umem->fd;
+	}
+
+	xsk->outstanding_tx = 0;
+	xsk->queue_id = queue_id;
+	xsk->umem = umem;
+	xsk->ifindex = if_nametoindex(ifname);
+	if (!xsk->ifindex) {
+		err = -errno;
+		goto out_socket;
+	}
+
+	xsk_set_xdp_socket_config(&xsk->config, usr_config);
+
+	if (rx) {
+		err = setsockopt(xsk->fd, SOL_XDP, XDP_RX_RING,
+				 &xsk->config.rx_size,
+				 sizeof(xsk->config.rx_size));
+		if (err) {
+			err = -errno;
+			goto out_socket;
+		}
+	}
+	if (tx) {
+		err = setsockopt(xsk->fd, SOL_XDP, XDP_TX_RING,
+				 &xsk->config.tx_size,
+				 sizeof(xsk->config.tx_size));
+		if (err) {
+			err = -errno;
+			goto out_socket;
+		}
+	}
+
+	optlen = sizeof(off);
+	err = getsockopt(xsk->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+	if (err) {
+		err = -errno;
+		goto out_socket;
+	}
+
+	if (rx) {
+		map = xsk_mmap(NULL, off.rx.desc +
+			       xsk->config.rx_size * sizeof(struct xdp_desc),
+			       PROT_READ | PROT_WRITE,
+			       MAP_SHARED | MAP_POPULATE,
+			       xsk->fd, XDP_PGOFF_RX_RING);
+		if (map == MAP_FAILED) {
+			err = -errno;
+			goto out_socket;
+		}
+
+		rx->mask = xsk->config.rx_size - 1;
+		rx->size = xsk->config.rx_size;
+		rx->producer = map + off.rx.producer;
+		rx->consumer = map + off.rx.consumer;
+		rx->ring = map + off.rx.desc;
+	}
+	xsk->rx = rx;
+
+	if (tx) {
+		map = xsk_mmap(NULL, off.tx.desc +
+			       xsk->config.tx_size * sizeof(struct xdp_desc),
+			       PROT_READ | PROT_WRITE,
+			       MAP_SHARED | MAP_POPULATE,
+			       xsk->fd, XDP_PGOFF_TX_RING);
+		if (map == MAP_FAILED) {
+			err = -errno;
+			goto out_mmap_rx;
+		}
+
+		tx->mask = xsk->config.tx_size - 1;
+		tx->size = xsk->config.tx_size;
+		tx->producer = map + off.tx.producer;
+		tx->consumer = map + off.tx.consumer;
+		tx->ring = map + off.tx.desc;
+		tx->cached_cons = xsk->config.tx_size;
+	}
+	xsk->tx = tx;
+
+	sxdp.sxdp_family = PF_XDP;
+	sxdp.sxdp_ifindex = xsk->ifindex;
+	sxdp.sxdp_queue_id = xsk->queue_id;
+	sxdp.sxdp_flags = xsk->config.bind_flags;
+
+	err = bind(xsk->fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
+	if (err) {
+		err = -errno;
+		goto out_mmap_tx;
+	}
+
+	if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
+		err = xsk_setup_xdp_prog(xsk);
+		if (err)
+			goto out_mmap_tx;
+	}
+
+	*xsk_ptr = xsk;
+	return 0;
+
+out_mmap_tx:
+	if (tx)
+		munmap(tx->ring - off.tx.desc,
+		       off.tx.desc +
+		       xsk->config.tx_size * sizeof(struct xdp_desc));
+out_mmap_rx:
+	if (rx)
+		munmap(rx->ring - off.rx.desc,
+		       off.rx.desc +
+		       xsk->config.rx_size * sizeof(struct xdp_desc));
+out_socket:
+	if (--umem->refcount)
+		close(xsk->fd);
+out_xsk_alloc:
+	free(xsk);
+	return err;
+}
+
+int xsk_umem__delete(struct xsk_umem *umem)
+{
+	struct xdp_mmap_offsets off;
+	socklen_t optlen;
+	int err;
+
+	if (!umem)
+		return 0;
+
+	if (umem->refcount)
+		return -EBUSY;
+
+	optlen = sizeof(off);
+	err = getsockopt(umem->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+	if (!err) {
+		munmap(umem->fill->ring - off.fr.desc,
+		       off.fr.desc + umem->config.fill_size * sizeof(__u64));
+		munmap(umem->comp->ring - off.cr.desc,
+		       off.cr.desc + umem->config.comp_size * sizeof(__u64));
+	}
+
+	close(umem->fd);
+	free(umem);
+
+	return 0;
+}
+
+void xsk_socket__delete(struct xsk_socket *xsk)
+{
+	struct xdp_mmap_offsets off;
+	socklen_t optlen;
+	int err;
+
+	if (!xsk)
+		return;
+
+	(void)xsk_update_bpf_maps(xsk, 0, 0);
+
+	optlen = sizeof(off);
+	err = getsockopt(xsk->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+	if (!err) {
+		if (xsk->rx)
+			munmap(xsk->rx->ring - off.rx.desc,
+			       off.rx.desc +
+			       xsk->config.rx_size * sizeof(struct xdp_desc));
+		if (xsk->tx)
+			munmap(xsk->tx->ring - off.tx.desc,
+			       off.tx.desc +
+			       xsk->config.tx_size * sizeof(struct xdp_desc));
+	}
+
+	xsk->umem->refcount--;
+	/* Do not close an fd that also has an associated umem connected
+	 * to it.
+	 */
+	if (xsk->fd != xsk->umem->fd)
+		close(xsk->fd);
+	free(xsk);
+}
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
new file mode 100644
index 0000000..a177090
--- /dev/null
+++ b/tools/lib/bpf/xsk.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+
+/*
+ * AF_XDP user-space access library.
+ *
+ * Copyright(c) 2018 - 2019 Intel Corporation.
+ *
+ * Author(s): Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#ifndef __LIBBPF_XSK_H
+#define __LIBBPF_XSK_H
+
+#include <stdio.h>
+#include <stdint.h>
+#include <linux/if_xdp.h>
+
+#include "libbpf.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Do not access these members directly. Use the functions below. */
+#define DEFINE_XSK_RING(name) \
+struct name { \
+	__u32 cached_prod; \
+	__u32 cached_cons; \
+	__u32 mask; \
+	__u32 size; \
+	__u32 *producer; \
+	__u32 *consumer; \
+	void *ring; \
+}
+
+DEFINE_XSK_RING(xsk_ring_prod);
+DEFINE_XSK_RING(xsk_ring_cons);
+
+struct xsk_umem;
+struct xsk_socket;
+
+static inline __u64 *xsk_ring_prod__fill_addr(struct xsk_ring_prod *fill,
+					      __u32 idx)
+{
+	__u64 *addrs = (__u64 *)fill->ring;
+
+	return &addrs[idx & fill->mask];
+}
+
+static inline const __u64 *
+xsk_ring_cons__comp_addr(const struct xsk_ring_cons *comp, __u32 idx)
+{
+	const __u64 *addrs = (const __u64 *)comp->ring;
+
+	return &addrs[idx & comp->mask];
+}
+
+static inline struct xdp_desc *xsk_ring_prod__tx_desc(struct xsk_ring_prod *tx,
+						      __u32 idx)
+{
+	struct xdp_desc *descs = (struct xdp_desc *)tx->ring;
+
+	return &descs[idx & tx->mask];
+}
+
+static inline const struct xdp_desc *
+xsk_ring_cons__rx_desc(const struct xsk_ring_cons *rx, __u32 idx)
+{
+	const struct xdp_desc *descs = (const struct xdp_desc *)rx->ring;
+
+	return &descs[idx & rx->mask];
+}
+
+LIBBPF_API size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons, size_t nb,
+				      __u32 *idx);
+LIBBPF_API void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb);
+LIBBPF_API size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod, size_t nb,
+					 __u32 *idx);
+LIBBPF_API void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb);
+
+LIBBPF_API void *xsk_umem__get_data_raw(void *umem_area, __u64 addr);
+LIBBPF_API void *xsk_umem__get_data(struct xsk_umem *umem, __u64 addr);
+
+LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
+LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
+
+#define XSK_RING_CONS__DEFAULT_NUM_DESCS      2048
+#define XSK_RING_PROD__DEFAULT_NUM_DESCS      2048
+#define XSK_UMEM__DEFAULT_FRAME_SHIFT    11 /* 2048 bytes */
+#define XSK_UMEM__DEFAULT_FRAME_SIZE     (1 << XSK_UMEM__DEFAULT_FRAME_SHIFT)
+#define XSK_UMEM__DEFAULT_FRAME_HEADROOM 0
+
+struct xsk_umem_config {
+	__u32 fill_size;
+	__u32 comp_size;
+	__u32 frame_size;
+	__u32 frame_headroom;
+};
+
+/* Flags for the libbpf_flags field. */
+#define XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD (1 << 0)
+
+struct xsk_socket_config {
+	__u32 rx_size;
+	__u32 tx_size;
+	__u32 libbpf_flags;
+	__u32 xdp_flags;
+	__u16 bind_flags;
+};
+
+/* Set config to NULL to get the default configuration. */
+LIBBPF_API int xsk_umem__create(struct xsk_umem **umem,
+				void *umem_area, __u64 size,
+				struct xsk_ring_prod *fill,
+				struct xsk_ring_cons *comp,
+				const struct xsk_umem_config *config);
+LIBBPF_API int xsk_socket__create(struct xsk_socket **xsk,
+				  const char *ifname, __u32 queue_id,
+				  struct xsk_umem *umem,
+				  struct xsk_ring_cons *rx,
+				  struct xsk_ring_prod *tx,
+				  const struct xsk_socket_config *config);
+
+/* Returns 0 for success and -EBUSY if the umem is still in use. */
+LIBBPF_API int xsk_umem__delete(struct xsk_umem *umem);
+LIBBPF_API void xsk_socket__delete(struct xsk_socket *xsk);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif /* __LIBBPF_XSK_H */
-- 
2.7.4



* [PATCH bpf-next v3 3/3] samples/bpf: convert xdpsock to use libbpf for AF_XDP access
  2019-01-29 15:12 [PATCH bpf-next v3 0/3] libbpf: adding AF_XDP support Magnus Karlsson
  2019-01-29 15:12 ` [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file Magnus Karlsson
  2019-01-29 15:12 ` [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets Magnus Karlsson
@ 2019-01-29 15:12 ` Magnus Karlsson
  2 siblings, 0 replies; 16+ messages in thread
From: Magnus Karlsson @ 2019-01-29 15:12 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jakub.kicinski, bjorn.topel, qi.z.zhang
  Cc: brouer

This commit converts the xdpsock sample application to use the AF_XDP
functions provided by libbpf, shrinking it by nearly 300 lines of code.

The default ring sizes and the batch size have been increased, and the
size of the umem area has been decreased, so that the sample application
provides higher throughput. Note also that the shared umem code has
been removed from the sample, as it is not supported by libbpf at this
point in time.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 samples/bpf/Makefile       |   1 -
 samples/bpf/xdpsock.h      |  11 -
 samples/bpf/xdpsock_kern.c |  56 ----
 samples/bpf/xdpsock_user.c | 695 ++++++++++++---------------------------------
 4 files changed, 187 insertions(+), 576 deletions(-)
 delete mode 100644 samples/bpf/xdpsock.h
 delete mode 100644 samples/bpf/xdpsock_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 66ae15f..56d9445 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -163,7 +163,6 @@ always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
 always += cpustat_kern.o
 always += xdp_adjust_tail_kern.o
-always += xdpsock_kern.o
 always += xdp_fwd_kern.o
 always += task_fd_query_kern.o
 always += xdp_sample_pkts_kern.o
diff --git a/samples/bpf/xdpsock.h b/samples/bpf/xdpsock.h
deleted file mode 100644
index 533ab81..0000000
--- a/samples/bpf/xdpsock.h
+++ /dev/null
@@ -1,11 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef XDPSOCK_H_
-#define XDPSOCK_H_
-
-/* Power-of-2 number of sockets */
-#define MAX_SOCKS 4
-
-/* Round-robin receive */
-#define RR_LB 0
-
-#endif /* XDPSOCK_H_ */
diff --git a/samples/bpf/xdpsock_kern.c b/samples/bpf/xdpsock_kern.c
deleted file mode 100644
index b8ccd08..0000000
--- a/samples/bpf/xdpsock_kern.c
+++ /dev/null
@@ -1,56 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#define KBUILD_MODNAME "foo"
-#include <uapi/linux/bpf.h>
-#include "bpf_helpers.h"
-
-#include "xdpsock.h"
-
-struct bpf_map_def SEC("maps") qidconf_map = {
-	.type		= BPF_MAP_TYPE_ARRAY,
-	.key_size	= sizeof(int),
-	.value_size	= sizeof(int),
-	.max_entries	= 1,
-};
-
-struct bpf_map_def SEC("maps") xsks_map = {
-	.type = BPF_MAP_TYPE_XSKMAP,
-	.key_size = sizeof(int),
-	.value_size = sizeof(int),
-	.max_entries = MAX_SOCKS,
-};
-
-struct bpf_map_def SEC("maps") rr_map = {
-	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
-	.key_size = sizeof(int),
-	.value_size = sizeof(unsigned int),
-	.max_entries = 1,
-};
-
-SEC("xdp_sock")
-int xdp_sock_prog(struct xdp_md *ctx)
-{
-	int *qidconf, key = 0, idx;
-	unsigned int *rr;
-
-	qidconf = bpf_map_lookup_elem(&qidconf_map, &key);
-	if (!qidconf)
-		return XDP_ABORTED;
-
-	if (*qidconf != ctx->rx_queue_index)
-		return XDP_PASS;
-
-#if RR_LB /* NB! RR_LB is configured in xdpsock.h */
-	rr = bpf_map_lookup_elem(&rr_map, &key);
-	if (!rr)
-		return XDP_ABORTED;
-
-	*rr = (*rr + 1) & (MAX_SOCKS - 1);
-	idx = *rr;
-#else
-	idx = 0;
-#endif
-
-	return bpf_redirect_map(&xsks_map, idx, 0);
-}
-
-char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 57ecadc..647fa0d 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright(c) 2017 - 2018 Intel Corporation. */
 
-#include <assert.h>
 #include <errno.h>
 #include <getopt.h>
 #include <libgen.h>
@@ -27,11 +26,10 @@
 #include <poll.h>
 
 #include "bpf/libbpf.h"
+#include "bpf/xsk.h"
 #include "bpf_util.h"
 #include <bpf/bpf.h>
 
-#include "xdpsock.h"
-
 #ifndef SOL_XDP
 #define SOL_XDP 283
 #endif
@@ -44,17 +42,11 @@
 #define PF_XDP AF_XDP
 #endif
 
-#define NUM_FRAMES 131072
-#define FRAME_HEADROOM 0
-#define FRAME_SHIFT 11
-#define FRAME_SIZE 2048
-#define NUM_DESCS 1024
-#define BATCH_SIZE 16
-
-#define FQ_NUM_DESCS 1024
-#define CQ_NUM_DESCS 1024
+#define NUM_FRAMES (4 * 1024)
+#define BATCH_SIZE 64
 
 #define DEBUG_HEXDUMP 0
+#define MAX_SOCKS 8
 
 typedef __u64 u64;
 typedef __u32 u32;
@@ -73,53 +65,44 @@ static const char *opt_if = "";
 static int opt_ifindex;
 static int opt_queue;
 static int opt_poll;
-static int opt_shared_packet_buffer;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags;
 
-struct xdp_umem_uqueue {
-	u32 cached_prod;
-	u32 cached_cons;
-	u32 mask;
-	u32 size;
-	u32 *producer;
-	u32 *consumer;
-	u64 *ring;
-	void *map;
-};
-
-struct xdp_umem {
-	char *frames;
-	struct xdp_umem_uqueue fq;
-	struct xdp_umem_uqueue cq;
-	int fd;
-};
-
-struct xdp_uqueue {
-	u32 cached_prod;
-	u32 cached_cons;
-	u32 mask;
-	u32 size;
-	u32 *producer;
-	u32 *consumer;
-	struct xdp_desc *ring;
-	void *map;
+struct xsk_umem_info {
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+	struct xsk_umem *umem;
 };
 
-struct xdpsock {
-	struct xdp_uqueue rx;
-	struct xdp_uqueue tx;
-	int sfd;
-	struct xdp_umem *umem;
-	u32 outstanding_tx;
+struct xsk_socket_info {
+	struct xsk_ring_cons rx;
+	struct xsk_ring_prod tx;
+	struct xsk_umem_info *umem;
+	struct xsk_socket *xsk;
 	unsigned long rx_npkts;
 	unsigned long tx_npkts;
 	unsigned long prev_rx_npkts;
 	unsigned long prev_tx_npkts;
+	u32 outstanding_tx;
 };
 
 static int num_socks;
-struct xdpsock *xsks[MAX_SOCKS];
+struct xsk_socket_info *xsks[MAX_SOCKS];
+
+static void dump_stats(void);
+
+static void __exit_with_error(int error, const char *file, const char *func,
+			      int line)
+{
+	fprintf(stderr, "%s:%s:%i: errno: %d/\"%s\"\n", file, func,
+		line, error, strerror(error));
+	dump_stats();
+	bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
+	exit(EXIT_FAILURE);
+}
+
+#define exit_with_error(error) __exit_with_error(error, __FILE__, __func__, \
+						 __LINE__)
 
 static unsigned long get_nsecs(void)
 {
@@ -129,226 +112,12 @@ static unsigned long get_nsecs(void)
 	return ts.tv_sec * 1000000000UL + ts.tv_nsec;
 }
 
-static void dump_stats(void);
-
-#define lassert(expr)							\
-	do {								\
-		if (!(expr)) {						\
-			fprintf(stderr, "%s:%s:%i: Assertion failed: "	\
-				#expr ": errno: %d/\"%s\"\n",		\
-				__FILE__, __func__, __LINE__,		\
-				errno, strerror(errno));		\
-			dump_stats();					\
-			exit(EXIT_FAILURE);				\
-		}							\
-	} while (0)
-
-#define barrier() __asm__ __volatile__("": : :"memory")
-#ifdef __aarch64__
-#define u_smp_rmb() __asm__ __volatile__("dmb ishld": : :"memory")
-#define u_smp_wmb() __asm__ __volatile__("dmb ishst": : :"memory")
-#else
-#define u_smp_rmb() barrier()
-#define u_smp_wmb() barrier()
-#endif
-#define likely(x) __builtin_expect(!!(x), 1)
-#define unlikely(x) __builtin_expect(!!(x), 0)
-
 static const char pkt_data[] =
 	"\x3c\xfd\xfe\x9e\x7f\x71\xec\xb1\xd7\x98\x3a\xc0\x08\x00\x45\x00"
 	"\x00\x2e\x00\x00\x00\x00\x40\x11\x88\x97\x05\x08\x07\x08\xc8\x14"
 	"\x1e\x04\x10\x92\x10\x92\x00\x1a\x6d\xa3\x34\x33\x1f\x69\x40\x6b"
 	"\x54\x59\xb6\x14\x2d\x11\x44\xbf\xaf\xd9\xbe\xaa";
 
-static inline u32 umem_nb_free(struct xdp_umem_uqueue *q, u32 nb)
-{
-	u32 free_entries = q->cached_cons - q->cached_prod;
-
-	if (free_entries >= nb)
-		return free_entries;
-
-	/* Refresh the local tail pointer */
-	q->cached_cons = *q->consumer + q->size;
-
-	return q->cached_cons - q->cached_prod;
-}
-
-static inline u32 xq_nb_free(struct xdp_uqueue *q, u32 ndescs)
-{
-	u32 free_entries = q->cached_cons - q->cached_prod;
-
-	if (free_entries >= ndescs)
-		return free_entries;
-
-	/* Refresh the local tail pointer */
-	q->cached_cons = *q->consumer + q->size;
-	return q->cached_cons - q->cached_prod;
-}
-
-static inline u32 umem_nb_avail(struct xdp_umem_uqueue *q, u32 nb)
-{
-	u32 entries = q->cached_prod - q->cached_cons;
-
-	if (entries == 0) {
-		q->cached_prod = *q->producer;
-		entries = q->cached_prod - q->cached_cons;
-	}
-
-	return (entries > nb) ? nb : entries;
-}
-
-static inline u32 xq_nb_avail(struct xdp_uqueue *q, u32 ndescs)
-{
-	u32 entries = q->cached_prod - q->cached_cons;
-
-	if (entries == 0) {
-		q->cached_prod = *q->producer;
-		entries = q->cached_prod - q->cached_cons;
-	}
-
-	return (entries > ndescs) ? ndescs : entries;
-}
-
-static inline int umem_fill_to_kernel_ex(struct xdp_umem_uqueue *fq,
-					 struct xdp_desc *d,
-					 size_t nb)
-{
-	u32 i;
-
-	if (umem_nb_free(fq, nb) < nb)
-		return -ENOSPC;
-
-	for (i = 0; i < nb; i++) {
-		u32 idx = fq->cached_prod++ & fq->mask;
-
-		fq->ring[idx] = d[i].addr;
-	}
-
-	u_smp_wmb();
-
-	*fq->producer = fq->cached_prod;
-
-	return 0;
-}
-
-static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, u64 *d,
-				      size_t nb)
-{
-	u32 i;
-
-	if (umem_nb_free(fq, nb) < nb)
-		return -ENOSPC;
-
-	for (i = 0; i < nb; i++) {
-		u32 idx = fq->cached_prod++ & fq->mask;
-
-		fq->ring[idx] = d[i];
-	}
-
-	u_smp_wmb();
-
-	*fq->producer = fq->cached_prod;
-
-	return 0;
-}
-
-static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq,
-					       u64 *d, size_t nb)
-{
-	u32 idx, i, entries = umem_nb_avail(cq, nb);
-
-	u_smp_rmb();
-
-	for (i = 0; i < entries; i++) {
-		idx = cq->cached_cons++ & cq->mask;
-		d[i] = cq->ring[idx];
-	}
-
-	if (entries > 0) {
-		u_smp_wmb();
-
-		*cq->consumer = cq->cached_cons;
-	}
-
-	return entries;
-}
-
-static inline void *xq_get_data(struct xdpsock *xsk, u64 addr)
-{
-	return &xsk->umem->frames[addr];
-}
-
-static inline int xq_enq(struct xdp_uqueue *uq,
-			 const struct xdp_desc *descs,
-			 unsigned int ndescs)
-{
-	struct xdp_desc *r = uq->ring;
-	unsigned int i;
-
-	if (xq_nb_free(uq, ndescs) < ndescs)
-		return -ENOSPC;
-
-	for (i = 0; i < ndescs; i++) {
-		u32 idx = uq->cached_prod++ & uq->mask;
-
-		r[idx].addr = descs[i].addr;
-		r[idx].len = descs[i].len;
-	}
-
-	u_smp_wmb();
-
-	*uq->producer = uq->cached_prod;
-	return 0;
-}
-
-static inline int xq_enq_tx_only(struct xdp_uqueue *uq,
-				 unsigned int id, unsigned int ndescs)
-{
-	struct xdp_desc *r = uq->ring;
-	unsigned int i;
-
-	if (xq_nb_free(uq, ndescs) < ndescs)
-		return -ENOSPC;
-
-	for (i = 0; i < ndescs; i++) {
-		u32 idx = uq->cached_prod++ & uq->mask;
-
-		r[idx].addr	= (id + i) << FRAME_SHIFT;
-		r[idx].len	= sizeof(pkt_data) - 1;
-	}
-
-	u_smp_wmb();
-
-	*uq->producer = uq->cached_prod;
-	return 0;
-}
-
-static inline int xq_deq(struct xdp_uqueue *uq,
-			 struct xdp_desc *descs,
-			 int ndescs)
-{
-	struct xdp_desc *r = uq->ring;
-	unsigned int idx;
-	int i, entries;
-
-	entries = xq_nb_avail(uq, ndescs);
-
-	u_smp_rmb();
-
-	for (i = 0; i < entries; i++) {
-		idx = uq->cached_cons++ & uq->mask;
-		descs[i] = r[idx];
-	}
-
-	if (entries > 0) {
-		u_smp_wmb();
-
-		*uq->consumer = uq->cached_cons;
-	}
-
-	return entries;
-}
-
 static void swap_mac_addresses(void *data)
 {
 	struct ether_header *eth = (struct ether_header *)data;
@@ -396,165 +165,66 @@ static void hex_dump(void *pkt, size_t length, u64 addr)
 	printf("\n");
 }
 
-static size_t gen_eth_frame(char *frame)
+static size_t gen_eth_frame(struct xsk_umem_info *umem, u64 addr)
 {
-	memcpy(frame, pkt_data, sizeof(pkt_data) - 1);
+	memcpy(xsk_umem__get_data(umem->umem, addr), pkt_data,
+	       sizeof(pkt_data) - 1);
 	return sizeof(pkt_data) - 1;
 }
 
-static struct xdp_umem *xdp_umem_configure(int sfd)
+static struct xsk_umem_info *xsk_configure_umem(void *buffer, u64 size)
 {
-	int fq_size = FQ_NUM_DESCS, cq_size = CQ_NUM_DESCS;
-	struct xdp_mmap_offsets off;
-	struct xdp_umem_reg mr;
-	struct xdp_umem *umem;
-	socklen_t optlen;
-	void *bufs;
+	struct xsk_umem_info *umem;
+	int ret;
 
 	umem = calloc(1, sizeof(*umem));
-	lassert(umem);
-
-	lassert(posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			       NUM_FRAMES * FRAME_SIZE) == 0);
-
-	mr.addr = (__u64)bufs;
-	mr.len = NUM_FRAMES * FRAME_SIZE;
-	mr.chunk_size = FRAME_SIZE;
-	mr.headroom = FRAME_HEADROOM;
-
-	lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)) == 0);
-	lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_FILL_RING, &fq_size,
-			   sizeof(int)) == 0);
-	lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &cq_size,
-			   sizeof(int)) == 0);
-
-	optlen = sizeof(off);
-	lassert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off,
-			   &optlen) == 0);
-
-	umem->fq.map = mmap(0, off.fr.desc +
-			    FQ_NUM_DESCS * sizeof(u64),
-			    PROT_READ | PROT_WRITE,
-			    MAP_SHARED | MAP_POPULATE, sfd,
-			    XDP_UMEM_PGOFF_FILL_RING);
-	lassert(umem->fq.map != MAP_FAILED);
-
-	umem->fq.mask = FQ_NUM_DESCS - 1;
-	umem->fq.size = FQ_NUM_DESCS;
-	umem->fq.producer = umem->fq.map + off.fr.producer;
-	umem->fq.consumer = umem->fq.map + off.fr.consumer;
-	umem->fq.ring = umem->fq.map + off.fr.desc;
-	umem->fq.cached_cons = FQ_NUM_DESCS;
-
-	umem->cq.map = mmap(0, off.cr.desc +
-			     CQ_NUM_DESCS * sizeof(u64),
-			     PROT_READ | PROT_WRITE,
-			     MAP_SHARED | MAP_POPULATE, sfd,
-			     XDP_UMEM_PGOFF_COMPLETION_RING);
-	lassert(umem->cq.map != MAP_FAILED);
-
-	umem->cq.mask = CQ_NUM_DESCS - 1;
-	umem->cq.size = CQ_NUM_DESCS;
-	umem->cq.producer = umem->cq.map + off.cr.producer;
-	umem->cq.consumer = umem->cq.map + off.cr.consumer;
-	umem->cq.ring = umem->cq.map + off.cr.desc;
-
-	umem->frames = bufs;
-	umem->fd = sfd;
+	if (!umem)
+		exit_with_error(errno);
 
-	if (opt_bench == BENCH_TXONLY) {
-		int i;
-
-		for (i = 0; i < NUM_FRAMES * FRAME_SIZE; i += FRAME_SIZE)
-			(void)gen_eth_frame(&umem->frames[i]);
-	}
+	ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
+			       NULL);
+	if (ret)
+		exit_with_error(-ret);
 
 	return umem;
 }
 
-static struct xdpsock *xsk_configure(struct xdp_umem *umem)
+static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
 {
-	struct sockaddr_xdp sxdp = {};
-	struct xdp_mmap_offsets off;
-	int sfd, ndescs = NUM_DESCS;
-	struct xdpsock *xsk;
-	bool shared = true;
-	socklen_t optlen;
-	u64 i;
-
-	sfd = socket(PF_XDP, SOCK_RAW, 0);
-	lassert(sfd >= 0);
+	struct xsk_socket_config cfg;
+	struct xsk_socket_info *xsk;
+	int ret;
+	u32 idx;
+	int i;
 
 	xsk = calloc(1, sizeof(*xsk));
-	lassert(xsk);
-
-	xsk->sfd = sfd;
-	xsk->outstanding_tx = 0;
-
-	if (!umem) {
-		shared = false;
-		xsk->umem = xdp_umem_configure(sfd);
-	} else {
-		xsk->umem = umem;
-	}
-
-	lassert(setsockopt(sfd, SOL_XDP, XDP_RX_RING,
-			   &ndescs, sizeof(int)) == 0);
-	lassert(setsockopt(sfd, SOL_XDP, XDP_TX_RING,
-			   &ndescs, sizeof(int)) == 0);
-	optlen = sizeof(off);
-	lassert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off,
-			   &optlen) == 0);
-
-	/* Rx */
-	xsk->rx.map = mmap(NULL,
-			   off.rx.desc +
-			   NUM_DESCS * sizeof(struct xdp_desc),
-			   PROT_READ | PROT_WRITE,
-			   MAP_SHARED | MAP_POPULATE, sfd,
-			   XDP_PGOFF_RX_RING);
-	lassert(xsk->rx.map != MAP_FAILED);
-
-	if (!shared) {
-		for (i = 0; i < NUM_DESCS * FRAME_SIZE; i += FRAME_SIZE)
-			lassert(umem_fill_to_kernel(&xsk->umem->fq, &i, 1)
-				== 0);
-	}
-
-	/* Tx */
-	xsk->tx.map = mmap(NULL,
-			   off.tx.desc +
-			   NUM_DESCS * sizeof(struct xdp_desc),
-			   PROT_READ | PROT_WRITE,
-			   MAP_SHARED | MAP_POPULATE, sfd,
-			   XDP_PGOFF_TX_RING);
-	lassert(xsk->tx.map != MAP_FAILED);
-
-	xsk->rx.mask = NUM_DESCS - 1;
-	xsk->rx.size = NUM_DESCS;
-	xsk->rx.producer = xsk->rx.map + off.rx.producer;
-	xsk->rx.consumer = xsk->rx.map + off.rx.consumer;
-	xsk->rx.ring = xsk->rx.map + off.rx.desc;
-
-	xsk->tx.mask = NUM_DESCS - 1;
-	xsk->tx.size = NUM_DESCS;
-	xsk->tx.producer = xsk->tx.map + off.tx.producer;
-	xsk->tx.consumer = xsk->tx.map + off.tx.consumer;
-	xsk->tx.ring = xsk->tx.map + off.tx.desc;
-	xsk->tx.cached_cons = NUM_DESCS;
-
-	sxdp.sxdp_family = PF_XDP;
-	sxdp.sxdp_ifindex = opt_ifindex;
-	sxdp.sxdp_queue_id = opt_queue;
-
-	if (shared) {
-		sxdp.sxdp_flags = XDP_SHARED_UMEM;
-		sxdp.sxdp_shared_umem_fd = umem->fd;
-	} else {
-		sxdp.sxdp_flags = opt_xdp_bind_flags;
-	}
-
-	lassert(bind(sfd, (struct sockaddr *)&sxdp, sizeof(sxdp)) == 0);
+	if (!xsk)
+		exit_with_error(errno);
+
+	xsk->umem = umem;
+	cfg.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
+	cfg.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
+	cfg.libbpf_flags = 0;
+	cfg.xdp_flags = opt_xdp_flags;
+	cfg.bind_flags = opt_xdp_bind_flags;
+	ret = xsk_socket__create(&xsk->xsk, opt_if, opt_queue, umem->umem,
+				 &xsk->rx, &xsk->tx, &cfg);
+	if (ret)
+		exit_with_error(-ret);
+
+
+	ret = xsk_ring_prod__reserve(&xsk->umem->fq,
+				     XSK_RING_PROD__DEFAULT_NUM_DESCS,
+				     &idx);
+	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
+		exit_with_error(-ret);
+	for (i = 0;
+	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
+		     XSK_UMEM__DEFAULT_FRAME_SIZE;
+	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
+	xsk_ring_prod__submit(&xsk->umem->fq,
+			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
 
 	return xsk;
 }
@@ -631,8 +301,12 @@ static void *poller(void *arg)
 
 static void int_exit(int sig)
 {
+	struct xsk_umem *umem = xsks[0]->umem->umem;
+
 	(void)sig;
 	dump_stats();
+	xsk_socket__delete(xsks[0]->xsk);
+	(void)xsk_umem__delete(umem);
 	bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
 	exit(EXIT_SUCCESS);
 }
@@ -644,7 +318,6 @@ static struct option long_options[] = {
 	{"interface", required_argument, 0, 'i'},
 	{"queue", required_argument, 0, 'q'},
 	{"poll", no_argument, 0, 'p'},
-	{"shared-buffer", no_argument, 0, 's'},
 	{"xdp-skb", no_argument, 0, 'S'},
 	{"xdp-native", no_argument, 0, 'N'},
 	{"interval", required_argument, 0, 'n'},
@@ -664,7 +337,6 @@ static void usage(const char *prog)
 		"  -i, --interface=n	Run on interface n\n"
 		"  -q, --queue=n	Use queue n (default 0)\n"
 		"  -p, --poll		Use poll syscall\n"
-		"  -s, --shared-buffer	Use shared packet buffer\n"
 		"  -S, --xdp-skb=n	Use XDP skb-mod\n"
 		"  -N, --xdp-native=n	Enforce XDP native mode\n"
 		"  -n, --interval=n	Specify statistics update interval (default 1 sec).\n"
@@ -703,9 +375,6 @@ static void parse_command_line(int argc, char **argv)
 		case 'q':
 			opt_queue = atoi(optarg);
 			break;
-		case 's':
-			opt_shared_packet_buffer = 1;
-			break;
 		case 'p':
 			opt_poll = 1;
 			break;
@@ -736,75 +405,104 @@ static void parse_command_line(int argc, char **argv)
 			opt_if);
 		usage(basename(argv[0]));
 	}
+
 }
 
-static void kick_tx(int fd)
+static void kick_tx(struct xsk_socket_info *xsk)
 {
 	int ret;
 
-	ret = sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
+	ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
 	if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno == EBUSY)
 		return;
-	lassert(0);
+	exit_with_error(errno);
 }
 
-static inline void complete_tx_l2fwd(struct xdpsock *xsk)
+static inline void complete_tx_l2fwd(struct xsk_socket_info *xsk)
 {
-	u64 descs[BATCH_SIZE];
+	u32 idx_cq, idx_fq;
 	unsigned int rcvd;
 	size_t ndescs;
 
 	if (!xsk->outstanding_tx)
 		return;
 
-	kick_tx(xsk->sfd);
+	kick_tx(xsk);
 	ndescs = (xsk->outstanding_tx > BATCH_SIZE) ? BATCH_SIZE :
-		 xsk->outstanding_tx;
+		xsk->outstanding_tx;
 
 	/* re-add completed Tx buffers */
-	rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, ndescs);
+	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, ndescs, &idx_cq);
 	if (rcvd > 0) {
-		umem_fill_to_kernel(&xsk->umem->fq, descs, rcvd);
+		unsigned int i;
+		int ret;
+
+		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+		while (ret != rcvd) {
+			if (ret < 0)
+				exit_with_error(-ret);
+			ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd,
+						     &idx_fq);
+		}
+		for (i = 0; i < rcvd; i++)
+			*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) =
+				*xsk_ring_cons__comp_addr(&xsk->umem->cq,
+							  idx_cq++);
+
+		xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
 		xsk->outstanding_tx -= rcvd;
 		xsk->tx_npkts += rcvd;
 	}
 }
 
-static inline void complete_tx_only(struct xdpsock *xsk)
+static inline void complete_tx_only(struct xsk_socket_info *xsk)
 {
-	u64 descs[BATCH_SIZE];
 	unsigned int rcvd;
+	u32 idx;
 
 	if (!xsk->outstanding_tx)
 		return;
 
-	kick_tx(xsk->sfd);
+	kick_tx(xsk);
 
-	rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, BATCH_SIZE);
+	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, BATCH_SIZE, &idx);
 	if (rcvd > 0) {
+		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
 		xsk->outstanding_tx -= rcvd;
 		xsk->tx_npkts += rcvd;
 	}
 }
 
-static void rx_drop(struct xdpsock *xsk)
+static void rx_drop(struct xsk_socket_info *xsk)
 {
-	struct xdp_desc descs[BATCH_SIZE];
 	unsigned int rcvd, i;
+	u32 idx_rx, idx_fq;
+	int ret;
 
-	rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
+	rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
 	if (!rcvd)
 		return;
 
+	ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+	while (ret != rcvd) {
+		if (ret < 0)
+			exit_with_error(-ret);
+		ret = xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
+	}
+
 	for (i = 0; i < rcvd; i++) {
-		char *pkt = xq_get_data(xsk, descs[i].addr);
+		u64 addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
+		u32 len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++)->len;
+		char *pkt = xsk_umem__get_data(xsk->umem->umem, addr);
 
-		hex_dump(pkt, descs[i].len, descs[i].addr);
+		hex_dump(pkt, len, addr);
+		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = addr;
 	}
 
+	xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+	xsk_ring_cons__release(&xsk->rx, rcvd);
 	xsk->rx_npkts += rcvd;
-
-	umem_fill_to_kernel_ex(&xsk->umem->fq, descs, rcvd);
 }
 
 static void rx_drop_all(void)
@@ -815,7 +513,7 @@ static void rx_drop_all(void)
 	memset(fds, 0, sizeof(fds));
 
 	for (i = 0; i < num_socks; i++) {
-		fds[i].fd = xsks[i]->sfd;
+		fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
 		fds[i].events = POLLIN;
 		timeout = 1000; /* 1 sec */
 	}
@@ -832,14 +530,14 @@ static void rx_drop_all(void)
 	}
 }
 
-static void tx_only(struct xdpsock *xsk)
+static void tx_only(struct xsk_socket_info *xsk)
 {
 	int timeout, ret, nfds = 1;
 	struct pollfd fds[nfds + 1];
-	unsigned int idx = 0;
+	u32 idx, frame_nb = 0;
 
 	memset(fds, 0, sizeof(fds));
-	fds[0].fd = xsk->sfd;
+	fds[0].fd = xsk_socket__fd(xsk->xsk);
 	fds[0].events = POLLOUT;
 	timeout = 1000; /* 1 sec */
 
@@ -849,50 +547,73 @@ static void tx_only(struct xdpsock *xsk)
 			if (ret <= 0)
 				continue;
 
-			if (fds[0].fd != xsk->sfd ||
-			    !(fds[0].revents & POLLOUT))
+			if (!(fds[0].revents & POLLOUT))
 				continue;
 		}
 
-		if (xq_nb_free(&xsk->tx, BATCH_SIZE) >= BATCH_SIZE) {
-			lassert(xq_enq_tx_only(&xsk->tx, idx, BATCH_SIZE) == 0);
+		if (xsk_ring_prod__reserve(&xsk->tx, BATCH_SIZE, &idx) ==
+		    BATCH_SIZE) {
+			unsigned int i;
+
+			for (i = 0; i < BATCH_SIZE; i++) {
+				xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr
+					= (frame_nb + i) <<
+					XSK_UMEM__DEFAULT_FRAME_SHIFT;
+				xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len =
+					sizeof(pkt_data) - 1;
+			}
 
+			xsk_ring_prod__submit(&xsk->tx, BATCH_SIZE);
 			xsk->outstanding_tx += BATCH_SIZE;
-			idx += BATCH_SIZE;
-			idx %= NUM_FRAMES;
+			frame_nb += BATCH_SIZE;
+			frame_nb %= NUM_FRAMES;
 		}
 
 		complete_tx_only(xsk);
 	}
 }
 
-static void l2fwd(struct xdpsock *xsk)
+static void l2fwd(struct xsk_socket_info *xsk)
 {
 	for (;;) {
-		struct xdp_desc descs[BATCH_SIZE];
 		unsigned int rcvd, i;
+		u32 idx_rx, idx_tx;
 		int ret;
 
 		for (;;) {
 			complete_tx_l2fwd(xsk);
 
-			rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
+			rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE,
+						   &idx_rx);
 			if (rcvd > 0)
 				break;
 		}
 
+		ret = xsk_ring_prod__reserve(&xsk->tx, rcvd, &idx_tx);
+		while (ret != rcvd) {
+			if (ret < 0)
+				exit_with_error(-ret);
+			ret = xsk_ring_prod__reserve(&xsk->tx, rcvd, &idx_tx);
+		}
+
 		for (i = 0; i < rcvd; i++) {
-			char *pkt = xq_get_data(xsk, descs[i].addr);
+			u64 addr = xsk_ring_cons__rx_desc(&xsk->rx,
+							  idx_rx)->addr;
+			u32 len = xsk_ring_cons__rx_desc(&xsk->rx,
+							 idx_rx++)->len;
+			char *pkt = xsk_umem__get_data(xsk->umem->umem, addr);
 
 			swap_mac_addresses(pkt);
 
-			hex_dump(pkt, descs[i].len, descs[i].addr);
+			hex_dump(pkt, len, addr);
+			xsk_ring_prod__tx_desc(&xsk->tx, idx_tx)->addr = addr;
+			xsk_ring_prod__tx_desc(&xsk->tx, idx_tx++)->len = len;
 		}
 
-		xsk->rx_npkts += rcvd;
+		xsk_ring_prod__submit(&xsk->tx, rcvd);
+		xsk_ring_cons__release(&xsk->rx, rcvd);
 
-		ret = xq_enq(&xsk->tx, descs, rcvd);
-		lassert(ret == 0);
+		xsk->rx_npkts += rcvd;
 		xsk->outstanding_tx += rcvd;
 	}
 }
@@ -900,15 +621,10 @@ static void l2fwd(struct xdpsock *xsk)
 int main(int argc, char **argv)
 {
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
-	struct bpf_prog_load_attr prog_load_attr = {
-		.prog_type	= BPF_PROG_TYPE_XDP,
-	};
-	int prog_fd, qidconf_map, xsks_map;
-	struct bpf_object *obj;
-	char xdp_filename[256];
-	struct bpf_map *map;
-	int i, ret, key = 0;
+	struct xsk_umem_info *umem;
 	pthread_t pt;
+	void *bufs;
+	int ret;
 
 	parse_command_line(argc, argv);
 
@@ -918,60 +634,22 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	snprintf(xdp_filename, sizeof(xdp_filename), "%s_kern.o", argv[0]);
-	prog_load_attr.file = xdp_filename;
+	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
+			     NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+	if (ret)
+		exit_with_error(ret);
 
-	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
-		exit(EXIT_FAILURE);
-	if (prog_fd < 0) {
-		fprintf(stderr, "ERROR: no program found: %s\n",
-			strerror(prog_fd));
-		exit(EXIT_FAILURE);
-	}
-
-	map = bpf_object__find_map_by_name(obj, "qidconf_map");
-	qidconf_map = bpf_map__fd(map);
-	if (qidconf_map < 0) {
-		fprintf(stderr, "ERROR: no qidconf map found: %s\n",
-			strerror(qidconf_map));
-		exit(EXIT_FAILURE);
-	}
-
-	map = bpf_object__find_map_by_name(obj, "xsks_map");
-	xsks_map = bpf_map__fd(map);
-	if (xsks_map < 0) {
-		fprintf(stderr, "ERROR: no xsks map found: %s\n",
-			strerror(xsks_map));
-		exit(EXIT_FAILURE);
-	}
-
-	if (bpf_set_link_xdp_fd(opt_ifindex, prog_fd, opt_xdp_flags) < 0) {
-		fprintf(stderr, "ERROR: link set xdp fd failed\n");
-		exit(EXIT_FAILURE);
-	}
-
-	ret = bpf_map_update_elem(qidconf_map, &key, &opt_queue, 0);
-	if (ret) {
-		fprintf(stderr, "ERROR: bpf_map_update_elem qidconf\n");
-		exit(EXIT_FAILURE);
-	}
-
-	/* Create sockets... */
-	xsks[num_socks++] = xsk_configure(NULL);
+	/* Create sockets... */
+	umem = xsk_configure_umem(bufs,
+				  NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE);
+	xsks[num_socks++] = xsk_configure_socket(umem);
 
-#if RR_LB
-	for (i = 0; i < MAX_SOCKS - 1; i++)
-		xsks[num_socks++] = xsk_configure(xsks[0]->umem);
-#endif
+	if (opt_bench == BENCH_TXONLY) {
+		int i;
 
-	/* ...and insert them into the map. */
-	for (i = 0; i < num_socks; i++) {
-		key = i;
-		ret = bpf_map_update_elem(xsks_map, &key, &xsks[i]->sfd, 0);
-		if (ret) {
-			fprintf(stderr, "ERROR: bpf_map_update_elem %d\n", i);
-			exit(EXIT_FAILURE);
-		}
+		for (i = 0; i < NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
+		     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+			(void)gen_eth_frame(umem, i);
 	}
 
 	signal(SIGINT, int_exit);
@@ -981,7 +659,8 @@ int main(int argc, char **argv)
 	setlocale(LC_ALL, "");
 
 	ret = pthread_create(&pt, NULL, poller, NULL);
-	lassert(ret == 0);
+	if (ret)
+		exit_with_error(ret);
 
 	prev_time = get_nsecs();
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file
  2019-01-29 15:12 ` [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file Magnus Karlsson
@ 2019-01-31 10:04   ` Daniel Borkmann
  2019-01-31 10:42     ` Magnus Karlsson
  2019-01-31 18:52   ` Alexei Starovoitov
  2019-01-31 19:35   ` Y Song
  2 siblings, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2019-01-31 10:04 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, netdev, jakub.kicinski,
	bjorn.topel, qi.z.zhang
  Cc: brouer

On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> Move the pr_*() functions in libbpf.c to a common header file called
> libbpf_internal.h. This is so that the later libbpf AF_XDP helper library
> code in xsk.c can use these printing functions too.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>  tools/lib/bpf/libbpf.c          | 30 +-----------------------------
>  tools/lib/bpf/libbpf_internal.h | 41 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 29 deletions(-)
>  create mode 100644 tools/lib/bpf/libbpf_internal.h
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 2ccde17..1d7fe26 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -39,6 +39,7 @@
>  #include <gelf.h>
>  
>  #include "libbpf.h"
> +#include "libbpf_internal.h"
>  #include "bpf.h"
>  #include "btf.h"
>  #include "str_error.h"
> @@ -51,34 +52,6 @@
>  #define BPF_FS_MAGIC		0xcafe4a11
>  #endif
>  
> -#define __printf(a, b)	__attribute__((format(printf, a, b)))
> -
> -__printf(1, 2)
> -static int __base_pr(const char *format, ...)
> -{
> -	va_list args;
> -	int err;
> -
> -	va_start(args, format);
> -	err = vfprintf(stderr, format, args);
> -	va_end(args);
> -	return err;
> -}
> -
> -static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> -static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> -static __printf(1, 2) libbpf_print_fn_t __pr_debug;
> -
> -#define __pr(func, fmt, ...)	\
> -do {				\
> -	if ((func))		\
> -		(func)("libbpf: " fmt, ##__VA_ARGS__); \
> -} while (0)
> -
> -#define pr_warning(fmt, ...)	__pr(__pr_warning, fmt, ##__VA_ARGS__)
> -#define pr_info(fmt, ...)	__pr(__pr_info, fmt, ##__VA_ARGS__)
> -#define pr_debug(fmt, ...)	__pr(__pr_debug, fmt, ##__VA_ARGS__)
> -
>  void libbpf_set_print(libbpf_print_fn_t warn,
>  		      libbpf_print_fn_t info,
>  		      libbpf_print_fn_t debug)
> @@ -96,7 +69,6 @@ void libbpf_set_print(libbpf_print_fn_t warn,
>  		goto out;		\
>  } while(0)
>  
> -
>  /* Copied from tools/perf/util/util.h */
>  #ifndef zfree
>  # define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
> diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> new file mode 100644
> index 0000000..951c078
> --- /dev/null
> +++ b/tools/lib/bpf/libbpf_internal.h

Just really minor nit: let's name the header utils.h or such. It would also make
sense to move the zfree() and zclose() to this and make use of it.

> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +
> +/*
> + * Common internal libbpf functions and definitions.
> + *
> + * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
> + * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
> + * Copyright (C) 2015 Huawei Inc.
> + */
> +#ifndef __LIBBPF_LIBBPF_INTERNAL_H
> +#define __LIBBPF_LIBBPF_INTERNAL_H
> +
> +#define __printf(a, b)	__attribute__((format(printf, a, b)))
> +
> +__printf(1, 2)
> +static int __base_pr(const char *format, ...)
> +{
> +	va_list args;
> +	int err;
> +
> +	va_start(args, format);
> +	err = vfprintf(stderr, format, args);
> +	va_end(args);
> +	return err;
> +}
> +
> +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_debug;
> +
> +#define __pr(func, fmt, ...)	\
> +do {				\
> +	if ((func))		\
> +		(func)("libbpf: " fmt, ##__VA_ARGS__); \
> +} while (0)
> +
> +#define pr_warning(fmt, ...)	__pr(__pr_warning, fmt, ##__VA_ARGS__)
> +#define pr_info(fmt, ...)	__pr(__pr_info, fmt, ##__VA_ARGS__)
> +#define pr_debug(fmt, ...)	__pr(__pr_debug, fmt, ##__VA_ARGS__)
> +
> +#endif /* __LIBBPF_LIBBPF_INTERNAL_H */
> 


* Re: [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets
  2019-01-29 15:12 ` [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets Magnus Karlsson
@ 2019-01-31 10:05   ` Daniel Borkmann
  2019-01-31 10:44     ` Magnus Karlsson
  2019-01-31 10:29   ` Daniel Borkmann
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2019-01-31 10:05 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, netdev, jakub.kicinski,
	bjorn.topel, qi.z.zhang
  Cc: brouer

On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> This commit adds AF_XDP support to libbpf. The main reason for this is
> to facilitate writing applications that use AF_XDP by offering
> higher-level APIs that hide many of the details of the AF_XDP
> uapi. This is in the same vein as libbpf facilitates XDP adoption by
> offering easy-to-use higher level interfaces of XDP
> functionality. Hopefully this will facilitate adoption of AF_XDP, make
> applications using it simpler and smaller, and finally also make it
> possible for applications to benefit from optimizations in the AF_XDP
> user space access code. Previously, people just copied and pasted the
> code from the sample application into their application, which is not
> desirable.
> 
> The interface is composed of two parts:
> 
> * Low-level access interface to the four rings and the packet
> * High-level control plane interface for creating and setting
>   up umems and af_xdp sockets as well as a simple XDP program.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
[...]
>  libbpf ABI
>  ==========
>  
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 266bc95..d294843 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -121,6 +121,18 @@ LIBBPF_0.0.1 {
>  		libbpf_prog_type_by_name;
>  		libbpf_set_print;
>  		libbpf_strerror;
> +		xsk_ring_cons__peek;
> +		xsk_ring_cons__release;
> +		xsk_ring_prod__reserve;
> +		xsk_ring_prod__submit;
> +		xsk_umem__get_data;
> +		xsk_umem__get_data_raw;
> +		xsk_umem__create;
> +		xsk_socket__create;
> +		xsk_umem__delete;
> +		xsk_socket__delete;
> +		xsk_umem__fd;
> +		xsk_socket__fd;

This definitely needs to be placed under LIBBPF_0.0.2 where we have bpf_probe_*
that were added in this cycle.
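For reference, a hedged sketch of what the new version node in tools/lib/bpf/libbpf.map could look like; the exact set of other LIBBPF_0.0.2 symbols (the bpf_probe_* additions mentioned above) is an assumption here and would need to match what actually landed this cycle:

```
LIBBPF_0.0.2 {
	global:
		/* bpf_probe_* symbols added this cycle would go here too */
		xsk_ring_cons__peek;
		xsk_ring_cons__release;
		xsk_ring_prod__reserve;
		xsk_ring_prod__submit;
		xsk_socket__create;
		xsk_socket__delete;
		xsk_socket__fd;
		xsk_umem__create;
		xsk_umem__delete;
		xsk_umem__fd;
		xsk_umem__get_data;
		xsk_umem__get_data_raw;
} LIBBPF_0.0.1;
```

A new node inheriting from LIBBPF_0.0.1 keeps the 0.0.1 ABI surface frozen while versioning the new symbols separately.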

>  	local:
>  		*;
>  };
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c

* Re: [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets
  2019-01-29 15:12 ` [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets Magnus Karlsson
  2019-01-31 10:05   ` Daniel Borkmann
@ 2019-01-31 10:29   ` Daniel Borkmann
  2019-01-31 13:55     ` Magnus Karlsson
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2019-01-31 10:29 UTC (permalink / raw)
  To: Magnus Karlsson, bjorn.topel, ast, netdev, jakub.kicinski,
	bjorn.topel, qi.z.zhang
  Cc: brouer

On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> This commit adds AF_XDP support to libbpf. The main reason for this is
> to facilitate writing applications that use AF_XDP by offering
> higher-level APIs that hide many of the details of the AF_XDP
> uapi. This is in the same vein as libbpf facilitates XDP adoption by
> offering easy-to-use higher level interfaces of XDP
> functionality. Hopefully this will facilitate adoption of AF_XDP, make
> applications using it simpler and smaller, and finally also make it
> possible for applications to benefit from optimizations in the AF_XDP
> user space access code. Previously, people just copied and pasted the
> code from the sample application into their application, which is not
> desirable.
> 
> The interface is composed of two parts:
> 
> * Low-level access interface to the four rings and the packet
> * High-level control plane interface for creating and setting
>   up umems and af_xdp sockets as well as a simple XDP program.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
[...]
> +
> +static __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
> +{
> +	__u32 free_entries = r->cached_cons - r->cached_prod;
> +
> +	if (free_entries >= nb)
> +		return free_entries;
> +
> +	/* Refresh the local tail pointer.
> +	 * cached_cons is r->size bigger than the real consumer pointer so
> +	 * that this addition can be avoided in the more frequently
> +	 * executed code that computes free_entries in the beginning of
> +	 * this function. Without this optimization it would have been
> +	 * free_entries = r->cached_prod - r->cached_cons + r->size.
> +	 */
> +	r->cached_cons = *r->consumer + r->size;
> +
> +	return r->cached_cons - r->cached_prod;
> +}
> +
> +static __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
> +{
> +	__u32 entries = r->cached_prod - r->cached_cons;
> +
> +	if (entries == 0) {
> +		r->cached_prod = *r->producer;
> +		entries = r->cached_prod - r->cached_cons;
> +	}
> +
> +	return (entries > nb) ? nb : entries;
> +}
> +
> +size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod, size_t nb, __u32 *idx)
> +{
> +	if (unlikely(xsk_prod_nb_free(prod, nb) < nb))
> +		return 0;
> +
> +	*idx = prod->cached_prod;
> +	prod->cached_prod += nb;
> +
> +	return nb;
> +}
> +
> +void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
> +{
> +	/* Make sure everything has been written to the ring before signalling
> +	 * this to the kernel.
> +	 */
> +	smp_wmb();
> +
> +	*prod->producer += nb;
> +}
> +
> +size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons, size_t nb, __u32 *idx)
> +{
> +	size_t entries = xsk_cons_nb_avail(cons, nb);
> +
> +	if (likely(entries > 0)) {
> +		/* Make sure we do not speculatively read the data before
> +		 * we have received the packet buffers from the ring.
> +		 */
> +		smp_rmb();
> +
> +		*idx = cons->cached_cons;
> +		cons->cached_cons += entries;
> +	}
> +
> +	return entries;
> +}
> +
> +void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb)
> +{
> +	*cons->consumer += nb;
> +}
> +
> +void *xsk_umem__get_data_raw(void *umem_area, __u64 addr)
> +{
> +	return &((char *)umem_area)[addr];
> +}
> +
> +void *xsk_umem__get_data(struct xsk_umem *umem, __u64 addr)
> +{
> +	return &((char *)(umem->umem_area))[addr];
> +}

Shouldn't some of the above helpers for critical path be exposed as static
inline functions instead?

[...]
> +static int xsk_parse_nl(void *cookie, void *msg, struct nlattr **tb)
> +{
> +	struct nlattr *tb_parsed[IFLA_XDP_MAX + 1];
> +	struct xsk_nl_info *nl_info = cookie;
> +	unsigned char mode;
> +	int err;
> +
> +	(void)msg;

Unused?

> +	nl_info->xdp_prog_attached = false;
> +	if (!tb[IFLA_XDP])
> +		return 0;
> +
> +	err = libbpf_nla_parse_nested(tb_parsed, IFLA_XDP_MAX, tb[IFLA_XDP],
> +				      NULL);
> +	if (err)
> +		return err;
> +
> +	if (!tb_parsed[IFLA_XDP_ATTACHED] || !tb_parsed[IFLA_XDP_FD])
> +		return 0;
> +
> +	mode = libbpf_nla_getattr_u8(tb_parsed[IFLA_XDP_ATTACHED]);
> +	if (mode == XDP_ATTACHED_NONE)
> +		return 0;

Probably good to memset and/or init the passed struct xsk_nl_info (e.g.
fd to -1, etc.) so that if the caller did not initialize it, we won't
end up with garbage values on the early return 0 paths.

> +	nl_info->xdp_prog_attached = true;
> +	nl_info->fd = libbpf_nla_getattr_u32(tb_parsed[IFLA_XDP_FD]);
> +	return 0;
> +}
> +
> +static bool xsk_xdp_prog_attached(struct xsk_socket *xsk)
> +{
> +	struct xsk_nl_info nl_info;
> +	unsigned int nl_pid;
> +	char err_buf[256];
> +	int sock, err;
> +
> +	sock = libbpf_netlink_open(&nl_pid);
> +	if (sock < 0)
> +		return false;
> +
> +	nl_info.xdp_prog_attached = false;
> +	nl_info.fd = 0;
> +
> +	err = libbpf_nl_get_link(sock, nl_pid, xsk_parse_nl, &nl_info);
> +	if (err) {
> +		libbpf_strerror(err, err_buf, sizeof(err_buf));
> +		pr_warning("Error:\n%s\n", err_buf);
> +		return false;
> +	}
> +
> +	xsk->prog_fd = nl_info.fd;
> +	return nl_info.xdp_prog_attached;

Don't we leak sock here and in error above?

> +}
> +
> +static int xsk_load_xdp_prog(struct xsk_socket *xsk)
> +{
> +	char bpf_log_buf[BPF_LOG_BUF_SIZE];
> +	int err, prog_fd;
> +
> +	/* This is the C-program:
> +	 * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
> +	 * {
> +	 *     int *qidconf, index = ctx->rx_queue_index;
> +	 *
> +	 *     // A set entry here means that the corresponding queue_id
> +	 *     // has an active AF_XDP socket bound to it.
> +	 *     qidconf = bpf_map_lookup_elem(&qidconf_map, &index);
> +	 *     if (!qidconf)
> +	 *         return XDP_ABORTED;
> +	 *
> +	 *     if (*qidconf)
> +	 *         return bpf_redirect_map(&xsks_map, index, 0);
> +	 *
> +	 *     return XDP_PASS;
> +	 * }
> +	 */
> +	struct bpf_insn prog[] = {
> +		/* r1 = *(u32 *)(r1 + 16) */
> +		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 16),
> +		/* *(u32 *)(r10 - 4) = r1 */
> +		BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4),
> +		BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +		BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
> +		BPF_LD_MAP_FD(BPF_REG_1, xsk->qidconf_map_fd),
> +		BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
> +		BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> +		BPF_MOV32_IMM(BPF_REG_0, 0),
> +		/* if r1 == 0 goto +8 */
> +		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8),
> +		BPF_MOV32_IMM(BPF_REG_0, 2),
> +		/* r1 = *(u32 *)(r1 + 0) */
> +		BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
> +		/* if r1 == 0 goto +5 */
> +		BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5),
> +		/* r2 = *(u32 *)(r10 - 4) */
> +		BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
> +		BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
> +		BPF_MOV32_IMM(BPF_REG_3, 0),
> +		BPF_EMIT_CALL(BPF_FUNC_redirect_map),
> +		/* The jumps are to this instruction */
> +		BPF_EXIT_INSN(),
> +	};
> +	size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
> +
> +	prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog, insns_cnt,
> +				   "LGPL-2.1 or BSD-2-Clause", 0, bpf_log_buf,
> +				   BPF_LOG_BUF_SIZE);
> +	if (prog_fd < 0) {
> +		pr_warning("BPF log buffer:\n%s", bpf_log_buf);
> +		return prog_fd;
> +	}
> +
> +	err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags);
> +	if (err)
> +		return err;

Leaks prog_fd on error.

> +	xsk->prog_fd = prog_fd;
> +	return 0;
> +}
> +
> +static int xsk_create_bpf_maps(struct xsk_socket *xsk)
> +{
> +	int fd;
> +
> +	fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "qidconf_map",
> +				 sizeof(int), sizeof(int), MAX_QUEUES, 0);
> +	if (fd < 0)
> +		return fd;
> +	xsk->qidconf_map_fd = fd;
> +
> +	fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
> +				 sizeof(int), sizeof(int), MAX_QUEUES, 0);
> +	if (fd < 0)
> +		return fd;

Leaks first map fd on error.

> +	xsk->xsks_map_fd = fd;
> +
> +	return 0;
> +}
> +
> +static int xsk_update_bpf_maps(struct xsk_socket *xsk, int qidconf_value,
> +			       int xsks_value)
> +{
> +	bool qidconf_map_updated = false, xsks_map_updated = false;
> +	struct bpf_prog_info prog_info = {};
> +	__u32 prog_len = sizeof(prog_info);
> +	struct bpf_map_info map_info;
> +	__u32 map_len = sizeof(map_info);
> +	__u32 *map_ids;
> +	int reset_value = 0;
> +	__u32 num_maps;
> +	unsigned int i;
> +	int err;
> +
> +	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
> +	if (err)
> +		return err;
> +
> +	num_maps = prog_info.nr_map_ids;
> +
> +	map_ids = malloc(prog_info.nr_map_ids * sizeof(*map_ids));

calloc()?

> +	if (!map_ids)
> +		return -ENOMEM;
> +
> +	memset(&prog_info, 0, prog_len);
> +	prog_info.nr_map_ids = num_maps;
> +	prog_info.map_ids = (__u64)(unsigned long)map_ids;
> +
> +	err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
> +	if (err)
> +		return err;

Leaks map_ids on error.

> +
> +	for (i = 0; i < prog_info.nr_map_ids; i++) {
> +		int fd;
> +
> +		fd = bpf_map_get_fd_by_id(map_ids[i]);
> +		if (fd < 0) {
> +			err = -errno;
> +			goto out;
> +		}
> +
> +		err = bpf_obj_get_info_by_fd(fd, &map_info, &map_len);
> +		if (err)
> +			goto out;

close(fd) on error, also the case for below, please double check everything
so that no fd leaks by accident into the app.

> +
> +		if (!strcmp(map_info.name, "qidconf_map")) {
> +			err = bpf_map_update_elem(fd, &xsk->queue_id,
> +						  &qidconf_value, 0);
> +			if (err)
> +				goto out;
> +			qidconf_map_updated = true;
> +			xsk->qidconf_map_fd = fd;
> +		} else if (!strcmp(map_info.name, "xsks_map")) {
> +			err = bpf_map_update_elem(fd, &xsk->queue_id,
> +						  &xsks_value, 0);
> +			if (err)
> +				goto out;
> +			xsks_map_updated = true;
> +			xsk->xsks_map_fd = fd;
> +		}
> +
> +		if (qidconf_map_updated && xsks_map_updated)
> +			break;
> +	}
> +
> +	if (!(qidconf_map_updated && xsks_map_updated)) {
> +		err = -ENOENT;
> +		goto out;
> +	}
> +
> +	return 0;
> +
> +out:
> +	if (qidconf_map_updated)
> +		(void)bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id,
> +					  &reset_value, 0);
> +	if (xsks_map_updated)
> +		(void)bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id,
> +					  &reset_value, 0);
> +	return err;
> +}
> +
> +static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
> +{
> +	bool prog_loaded = false;
> +	int err;
> +
> +	if (!xsk_xdp_prog_attached(xsk)) {
> +		err = xsk_create_bpf_maps(xsk);
> +		if (err)
> +			goto out_load;
> +
> +		err = xsk_load_xdp_prog(xsk);
> +		if (err)
> +			return err;

Needs to undo created maps on error.

> +		prog_loaded = true;
> +	}
> +
> +	err = xsk_update_bpf_maps(xsk, true, xsk->fd);
> +	if (err)
> +		goto out_load;
> +
> +	return 0;
> +
> +out_load:
> +	if (prog_loaded)
> +		close(xsk->prog_fd);

Ditto

> +	return err;
> +}
> +
> +int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> +		       __u32 queue_id, struct xsk_umem *umem,
> +		       struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
> +		       const struct xsk_socket_config *usr_config)
> +{
> +	struct sockaddr_xdp sxdp = {};
> +	struct xdp_mmap_offsets off;
> +	struct xsk_socket *xsk;
> +	socklen_t optlen;
> +	void *map;

* Re: [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file
  2019-01-31 10:04   ` Daniel Borkmann
@ 2019-01-31 10:42     ` Magnus Karlsson
  0 siblings, 0 replies; 16+ messages in thread
From: Magnus Karlsson @ 2019-01-31 10:42 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Magnus Karlsson, Björn Töpel, ast, Network Development,
	Jakub Kicinski, Björn Töpel, Zhang, Qi Z,
	Jesper Dangaard Brouer

On Thu, Jan 31, 2019 at 11:07 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> > Move the pr_*() functions in libbpf.c to a common header file called
> > libbpf_internal.h. This is so that the later libbpf AF_XDP helper library
> > code in xsk.c can use these printing functions too.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > ---
> >  tools/lib/bpf/libbpf.c          | 30 +-----------------------------
> >  tools/lib/bpf/libbpf_internal.h | 41 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 42 insertions(+), 29 deletions(-)
> >  create mode 100644 tools/lib/bpf/libbpf_internal.h
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 2ccde17..1d7fe26 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -39,6 +39,7 @@
> >  #include <gelf.h>
> >
> >  #include "libbpf.h"
> > +#include "libbpf_internal.h"
> >  #include "bpf.h"
> >  #include "btf.h"
> >  #include "str_error.h"
> > @@ -51,34 +52,6 @@
> >  #define BPF_FS_MAGIC         0xcafe4a11
> >  #endif
> >
> > -#define __printf(a, b)       __attribute__((format(printf, a, b)))
> > -
> > -__printf(1, 2)
> > -static int __base_pr(const char *format, ...)
> > -{
> > -     va_list args;
> > -     int err;
> > -
> > -     va_start(args, format);
> > -     err = vfprintf(stderr, format, args);
> > -     va_end(args);
> > -     return err;
> > -}
> > -
> > -static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> > -static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> > -static __printf(1, 2) libbpf_print_fn_t __pr_debug;
> > -
> > -#define __pr(func, fmt, ...) \
> > -do {                         \
> > -     if ((func))             \
> > -             (func)("libbpf: " fmt, ##__VA_ARGS__); \
> > -} while (0)
> > -
> > -#define pr_warning(fmt, ...) __pr(__pr_warning, fmt, ##__VA_ARGS__)
> > -#define pr_info(fmt, ...)    __pr(__pr_info, fmt, ##__VA_ARGS__)
> > -#define pr_debug(fmt, ...)   __pr(__pr_debug, fmt, ##__VA_ARGS__)
> > -
> >  void libbpf_set_print(libbpf_print_fn_t warn,
> >                     libbpf_print_fn_t info,
> >                     libbpf_print_fn_t debug)
> > @@ -96,7 +69,6 @@ void libbpf_set_print(libbpf_print_fn_t warn,
> >               goto out;               \
> >  } while(0)
> >
> > -
> >  /* Copied from tools/perf/util/util.h */
> >  #ifndef zfree
> >  # define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
> > diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> > new file mode 100644
> > index 0000000..951c078
> > --- /dev/null
> > +++ b/tools/lib/bpf/libbpf_internal.h
>
> Just really minor nit: let's name the header utils.h or such. It would also make
> sense to move the zfree() and zclose() to this and make use of it.

Will fix.

Thanks: Magnus

> > @@ -0,0 +1,41 @@
> > +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> > +
> > +/*
> > + * Common internal libbpf functions and definitions.
> > + *
> > + * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
> > + * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
> > + * Copyright (C) 2015 Huawei Inc.
> > + */
> > +#ifndef __LIBBPF_LIBBPF_INTERNAL_H
> > +#define __LIBBPF_LIBBPF_INTERNAL_H
> > +
> > +#define __printf(a, b)       __attribute__((format(printf, a, b)))
> > +
> > +__printf(1, 2)
> > +static int __base_pr(const char *format, ...)
> > +{
> > +     va_list args;
> > +     int err;
> > +
> > +     va_start(args, format);
> > +     err = vfprintf(stderr, format, args);
> > +     va_end(args);
> > +     return err;
> > +}
> > +
> > +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> > +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> > +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_debug;
> > +
> > +#define __pr(func, fmt, ...) \
> > +do {                         \
> > +     if ((func))             \
> > +             (func)("libbpf: " fmt, ##__VA_ARGS__); \
> > +} while (0)
> > +
> > +#define pr_warning(fmt, ...) __pr(__pr_warning, fmt, ##__VA_ARGS__)
> > +#define pr_info(fmt, ...)    __pr(__pr_info, fmt, ##__VA_ARGS__)
> > +#define pr_debug(fmt, ...)   __pr(__pr_debug, fmt, ##__VA_ARGS__)
> > +
> > +#endif /* __LIBBPF_LIBBPF_INTERNAL_H */
> >
>

* Re: [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets
  2019-01-31 10:05   ` Daniel Borkmann
@ 2019-01-31 10:44     ` Magnus Karlsson
  0 siblings, 0 replies; 16+ messages in thread
From: Magnus Karlsson @ 2019-01-31 10:44 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Magnus Karlsson, Björn Töpel, ast, Network Development,
	Jakub Kicinski, Björn Töpel, Zhang, Qi Z,
	Jesper Dangaard Brouer

On Thu, Jan 31, 2019 at 11:07 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> > This commit adds AF_XDP support to libbpf. The main reason for this is
> > to facilitate writing applications that use AF_XDP by offering
> > higher-level APIs that hide many of the details of the AF_XDP
> > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > offering easy-to-use higher level interfaces of XDP
> > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > applications using it simpler and smaller, and finally also make it
> > possible for applications to benefit from optimizations in the AF_XDP
> > user space access code. Previously, people just copied and pasted the
> > code from the sample application into their application, which is not
> > desirable.
> >
> > The interface is composed of two parts:
> >
> > * Low-level access interface to the four rings and the packet
> > * High-level control plane interface for creating and setting
> >   up umems and af_xdp sockets as well as a simple XDP program.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> [...]
> >  libbpf ABI
> >  ==========
> >
> > diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> > index 266bc95..d294843 100644
> > --- a/tools/lib/bpf/libbpf.map
> > +++ b/tools/lib/bpf/libbpf.map
> > @@ -121,6 +121,18 @@ LIBBPF_0.0.1 {
> >               libbpf_prog_type_by_name;
> >               libbpf_set_print;
> >               libbpf_strerror;
> > +             xsk_ring_cons__peek;
> > +             xsk_ring_cons__release;
> > +             xsk_ring_prod__reserve;
> > +             xsk_ring_prod__submit;
> > +             xsk_umem__get_data;
> > +             xsk_umem__get_data_raw;
> > +             xsk_umem__create;
> > +             xsk_socket__create;
> > +             xsk_umem__delete;
> > +             xsk_socket__delete;
> > +             xsk_umem__fd;
> > +             xsk_socket__fd;
>
> This definitely needs to be placed under LIBBPF_0.0.2 where we have bpf_probe_*
> that were added in this cycle.

OK. Will move it to LIBBPF_0.0.2.

Thanks: Magnus

>
> >       local:
> >               *;
> >  };
> > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c

* Re: [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets
  2019-01-31 10:29   ` Daniel Borkmann
@ 2019-01-31 13:55     ` Magnus Karlsson
  2019-01-31 14:05       ` Maciej Fijalkowski
  0 siblings, 1 reply; 16+ messages in thread
From: Magnus Karlsson @ 2019-01-31 13:55 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Magnus Karlsson, Björn Töpel, ast, Network Development,
	Jakub Kicinski, Björn Töpel, Zhang, Qi Z,
	Jesper Dangaard Brouer

On Thu, Jan 31, 2019 at 11:31 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> > This commit adds AF_XDP support to libbpf. The main reason for this is
> > to facilitate writing applications that use AF_XDP by offering
> > higher-level APIs that hide many of the details of the AF_XDP
> > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > offering easy-to-use higher level interfaces of XDP
> > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > applications using it simpler and smaller, and finally also make it
> > possible for applications to benefit from optimizations in the AF_XDP
> > user space access code. Previously, people just copied and pasted the
> > code from the sample application into their application, which is not
> > desirable.
> >
> > The interface is composed of two parts:
> >
> > * Low-level access interface to the four rings and the packet
> > * High-level control plane interface for creating and setting
> >   up umems and af_xdp sockets as well as a simple XDP program.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> [...]
> > +
> > +static __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
> > +{
> > +     __u32 free_entries = r->cached_cons - r->cached_prod;
> > +
> > +     if (free_entries >= nb)
> > +             return free_entries;
> > +
> > +     /* Refresh the local tail pointer.
> > +      * cached_cons is r->size bigger than the real consumer pointer so
> > +      * that this addition can be avoided in the more frequently
> > +      * executed code that computes free_entries in the beginning of
> > +      * this function. Without this optimization it would have been
> > +      * free_entries = r->cached_cons - r->cached_prod + r->size.
> > +      */
> > +     r->cached_cons = *r->consumer + r->size;
> > +
> > +     return r->cached_cons - r->cached_prod;
> > +}
> > +
> > +static __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
> > +{
> > +     __u32 entries = r->cached_prod - r->cached_cons;
> > +
> > +     if (entries == 0) {
> > +             r->cached_prod = *r->producer;
> > +             entries = r->cached_prod - r->cached_cons;
> > +     }
> > +
> > +     return (entries > nb) ? nb : entries;
> > +}
> > +
> > +size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod, size_t nb, __u32 *idx)
> > +{
> > +     if (unlikely(xsk_prod_nb_free(prod, nb) < nb))
> > +             return 0;
> > +
> > +     *idx = prod->cached_prod;
> > +     prod->cached_prod += nb;
> > +
> > +     return nb;
> > +}
> > +
> > +void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
> > +{
> > +     /* Make sure everything has been written to the ring before signalling
> > +      * this to the kernel.
> > +      */
> > +     smp_wmb();
> > +
> > +     *prod->producer += nb;
> > +}
> > +
> > +size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons, size_t nb, __u32 *idx)
> > +{
> > +     size_t entries = xsk_cons_nb_avail(cons, nb);
> > +
> > +     if (likely(entries > 0)) {
> > +             /* Make sure we do not speculatively read the data before
> > +              * we have received the packet buffers from the ring.
> > +              */
> > +             smp_rmb();
> > +
> > +             *idx = cons->cached_cons;
> > +             cons->cached_cons += entries;
> > +     }
> > +
> > +     return entries;
> > +}
> > +
> > +void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb)
> > +{
> > +     *cons->consumer += nb;
> > +}
> > +
> > +void *xsk_umem__get_data_raw(void *umem_area, __u64 addr)
> > +{
> > +     return &((char *)umem_area)[addr];
> > +}
> > +
> > +void *xsk_umem__get_data(struct xsk_umem *umem, __u64 addr)
> > +{
> > +     return &((char *)(umem->umem_area))[addr];
> > +}
>
> Shouldn't some of the above helpers for critical path be exposed as static
> inline functions instead?

Could be. I will run some experiments to see how much performance
would improve. I kept them non-static so that they could be changed
if someone thought of a faster way of doing these operations. But
that might be unnecessary, since we can change parts of the
implementation even if they are static. I will measure and come back to you.

> [...]
> > +static int xsk_parse_nl(void *cookie, void *msg, struct nlattr **tb)
> > +{
> > +     struct nlattr *tb_parsed[IFLA_XDP_MAX + 1];
> > +     struct xsk_nl_info *nl_info = cookie;
> > +     unsigned char mode;
> > +     int err;
> > +
> > +     (void)msg;
>
> Unused?

Yes.

> > +     nl_info->xdp_prog_attached = false;
> > +     if (!tb[IFLA_XDP])
> > +             return 0;
> > +
> > +     err = libbpf_nla_parse_nested(tb_parsed, IFLA_XDP_MAX, tb[IFLA_XDP],
> > +                                   NULL);
> > +     if (err)
> > +             return err;
> > +
> > +     if (!tb_parsed[IFLA_XDP_ATTACHED] || !tb_parsed[IFLA_XDP_FD])
> > +             return 0;
> > +
> > +     mode = libbpf_nla_getattr_u8(tb_parsed[IFLA_XDP_ATTACHED]);
> > +     if (mode == XDP_ATTACHED_NONE)
> > +             return 0;
>
> Probably good to memset and/or init the passed struct xsk_nl_info (e.g.
> fd to -1 etc) such that if some user did not we won't end up with garbage
> values on return 0.

Will do.

> > +     nl_info->xdp_prog_attached = true;
> > +     nl_info->fd = libbpf_nla_getattr_u32(tb_parsed[IFLA_XDP_FD]);
> > +     return 0;
> > +}
> > +
> > +static bool xsk_xdp_prog_attached(struct xsk_socket *xsk)
> > +{
> > +     struct xsk_nl_info nl_info;
> > +     unsigned int nl_pid;
> > +     char err_buf[256];
> > +     int sock, err;
> > +
> > +     sock = libbpf_netlink_open(&nl_pid);
> > +     if (sock < 0)
> > +             return false;
> > +
> > +     nl_info.xdp_prog_attached = false;
> > +     nl_info.fd = 0;
> > +
> > +     err = libbpf_nl_get_link(sock, nl_pid, xsk_parse_nl, &nl_info);
> > +     if (err) {
> > +             libbpf_strerror(err, err_buf, sizeof(err_buf));
> > +             pr_warning("Error:\n%s\n", err_buf);
> > +             return false;
> > +     }
> > +
> > +     xsk->prog_fd = nl_info.fd;
> > +     return nl_info.xdp_prog_attached;
>
> Don't we leak sock here and in error above?

Will fix the fd leaks that you have pointed out here and below, and go
through the code to see if there are any more. Do I ever learn
;-).

Thanks for your review. Appreciated.

/Magnus

> > +}
> > +
> > +static int xsk_load_xdp_prog(struct xsk_socket *xsk)
> > +{
> > +     char bpf_log_buf[BPF_LOG_BUF_SIZE];
> > +     int err, prog_fd;
> > +
> > +     /* This is the C-program:
> > +      * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
> > +      * {
> > +      *     int *qidconf, index = ctx->rx_queue_index;
> > +      *
> > +      *     // A set entry here means that the corresponding queue_id
> > +      *     // has an active AF_XDP socket bound to it.
> > +      *     qidconf = bpf_map_lookup_elem(&qidconf_map, &index);
> > +      *     if (!qidconf)
> > +      *         return XDP_ABORTED;
> > +      *
> > +      *     if (*qidconf)
> > +      *         return bpf_redirect_map(&xsks_map, index, 0);
> > +      *
> > +      *     return XDP_PASS;
> > +      * }
> > +      */
> > +     struct bpf_insn prog[] = {
> > +             /* r1 = *(u32 *)(r1 + 16) */
> > +             BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 16),
> > +             /* *(u32 *)(r10 - 4) = r1 */
> > +             BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4),
> > +             BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > +             BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
> > +             BPF_LD_MAP_FD(BPF_REG_1, xsk->qidconf_map_fd),
> > +             BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
> > +             BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> > +             BPF_MOV32_IMM(BPF_REG_0, 0),
> > +             /* if r1 == 0 goto +8 */
> > +             BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8),
> > +             BPF_MOV32_IMM(BPF_REG_0, 2),
> > +             /* r1 = *(u32 *)(r1 + 0) */
> > +             BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
> > +             /* if r1 == 0 goto +5 */
> > +             BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5),
> > +             /* r2 = *(u32 *)(r10 - 4) */
> > +             BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
> > +             BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
> > +             BPF_MOV32_IMM(BPF_REG_3, 0),
> > +             BPF_EMIT_CALL(BPF_FUNC_redirect_map),
> > +             /* The jumps are to this instruction */
> > +             BPF_EXIT_INSN(),
> > +     };
> > +     size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
> > +
> > +     prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog, insns_cnt,
> > +                                "LGPL-2.1 or BSD-2-Clause", 0, bpf_log_buf,
> > +                                BPF_LOG_BUF_SIZE);
> > +     if (prog_fd < 0) {
> > +             pr_warning("BPF log buffer:\n%s", bpf_log_buf);
> > +             return prog_fd;
> > +     }
> > +
> > +     err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags);
> > +     if (err)
> > +             return err;
>
> Leaks prog_fd on error.
>
> > +     xsk->prog_fd = prog_fd;
> > +     return 0;
> > +}
> > +
> > +static int xsk_create_bpf_maps(struct xsk_socket *xsk)
> > +{
> > +     int fd;
> > +
> > +     fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "qidconf_map",
> > +                              sizeof(int), sizeof(int), MAX_QUEUES, 0);
> > +     if (fd < 0)
> > +             return fd;
> > +     xsk->qidconf_map_fd = fd;
> > +
> > +     fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
> > +                              sizeof(int), sizeof(int), MAX_QUEUES, 0);
> > +     if (fd < 0)
> > +             return fd;
>
> Leaks first map fd on error.
>
> > +     xsk->xsks_map_fd = fd;
> > +
> > +     return 0;
> > +}
> > +
> > +static int xsk_update_bpf_maps(struct xsk_socket *xsk, int qidconf_value,
> > +                            int xsks_value)
> > +{
> > +     bool qidconf_map_updated = false, xsks_map_updated = false;
> > +     struct bpf_prog_info prog_info = {};
> > +     __u32 prog_len = sizeof(prog_info);
> > +     struct bpf_map_info map_info;
> > +     __u32 map_len = sizeof(map_info);
> > +     __u32 *map_ids;
> > +     int reset_value = 0;
> > +     __u32 num_maps;
> > +     unsigned int i;
> > +     int err;
> > +
> > +     err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
> > +     if (err)
> > +             return err;
> > +
> > +     num_maps = prog_info.nr_map_ids;
> > +
> > +     map_ids = malloc(prog_info.nr_map_ids * sizeof(*map_ids));
>
> calloc()?
>
> > +     if (!map_ids)
> > +             return -ENOMEM;
> > +
> > +     memset(&prog_info, 0, prog_len);
> > +     prog_info.nr_map_ids = num_maps;
> > +     prog_info.map_ids = (__u64)(unsigned long)map_ids;
> > +
> > +     err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
> > +     if (err)
> > +             return err;
>
> Leaks map_ids on error.
>
> > +
> > +     for (i = 0; i < prog_info.nr_map_ids; i++) {
> > +             int fd;
> > +
> > +             fd = bpf_map_get_fd_by_id(map_ids[i]);
> > +             if (fd < 0) {
> > +                     err = -errno;
> > +                     goto out;
> > +             }
> > +
> > +             err = bpf_obj_get_info_by_fd(fd, &map_info, &map_len);
> > +             if (err)
> > +                     goto out;
>
> close(fd) on error, also the case for below, please double check everything
> so that no fd leaks by accident into the app.
>
> > +
> > +             if (!strcmp(map_info.name, "qidconf_map")) {
> > +                     err = bpf_map_update_elem(fd, &xsk->queue_id,
> > +                                               &qidconf_value, 0);
> > +                     if (err)
> > +                             goto out;
> > +                     qidconf_map_updated = true;
> > +                     xsk->qidconf_map_fd = fd;
> > +             } else if (!strcmp(map_info.name, "xsks_map")) {
> > +                     err = bpf_map_update_elem(fd, &xsk->queue_id,
> > +                                               &xsks_value, 0);
> > +                     if (err)
> > +                             goto out;
> > +                     xsks_map_updated = true;
> > +                     xsk->xsks_map_fd = fd;
> > +             }
> > +
> > +             if (qidconf_map_updated && xsks_map_updated)
> > +                     break;
> > +     }
> > +
> > +     if (!(qidconf_map_updated && xsks_map_updated)) {
> > +             err = -ENOENT;
> > +             goto out;
> > +     }
> > +
> > +     return 0;
> > +
> > +out:
> > +     if (qidconf_map_updated)
> > +             (void)bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id,
> > +                                       &reset_value, 0);
> > +     if (xsks_map_updated)
> > +             (void)bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id,
> > +                                       &reset_value, 0);
> > +     return err;
> > +}
> > +
> > +static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
> > +{
> > +     bool prog_loaded = false;
> > +     int err;
> > +
> > +     if (!xsk_xdp_prog_attached(xsk)) {
> > +             err = xsk_create_bpf_maps(xsk);
> > +             if (err)
> > +                     goto out_load;
> > +
> > +             err = xsk_load_xdp_prog(xsk);
> > +             if (err)
> > +                     return err;
>
> Needs to undo created maps on error.
>
> > +             prog_loaded = true;
> > +     }
> > +
> > +     err = xsk_update_bpf_maps(xsk, true, xsk->fd);
> > +     if (err)
> > +             goto out_load;
> > +
> > +     return 0;
> > +
> > +out_load:
> > +     if (prog_loaded)
> > +             close(xsk->prog_fd);
>
> Ditto
>
> > +     return err;
> > +}
> > +
> > +int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> > +                    __u32 queue_id, struct xsk_umem *umem,
> > +                    struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
> > +                    const struct xsk_socket_config *usr_config)
> > +{
> > +     struct sockaddr_xdp sxdp = {};
> > +     struct xdp_mmap_offsets off;
> > +     struct xsk_socket *xsk;
> > +     socklen_t optlen;
> > +     void *map;


* Re: [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets
  2019-01-31 13:55     ` Magnus Karlsson
@ 2019-01-31 14:05       ` Maciej Fijalkowski
  2019-01-31 14:23         ` Magnus Karlsson
  0 siblings, 1 reply; 16+ messages in thread
From: Maciej Fijalkowski @ 2019-01-31 14:05 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Daniel Borkmann, Magnus Karlsson, Björn Töpel, ast,
	Network Development, Jakub Kicinski, Björn Töpel,
	Zhang, Qi Z, Jesper Dangaard Brouer

On Thu, 31 Jan 2019 14:55:06 +0100
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Thu, Jan 31, 2019 at 11:31 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 01/29/2019 04:12 PM, Magnus Karlsson wrote:  
> > > This commit adds AF_XDP support to libbpf. The main reason for this is
> > > to facilitate writing applications that use AF_XDP by offering
> > > higher-level APIs that hide many of the details of the AF_XDP
> > > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > > offering easy-to-use higher level interfaces of XDP
> > > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > > applications using it simpler and smaller, and finally also make it
> > > possible for applications to benefit from optimizations in the AF_XDP
> > > user space access code. Previously, people just copied and pasted the
> > > code from the sample application into their application, which is not
> > > desirable.
> > >
> > > The interface is composed of two parts:
> > >
> > > * Low-level access interface to the four rings and the packet
> > > * High-level control plane interface for creating and setting
> > >   up umems and af_xdp sockets as well as a simple XDP program.
> > >
> > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>  
> > [...]  
> > > +
> > > +static __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
> > > +{
> > > +     __u32 free_entries = r->cached_cons - r->cached_prod;
> > > +
> > > +     if (free_entries >= nb)
> > > +             return free_entries;
> > > +
> > > +     /* Refresh the local tail pointer.
> > > +      * cached_cons is r->size bigger than the real consumer pointer so
> > > +      * that this addition can be avoided in the more frequently
> > > +      * executed code that computes free_entries in the beginning of
> > > +      * this function. Without this optimization it would have been
> > > +      * free_entries = r->cached_cons - r->cached_prod + r->size.
> > > +      */
> > > +     r->cached_cons = *r->consumer + r->size;
> > > +
> > > +     return r->cached_cons - r->cached_prod;
> > > +}
> > > +
> > > +static __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
> > > +{
> > > +     __u32 entries = r->cached_prod - r->cached_cons;
> > > +
> > > +     if (entries == 0) {
> > > +             r->cached_prod = *r->producer;
> > > +             entries = r->cached_prod - r->cached_cons;
> > > +     }
> > > +
> > > +     return (entries > nb) ? nb : entries;
> > > +}
> > > +
> > > +size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod, size_t nb, __u32 *idx)
> > > +{
> > > +     if (unlikely(xsk_prod_nb_free(prod, nb) < nb))
> > > +             return 0;
> > > +
> > > +     *idx = prod->cached_prod;
> > > +     prod->cached_prod += nb;
> > > +
> > > +     return nb;
> > > +}
> > > +
> > > +void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
> > > +{
> > > +     /* Make sure everything has been written to the ring before signalling
> > > +      * this to the kernel.
> > > +      */
> > > +     smp_wmb();
> > > +
> > > +     *prod->producer += nb;
> > > +}
> > > +
> > > +size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons, size_t nb, __u32 *idx)
> > > +{
> > > +     size_t entries = xsk_cons_nb_avail(cons, nb);
> > > +
> > > +     if (likely(entries > 0)) {
> > > +             /* Make sure we do not speculatively read the data before
> > > +              * we have received the packet buffers from the ring.
> > > +              */
> > > +             smp_rmb();
> > > +
> > > +             *idx = cons->cached_cons;
> > > +             cons->cached_cons += entries;
> > > +     }
> > > +
> > > +     return entries;
> > > +}
> > > +
> > > +void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb)
> > > +{
> > > +     *cons->consumer += nb;
> > > +}
> > > +
> > > +void *xsk_umem__get_data_raw(void *umem_area, __u64 addr)
> > > +{
> > > +     return &((char *)umem_area)[addr];
> > > +}
> > > +
> > > +void *xsk_umem__get_data(struct xsk_umem *umem, __u64 addr)
> > > +{
> > > +     return &((char *)(umem->umem_area))[addr];
> > > +}  
> >
> > Shouldn't some of the above helpers for critical path be exposed as static
> > inline functions instead?  
> 
> Could be. I will run some experiments to see how much performance
> would improve. I kept them non-static so that they could be changed
> if someone thought of a faster way of doing these operations. But
> that might be unnecessary, since we can change parts of the
> implementation even if they are static. I will measure and come back to you.
> 
> > [...]  
> > > +static int xsk_parse_nl(void *cookie, void *msg, struct nlattr **tb)
> > > +{
> > > +     struct nlattr *tb_parsed[IFLA_XDP_MAX + 1];
> > > +     struct xsk_nl_info *nl_info = cookie;
> > > +     unsigned char mode;
> > > +     int err;
> > > +
> > > +     (void)msg;  
> >
> > Unused?  
> 
> Yes.
>
IMO we should make sure here that we only do the parsing for xsk->ifindex.
So nl_info should have an ifindex field, set to xsk->ifindex, and here we
would do:

struct ifinfomsg *ifinfo = msg;
if (nl_info->ifindex && nl_info->ifindex != ifinfo->ifi_index)
	return 0;

RTM_GETLINK with the NLM_F_DUMP flag will give you information from all
interfaces, so suppose you are running xdpsock on eth0 but an XDP prog is
already running on eth1. IIUC, you might then hit a situation where
xsk_xdp_prog_attached returns true because the parsing was done for every
interface: eth1 has an XDP prog attached, so nl_info gets filled in. Then,
in xsk_setup_xdp_prog, you won't create the maps.

A check against xsk->ifindex at the beginning of xsk_parse_nl will prevent
this kind of behavior.

> > > +     nl_info->xdp_prog_attached = false;
> > > +     if (!tb[IFLA_XDP])
> > > +             return 0;
> > > +
> > > +     err = libbpf_nla_parse_nested(tb_parsed, IFLA_XDP_MAX, tb[IFLA_XDP],
> > > +                                   NULL);
> > > +     if (err)
> > > +             return err;
> > > +
> > > +     if (!tb_parsed[IFLA_XDP_ATTACHED] || !tb_parsed[IFLA_XDP_FD])
> > > +             return 0;
> > > +
> > > +     mode = libbpf_nla_getattr_u8(tb_parsed[IFLA_XDP_ATTACHED]);
> > > +     if (mode == XDP_ATTACHED_NONE)
> > > +             return 0;  
> >
> > Probably good to memset and/or init the passed struct xsk_nl_info (e.g.
> > fd to -1 etc) such that if some user did not we won't end up with garbage
> > values on return 0.  
> 
> Will do.
> 
> > > +     nl_info->xdp_prog_attached = true;
> > > +     nl_info->fd = libbpf_nla_getattr_u32(tb_parsed[IFLA_XDP_FD]);
> > > +     return 0;
> > > +}
> > > +
> > > +static bool xsk_xdp_prog_attached(struct xsk_socket *xsk)
> > > +{
> > > +     struct xsk_nl_info nl_info;
> > > +     unsigned int nl_pid;
> > > +     char err_buf[256];
> > > +     int sock, err;
> > > +
> > > +     sock = libbpf_netlink_open(&nl_pid);
> > > +     if (sock < 0)
> > > +             return false;
> > > +
> > > +     nl_info.xdp_prog_attached = false;
> > > +     nl_info.fd = 0;
> > > +
> > > +     err = libbpf_nl_get_link(sock, nl_pid, xsk_parse_nl, &nl_info);
> > > +     if (err) {
> > > +             libbpf_strerror(err, err_buf, sizeof(err_buf));
> > > +             pr_warning("Error:\n%s\n", err_buf);
> > > +             return false;
> > > +     }
> > > +
> > > +     xsk->prog_fd = nl_info.fd;
> > > +     return nl_info.xdp_prog_attached;  
> >
> > Don't we leak sock here and in error above?  
> 
> Will fix the fd leaks that you have pointed out here and below, and go
> through the code to see if there are any more. Do I ever learn
> ;-).
> 
> Thanks for your review. Appreciated.
> 
> /Magnus
> 
> > > +}
> > > +
> > > +static int xsk_load_xdp_prog(struct xsk_socket *xsk)
> > > +{
> > > +     char bpf_log_buf[BPF_LOG_BUF_SIZE];
> > > +     int err, prog_fd;
> > > +
> > > +     /* This is the C-program:
> > > +      * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
> > > +      * {
> > > +      *     int *qidconf, index = ctx->rx_queue_index;
> > > +      *
> > > +      *     // A set entry here means that the corresponding queue_id
> > > +      *     // has an active AF_XDP socket bound to it.
> > > +      *     qidconf = bpf_map_lookup_elem(&qidconf_map, &index);
> > > +      *     if (!qidconf)
> > > +      *         return XDP_ABORTED;
> > > +      *
> > > +      *     if (*qidconf)
> > > +      *         return bpf_redirect_map(&xsks_map, index, 0);
> > > +      *
> > > +      *     return XDP_PASS;
> > > +      * }
> > > +      */
> > > +     struct bpf_insn prog[] = {
> > > +             /* r1 = *(u32 *)(r1 + 16) */
> > > +             BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 16),
> > > +             /* *(u32 *)(r10 - 4) = r1 */
> > > +             BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4),
> > > +             BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > > +             BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
> > > +             BPF_LD_MAP_FD(BPF_REG_1, xsk->qidconf_map_fd),
> > > +             BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
> > > +             BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> > > +             BPF_MOV32_IMM(BPF_REG_0, 0),
> > > +             /* if r1 == 0 goto +8 */
> > > +             BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8),
> > > +             BPF_MOV32_IMM(BPF_REG_0, 2),
> > > +             /* r1 = *(u32 *)(r1 + 0) */
> > > +             BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
> > > +             /* if r1 == 0 goto +5 */
> > > +             BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5),
> > > +             /* r2 = *(u32 *)(r10 - 4) */
> > > +             BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
> > > +             BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
> > > +             BPF_MOV32_IMM(BPF_REG_3, 0),
> > > +             BPF_EMIT_CALL(BPF_FUNC_redirect_map),
> > > +             /* The jumps are to this instruction */
> > > +             BPF_EXIT_INSN(),
> > > +     };
> > > +     size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
> > > +
> > > +     prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog, insns_cnt,
> > > +                                "LGPL-2.1 or BSD-2-Clause", 0, bpf_log_buf,
> > > +                                BPF_LOG_BUF_SIZE);
> > > +     if (prog_fd < 0) {
> > > +             pr_warning("BPF log buffer:\n%s", bpf_log_buf);
> > > +             return prog_fd;
> > > +     }
> > > +
> > > +     err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags);
> > > +     if (err)
> > > +             return err;  
> >
> > Leaks prog_fd on error.
> >  
> > > +     xsk->prog_fd = prog_fd;
> > > +     return 0;
> > > +}
> > > +
> > > +static int xsk_create_bpf_maps(struct xsk_socket *xsk)
> > > +{
> > > +     int fd;
> > > +
> > > +     fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "qidconf_map",
> > > +                              sizeof(int), sizeof(int), MAX_QUEUES, 0);
> > > +     if (fd < 0)
> > > +             return fd;
> > > +     xsk->qidconf_map_fd = fd;
> > > +
> > > +     fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
> > > +                              sizeof(int), sizeof(int), MAX_QUEUES, 0);
> > > +     if (fd < 0)
> > > +             return fd;  
> >
> > Leaks first map fd on error.
> >  
> > > +     xsk->xsks_map_fd = fd;
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +static int xsk_update_bpf_maps(struct xsk_socket *xsk, int qidconf_value,
> > > +                            int xsks_value)
> > > +{
> > > +     bool qidconf_map_updated = false, xsks_map_updated = false;
> > > +     struct bpf_prog_info prog_info = {};
> > > +     __u32 prog_len = sizeof(prog_info);
> > > +     struct bpf_map_info map_info;
> > > +     __u32 map_len = sizeof(map_info);
> > > +     __u32 *map_ids;
> > > +     int reset_value = 0;
> > > +     __u32 num_maps;
> > > +     unsigned int i;
> > > +     int err;
> > > +
> > > +     err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
> > > +     if (err)
> > > +             return err;
> > > +
> > > +     num_maps = prog_info.nr_map_ids;
> > > +
> > > +     map_ids = malloc(prog_info.nr_map_ids * sizeof(*map_ids));  
> >
> > calloc()?
> >  
> > > +     if (!map_ids)
> > > +             return -ENOMEM;
> > > +
> > > +     memset(&prog_info, 0, prog_len);
> > > +     prog_info.nr_map_ids = num_maps;
> > > +     prog_info.map_ids = (__u64)(unsigned long)map_ids;
> > > +
> > > +     err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
> > > +     if (err)
> > > +             return err;  
> >
> > Leaks map_ids on error.
> >  
> > > +
> > > +     for (i = 0; i < prog_info.nr_map_ids; i++) {
> > > +             int fd;
> > > +
> > > +             fd = bpf_map_get_fd_by_id(map_ids[i]);
> > > +             if (fd < 0) {
> > > +                     err = -errno;
> > > +                     goto out;
> > > +             }
> > > +
> > > +             err = bpf_obj_get_info_by_fd(fd, &map_info, &map_len);
> > > +             if (err)
> > > +                     goto out;  
> >
> > close(fd) on error, also the case for below, please double check everything
> > so that no fd leaks by accident into the app.
> >  
> > > +
> > > +             if (!strcmp(map_info.name, "qidconf_map")) {
> > > +                     err = bpf_map_update_elem(fd, &xsk->queue_id,
> > > +                                               &qidconf_value, 0);
> > > +                     if (err)
> > > +                             goto out;
> > > +                     qidconf_map_updated = true;
> > > +                     xsk->qidconf_map_fd = fd;
> > > +             } else if (!strcmp(map_info.name, "xsks_map")) {
> > > +                     err = bpf_map_update_elem(fd, &xsk->queue_id,
> > > +                                               &xsks_value, 0);
> > > +                     if (err)
> > > +                             goto out;
> > > +                     xsks_map_updated = true;
> > > +                     xsk->xsks_map_fd = fd;
> > > +             }
> > > +
> > > +             if (qidconf_map_updated && xsks_map_updated)
> > > +                     break;
> > > +     }
> > > +
> > > +     if (!(qidconf_map_updated && xsks_map_updated)) {
> > > +             err = -ENOENT;
> > > +             goto out;
> > > +     }
> > > +
> > > +     return 0;
> > > +
> > > +out:
> > > +     if (qidconf_map_updated)
> > > +             (void)bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id,
> > > +                                       &reset_value, 0);
> > > +     if (xsks_map_updated)
> > > +             (void)bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id,
> > > +                                       &reset_value, 0);
> > > +     return err;
> > > +}
> > > +
> > > +static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
> > > +{
> > > +     bool prog_loaded = false;
> > > +     int err;
> > > +
> > > +     if (!xsk_xdp_prog_attached(xsk)) {
> > > +             err = xsk_create_bpf_maps(xsk);
> > > +             if (err)
> > > +                     goto out_load;
> > > +
> > > +             err = xsk_load_xdp_prog(xsk);
> > > +             if (err)
> > > +                     return err;  
> >
> > Needs to undo created maps on error.
> >  
> > > +             prog_loaded = true;
> > > +     }
> > > +
> > > +     err = xsk_update_bpf_maps(xsk, true, xsk->fd);
> > > +     if (err)
> > > +             goto out_load;
> > > +
> > > +     return 0;
> > > +
> > > +out_load:
> > > +     if (prog_loaded)
> > > +             close(xsk->prog_fd);  
> >
> > Ditto
> >  
> > > +     return err;
> > > +}
> > > +
> > > +int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> > > +                    __u32 queue_id, struct xsk_umem *umem,
> > > +                    struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
> > > +                    const struct xsk_socket_config *usr_config)
> > > +{
> > > +     struct sockaddr_xdp sxdp = {};
> > > +     struct xdp_mmap_offsets off;
> > > +     struct xsk_socket *xsk;
> > > +     socklen_t optlen;
> > > +     void *map;  



* Re: [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets
  2019-01-31 14:05       ` Maciej Fijalkowski
@ 2019-01-31 14:23         ` Magnus Karlsson
  0 siblings, 0 replies; 16+ messages in thread
From: Magnus Karlsson @ 2019-01-31 14:23 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Daniel Borkmann, Magnus Karlsson, Björn Töpel, ast,
	Network Development, Jakub Kicinski, Björn Töpel,
	Zhang, Qi Z, Jesper Dangaard Brouer

On Thu, Jan 31, 2019 at 3:06 PM Maciej Fijalkowski
<maciejromanfijalkowski@gmail.com> wrote:
>
> On Thu, 31 Jan 2019 14:55:06 +0100
> Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
>
> > On Thu, Jan 31, 2019 at 11:31 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >
> > > On 01/29/2019 04:12 PM, Magnus Karlsson wrote:
> > > > This commit adds AF_XDP support to libbpf. The main reason for this is
> > > > to facilitate writing applications that use AF_XDP by offering
> > > > higher-level APIs that hide many of the details of the AF_XDP
> > > > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > > > offering easy-to-use higher level interfaces of XDP
> > > > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > > > applications using it simpler and smaller, and finally also make it
> > > > possible for applications to benefit from optimizations in the AF_XDP
> > > > user space access code. Previously, people just copied and pasted the
> > > > code from the sample application into their application, which is not
> > > > desirable.
> > > >
> > > > The interface is composed of two parts:
> > > >
> > > > * Low-level access interface to the four rings and the packet
> > > > * High-level control plane interface for creating and setting
> > > >   up umems and af_xdp sockets as well as a simple XDP program.
> > > >
> > > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > > [...]
> > > > +
> > > > +static __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
> > > > +{
> > > > +     __u32 free_entries = r->cached_cons - r->cached_prod;
> > > > +
> > > > +     if (free_entries >= nb)
> > > > +             return free_entries;
> > > > +
> > > > +     /* Refresh the local tail pointer.
> > > > +      * cached_cons is r->size bigger than the real consumer pointer so
> > > > +      * that this addition can be avoided in the more frequently
> > > > +      * executed code that computes free_entries at the beginning of
> > > > +      * this function. Without this optimization it would have been
> > > > +      * free_entries = r->cached_cons - r->cached_prod + r->size.
> > > > +      */
> > > > +     r->cached_cons = *r->consumer + r->size;
> > > > +
> > > > +     return r->cached_cons - r->cached_prod;
> > > > +}
> > > > +
> > > > +static __u32 xsk_cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
> > > > +{
> > > > +     __u32 entries = r->cached_prod - r->cached_cons;
> > > > +
> > > > +     if (entries == 0) {
> > > > +             r->cached_prod = *r->producer;
> > > > +             entries = r->cached_prod - r->cached_cons;
> > > > +     }
> > > > +
> > > > +     return (entries > nb) ? nb : entries;
> > > > +}
> > > > +
> > > > +size_t xsk_ring_prod__reserve(struct xsk_ring_prod *prod, size_t nb, __u32 *idx)
> > > > +{
> > > > +     if (unlikely(xsk_prod_nb_free(prod, nb) < nb))
> > > > +             return 0;
> > > > +
> > > > +     *idx = prod->cached_prod;
> > > > +     prod->cached_prod += nb;
> > > > +
> > > > +     return nb;
> > > > +}
> > > > +
> > > > +void xsk_ring_prod__submit(struct xsk_ring_prod *prod, size_t nb)
> > > > +{
> > > > +     /* Make sure everything has been written to the ring before signalling
> > > > +      * this to the kernel.
> > > > +      */
> > > > +     smp_wmb();
> > > > +
> > > > +     *prod->producer += nb;
> > > > +}
> > > > +
> > > > +size_t xsk_ring_cons__peek(struct xsk_ring_cons *cons, size_t nb, __u32 *idx)
> > > > +{
> > > > +     size_t entries = xsk_cons_nb_avail(cons, nb);
> > > > +
> > > > +     if (likely(entries > 0)) {
> > > > +             /* Make sure we do not speculatively read the data before
> > > > +              * we have received the packet buffers from the ring.
> > > > +              */
> > > > +             smp_rmb();
> > > > +
> > > > +             *idx = cons->cached_cons;
> > > > +             cons->cached_cons += entries;
> > > > +     }
> > > > +
> > > > +     return entries;
> > > > +}
> > > > +
> > > > +void xsk_ring_cons__release(struct xsk_ring_cons *cons, size_t nb)
> > > > +{
> > > > +     *cons->consumer += nb;
> > > > +}
> > > > +
> > > > +void *xsk_umem__get_data_raw(void *umem_area, __u64 addr)
> > > > +{
> > > > +     return &((char *)umem_area)[addr];
> > > > +}
> > > > +
> > > > +void *xsk_umem__get_data(struct xsk_umem *umem, __u64 addr)
> > > > +{
> > > > +     return &((char *)(umem->umem_area))[addr];
> > > > +}
> > >
> > > Shouldn't some of the above helpers for critical path be exposed as static
> > > inline functions instead?
> >
> > Could be. I will make some experiments to see how much performance
> > would improve. Kept them as non-static so that they could be changed
> > if someone thought of a faster way of doing these operations. But
> > might be unnecessary, since we can change parts of it even if they are
> > static. I will measure and come back to you.
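Daniel's static-inline suggestion might look like the sketch below: the hot-path helper lives in the header as a static inline so it can be folded into the caller's loop. The struct is trimmed to the fields the helper touches, and uint32_t stands in for __u32 — this is an illustration, not the final libbpf layout.

```c
#include <stdint.h>

/* Trimmed-down ring state; the real struct carries pointers into the
 * mmap'ed kernel rings. uint32_t stands in for __u32 here. */
struct xsk_ring_prod {
	uint32_t cached_prod;
	uint32_t cached_cons;
	uint32_t size;
	uint32_t *producer;
	uint32_t *consumer;
};

/* As a static inline in the header, the free-entry computation can be
 * inlined straight into the caller's transmit loop. */
static inline uint32_t xsk_prod_nb_free(struct xsk_ring_prod *r, uint32_t nb)
{
	uint32_t free_entries = r->cached_cons - r->cached_prod;

	if (free_entries >= nb)
		return free_entries;

	/* cached_cons is kept r->size ahead of the real consumer pointer
	 * so the common path above avoids adding r->size. */
	r->cached_cons = *r->consumer + r->size;

	return r->cached_cons - r->cached_prod;
}
```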
> >
> > > [...]
> > > > +static int xsk_parse_nl(void *cookie, void *msg, struct nlattr **tb)
> > > > +{
> > > > +     struct nlattr *tb_parsed[IFLA_XDP_MAX + 1];
> > > > +     struct xsk_nl_info *nl_info = cookie;
> > > > +     unsigned char mode;
> > > > +     int err;
> > > > +
> > > > +     (void)msg;
> > >
> > > Unused?
> >
> > Yes.
> >
> IMO we should make sure here that we actually do the parsing for xsk->ifindex
> only. So nl_info should have an ifindex field which will be set to xsk->ifindex
> and here we will do:
>
> struct ifinfomsg *ifinfo = msg;
> if (nl_info->ifindex && nl_info->ifindex != ifinfo->ifi_index)
>         return 0;
>
> RTM_GETLINK with NLM_F_DUMP flag will give you information from all interfaces
> so let's suppose that you are running xdpsock on eth0 but on eth1 there's an
> XDP prog already running. IIUC here you might have a situation where
> xsk_xdp_prog_attached will return true because you did the parsing for every
> interface, eth1 has xdp prog attached so the nl_info will be filled. Then, in
> xsk_setup_xdp_prog you won't create maps.
>
> A check against xsk->ifindex at the beginning of xsk_parse_nl will prevent
> this kind of behavior.

Thanks for catching this Maciej. Will fix.

/Magnus
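Maciej's suggested guard can be sketched as a small predicate on the parse cookie. The ifindex field and the function name here are illustrative additions, not the posted code.

```c
#include <stdbool.h>

/* Hypothetical extension of the xsk_nl_info cookie with the ifindex
 * field Maciej suggests; names are illustrative. */
struct xsk_nl_info {
	int ifindex;		/* set to xsk->ifindex before the dump */
	bool xdp_prog_attached;
	int fd;
};

/* RTM_GETLINK with NLM_F_DUMP reports every interface, so the parse
 * callback must skip messages for interfaces other than the one the
 * socket is bound to -- otherwise an XDP program on eth1 makes
 * xsk_xdp_prog_attached() return true for a socket on eth0. */
static bool xsk_nl_msg_matches(const struct xsk_nl_info *nl_info,
			       int ifi_index)
{
	/* ifindex == 0 means "no filter", mirroring the suggested
	 * `if (nl_info->ifindex && nl_info->ifindex != ifi_index)` guard. */
	return !nl_info->ifindex || nl_info->ifindex == ifi_index;
}
```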

> > > > +     nl_info->xdp_prog_attached = false;
> > > > +     if (!tb[IFLA_XDP])
> > > > +             return 0;
> > > > +
> > > > +     err = libbpf_nla_parse_nested(tb_parsed, IFLA_XDP_MAX, tb[IFLA_XDP],
> > > > +                                   NULL);
> > > > +     if (err)
> > > > +             return err;
> > > > +
> > > > +     if (!tb_parsed[IFLA_XDP_ATTACHED] || !tb_parsed[IFLA_XDP_FD])
> > > > +             return 0;
> > > > +
> > > > +     mode = libbpf_nla_getattr_u8(tb_parsed[IFLA_XDP_ATTACHED]);
> > > > +     if (mode == XDP_ATTACHED_NONE)
> > > > +             return 0;
> > >
> > > Probably good to memset and/or init the passed struct xsk_nl_info (e.g.
> > > fd to -1 etc) such that if some user did not we won't end up with garbage
> > > values on return 0.
> >
> > Will do.
> >
> > > > +     nl_info->xdp_prog_attached = true;
> > > > +     nl_info->fd = libbpf_nla_getattr_u32(tb_parsed[IFLA_XDP_FD]);
> > > > +     return 0;
> > > > +}
> > > > +
> > > > +static bool xsk_xdp_prog_attached(struct xsk_socket *xsk)
> > > > +{
> > > > +     struct xsk_nl_info nl_info;
> > > > +     unsigned int nl_pid;
> > > > +     char err_buf[256];
> > > > +     int sock, err;
> > > > +
> > > > +     sock = libbpf_netlink_open(&nl_pid);
> > > > +     if (sock < 0)
> > > > +             return false;
> > > > +
> > > > +     nl_info.xdp_prog_attached = false;
> > > > +     nl_info.fd = 0;
> > > > +
> > > > +     err = libbpf_nl_get_link(sock, nl_pid, xsk_parse_nl, &nl_info);
> > > > +     if (err) {
> > > > +             libbpf_strerror(err, err_buf, sizeof(err_buf));
> > > > +             pr_warning("Error:\n%s\n", err_buf);
> > > > +             return false;
> > > > +     }
> > > > +
> > > > +     xsk->prog_fd = nl_info.fd;
> > > > +     return nl_info.xdp_prog_attached;
> > >
> > > Don't we leak sock here and in error above?
> >
> > Will fix the fd leaks that you have pointed out here and below. And go
> > through the code to see if there are more fd leaks. Do I ever learn
> > ;-).
> >
> > Thanks for your review. Appreciated.
> >
> > /Magnus
> >
> > > > +}
> > > > +
> > > > +static int xsk_load_xdp_prog(struct xsk_socket *xsk)
> > > > +{
> > > > +     char bpf_log_buf[BPF_LOG_BUF_SIZE];
> > > > +     int err, prog_fd;
> > > > +
> > > > +     /* This is the C-program:
> > > > +      * SEC("xdp_sock") int xdp_sock_prog(struct xdp_md *ctx)
> > > > +      * {
> > > > +      *     int *qidconf, index = ctx->rx_queue_index;
> > > > +      *
> > > > +      *     // A set entry here means that the corresponding queue_id
> > > > +      *     // has an active AF_XDP socket bound to it.
> > > > +      *     qidconf = bpf_map_lookup_elem(&qidconf_map, &index);
> > > > +      *     if (!qidconf)
> > > > +      *         return XDP_ABORTED;
> > > > +      *
> > > > +      *     if (*qidconf)
> > > > +      *         return bpf_redirect_map(&xsks_map, index, 0);
> > > > +      *
> > > > +      *     return XDP_PASS;
> > > > +      * }
> > > > +      */
> > > > +     struct bpf_insn prog[] = {
> > > > +             /* r1 = *(u32 *)(r1 + 16) */
> > > > +             BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 16),
> > > > +             /* *(u32 *)(r10 - 4) = r1 */
> > > > +             BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_1, -4),
> > > > +             BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> > > > +             BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
> > > > +             BPF_LD_MAP_FD(BPF_REG_1, xsk->qidconf_map_fd),
> > > > +             BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
> > > > +             BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> > > > +             BPF_MOV32_IMM(BPF_REG_0, 0),
> > > > +             /* if r1 == 0 goto +8 */
> > > > +             BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 8),
> > > > +             BPF_MOV32_IMM(BPF_REG_0, 2),
> > > > +             /* r1 = *(u32 *)(r1 + 0) */
> > > > +             BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1, 0),
> > > > +             /* if r1 == 0 goto +5 */
> > > > +             BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 5),
> > > > +             /* r2 = *(u32 *)(r10 - 4) */
> > > > +             BPF_LD_MAP_FD(BPF_REG_1, xsk->xsks_map_fd),
> > > > +             BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_10, -4),
> > > > +             BPF_MOV32_IMM(BPF_REG_3, 0),
> > > > +             BPF_EMIT_CALL(BPF_FUNC_redirect_map),
> > > > +             /* The jumps are to this instruction */
> > > > +             BPF_EXIT_INSN(),
> > > > +     };
> > > > +     size_t insns_cnt = sizeof(prog) / sizeof(struct bpf_insn);
> > > > +
> > > > +     prog_fd = bpf_load_program(BPF_PROG_TYPE_XDP, prog, insns_cnt,
> > > > +                                "LGPL-2.1 or BSD-2-Clause", 0, bpf_log_buf,
> > > > +                                BPF_LOG_BUF_SIZE);
> > > > +     if (prog_fd < 0) {
> > > > +             pr_warning("BPF log buffer:\n%s", bpf_log_buf);
> > > > +             return prog_fd;
> > > > +     }
> > > > +
> > > > +     err = bpf_set_link_xdp_fd(xsk->ifindex, prog_fd, xsk->config.xdp_flags);
> > > > +     if (err)
> > > > +             return err;
> > >
> > > Leaks prog_fd on error.
> > >
> > > > +     xsk->prog_fd = prog_fd;
> > > > +     return 0;
> > > > +}
> > > > +
> > > > +static int xsk_create_bpf_maps(struct xsk_socket *xsk)
> > > > +{
> > > > +     int fd;
> > > > +
> > > > +     fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "qidconf_map",
> > > > +                              sizeof(int), sizeof(int), MAX_QUEUES, 0);
> > > > +     if (fd < 0)
> > > > +             return fd;
> > > > +     xsk->qidconf_map_fd = fd;
> > > > +
> > > > +     fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
> > > > +                              sizeof(int), sizeof(int), MAX_QUEUES, 0);
> > > > +     if (fd < 0)
> > > > +             return fd;
> > >
> > > Leaks first map fd on error.
> > >
> > > > +     xsk->xsks_map_fd = fd;
> > > > +
> > > > +     return 0;
> > > > +}
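The leak Daniel points out above can be fixed by releasing the first map's fd when the second creation fails. A self-contained sketch of the corrected flow — the fake_* stubs stand in for bpf_create_map_name() and close() so the error path can be exercised; all names here are illustrative.

```c
#include <stdbool.h>

/* Test doubles so the error path can run without a kernel;
 * fail_second_create flips the second creation into failure. */
static int next_fd;
static bool fail_second_create;
static bool fd_closed[8];

static int fake_create_map(void)
{
	if (fail_second_create && next_fd == 1)
		return -1;
	return next_fd++;
}

static void fake_close(int fd)
{
	fd_closed[fd] = true;
}

/* The corrected flow: on failure of the second map creation, release
 * the first map's fd instead of leaking it as in the posted patch. */
static int xsk_create_bpf_maps(int *qidconf_map_fd, int *xsks_map_fd)
{
	int fd;

	fd = fake_create_map();
	if (fd < 0)
		return fd;
	*qidconf_map_fd = fd;

	fd = fake_create_map();
	if (fd < 0) {
		fake_close(*qidconf_map_fd);	/* the missing cleanup */
		*qidconf_map_fd = -1;
		return fd;
	}
	*xsks_map_fd = fd;

	return 0;
}
```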
> > > > +
> > > > +static int xsk_update_bpf_maps(struct xsk_socket *xsk, int qidconf_value,
> > > > +                            int xsks_value)
> > > > +{
> > > > +     bool qidconf_map_updated = false, xsks_map_updated = false;
> > > > +     struct bpf_prog_info prog_info = {};
> > > > +     __u32 prog_len = sizeof(prog_info);
> > > > +     struct bpf_map_info map_info;
> > > > +     __u32 map_len = sizeof(map_info);
> > > > +     __u32 *map_ids;
> > > > +     int reset_value = 0;
> > > > +     __u32 num_maps;
> > > > +     unsigned int i;
> > > > +     int err;
> > > > +
> > > > +     err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
> > > > +     if (err)
> > > > +             return err;
> > > > +
> > > > +     num_maps = prog_info.nr_map_ids;
> > > > +
> > > > +     map_ids = malloc(prog_info.nr_map_ids * sizeof(*map_ids));
> > >
> > > calloc()?
> > >
> > > > +     if (!map_ids)
> > > > +             return -ENOMEM;
> > > > +
> > > > +     memset(&prog_info, 0, prog_len);
> > > > +     prog_info.nr_map_ids = num_maps;
> > > > +     prog_info.map_ids = (__u64)(unsigned long)map_ids;
> > > > +
> > > > +     err = bpf_obj_get_info_by_fd(xsk->prog_fd, &prog_info, &prog_len);
> > > > +     if (err)
> > > > +             return err;
> > >
> > > Leaks map_ids on error.
> > >
> > > > +
> > > > +     for (i = 0; i < prog_info.nr_map_ids; i++) {
> > > > +             int fd;
> > > > +
> > > > +             fd = bpf_map_get_fd_by_id(map_ids[i]);
> > > > +             if (fd < 0) {
> > > > +                     err = -errno;
> > > > +                     goto out;
> > > > +             }
> > > > +
> > > > +             err = bpf_obj_get_info_by_fd(fd, &map_info, &map_len);
> > > > +             if (err)
> > > > +                     goto out;
> > >
> > > close(fd) on error, also the case for below, please double check everything
> > > so that no fd leaks by accident into the app.
> > >
> > > > +
> > > > +             if (!strcmp(map_info.name, "qidconf_map")) {
> > > > +                     err = bpf_map_update_elem(fd, &xsk->queue_id,
> > > > +                                               &qidconf_value, 0);
> > > > +                     if (err)
> > > > +                             goto out;
> > > > +                     qidconf_map_updated = true;
> > > > +                     xsk->qidconf_map_fd = fd;
> > > > +             } else if (!strcmp(map_info.name, "xsks_map")) {
> > > > +                     err = bpf_map_update_elem(fd, &xsk->queue_id,
> > > > +                                               &xsks_value, 0);
> > > > +                     if (err)
> > > > +                             goto out;
> > > > +                     xsks_map_updated = true;
> > > > +                     xsk->xsks_map_fd = fd;
> > > > +             }
> > > > +
> > > > +             if (qidconf_map_updated && xsks_map_updated)
> > > > +                     break;
> > > > +     }
> > > > +
> > > > +     if (!(qidconf_map_updated && xsks_map_updated)) {
> > > > +             err = -ENOENT;
> > > > +             goto out;
> > > > +     }
> > > > +
> > > > +     return 0;
> > > > +
> > > > +out:
> > > > +     if (qidconf_map_updated)
> > > > +             (void)bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id,
> > > > +                                       &reset_value, 0);
> > > > +     if (xsks_map_updated)
> > > > +             (void)bpf_map_update_elem(xsk->xsks_map_fd, &xsk->queue_id,
> > > > +                                       &reset_value, 0);
> > > > +     return err;
> > > > +}
> > > > +
> > > > +static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
> > > > +{
> > > > +     bool prog_loaded = false;
> > > > +     int err;
> > > > +
> > > > +     if (!xsk_xdp_prog_attached(xsk)) {
> > > > +             err = xsk_create_bpf_maps(xsk);
> > > > +             if (err)
> > > > +                     goto out_load;
> > > > +
> > > > +             err = xsk_load_xdp_prog(xsk);
> > > > +             if (err)
> > > > +                     return err;
> > >
> > > Needs to undo created maps on error.
> > >
> > > > +             prog_loaded = true;
> > > > +     }
> > > > +
> > > > +     err = xsk_update_bpf_maps(xsk, true, xsk->fd);
> > > > +     if (err)
> > > > +             goto out_load;
> > > > +
> > > > +     return 0;
> > > > +
> > > > +out_load:
> > > > +     if (prog_loaded)
> > > > +             close(xsk->prog_fd);
> > >
> > > Ditto
> > >
> > > > +     return err;
> > > > +}
> > > > +
> > > > +int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> > > > +                    __u32 queue_id, struct xsk_umem *umem,
> > > > +                    struct xsk_ring_cons *rx, struct xsk_ring_prod *tx,
> > > > +                    const struct xsk_socket_config *usr_config)
> > > > +{
> > > > +     struct sockaddr_xdp sxdp = {};
> > > > +     struct xdp_mmap_offsets off;
> > > > +     struct xsk_socket *xsk;
> > > > +     socklen_t optlen;
> > > > +     void *map;
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file
  2019-01-29 15:12 ` [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file Magnus Karlsson
  2019-01-31 10:04   ` Daniel Borkmann
@ 2019-01-31 18:52   ` Alexei Starovoitov
  2019-01-31 19:12     ` Yonghong Song
  2019-01-31 19:35   ` Y Song
  2 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2019-01-31 18:52 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: bjorn.topel, ast, daniel, netdev, jakub.kicinski, bjorn.topel,
	qi.z.zhang, brouer, acme, yhs

On Tue, Jan 29, 2019 at 04:12:15PM +0100, Magnus Karlsson wrote:
> Move the pr_*() functions in libbpf.c to a common header file called
> libbpf_internal.h. This is so that the later libbpf AF_XDP helper library
> code in xsk.c can use these printing functions too.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>  tools/lib/bpf/libbpf.c          | 30 +-----------------------------
>  tools/lib/bpf/libbpf_internal.h | 41 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 29 deletions(-)
>  create mode 100644 tools/lib/bpf/libbpf_internal.h
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 2ccde17..1d7fe26 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -39,6 +39,7 @@
>  #include <gelf.h>
>  
>  #include "libbpf.h"
> +#include "libbpf_internal.h"
>  #include "bpf.h"
>  #include "btf.h"
>  #include "str_error.h"
> @@ -51,34 +52,6 @@
>  #define BPF_FS_MAGIC		0xcafe4a11
>  #endif
>  
> -#define __printf(a, b)	__attribute__((format(printf, a, b)))
> -
> -__printf(1, 2)
> -static int __base_pr(const char *format, ...)
> -{
> -	va_list args;
> -	int err;
> -
> -	va_start(args, format);
> -	err = vfprintf(stderr, format, args);
> -	va_end(args);
> -	return err;
> -}
> -
> -static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> -static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> -static __printf(1, 2) libbpf_print_fn_t __pr_debug;
> -
> -#define __pr(func, fmt, ...)	\
> -do {				\
> -	if ((func))		\
> -		(func)("libbpf: " fmt, ##__VA_ARGS__); \
> -} while (0)
> -
> -#define pr_warning(fmt, ...)	__pr(__pr_warning, fmt, ##__VA_ARGS__)
> -#define pr_info(fmt, ...)	__pr(__pr_info, fmt, ##__VA_ARGS__)
> -#define pr_debug(fmt, ...)	__pr(__pr_debug, fmt, ##__VA_ARGS__)

since these funcs are about to be used more widely
let's clean this api while we still can.
How about we convert it to single pr_log callback function
with verbosity flag instead of three callbacks ?
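Alexei's single-callback idea might look like the following sketch — one user-settable print function that receives a verbosity level, replacing the three separate callbacks. The enum and names are illustrative, not a merged libbpf API.

```c
#include <stdarg.h>
#include <stdio.h>

enum libbpf_print_level {
	LIBBPF_WARN,
	LIBBPF_INFO,
	LIBBPF_DEBUG,
};

/* A single callback taking the level; the user overrides it once. */
typedef int (*libbpf_print_fn_t)(enum libbpf_print_level level,
				 const char *format, va_list args);

static int default_print(enum libbpf_print_level level,
			 const char *format, va_list args)
{
	/* Drop debug output by default; everything else goes to stderr. */
	if (level == LIBBPF_DEBUG)
		return 0;
	return vfprintf(stderr, format, args);
}

static libbpf_print_fn_t __libbpf_pr = default_print;

void libbpf_set_print(libbpf_print_fn_t fn)
{
	__libbpf_pr = fn;
}

__attribute__((format(printf, 2, 3)))
int libbpf_print(enum libbpf_print_level level, const char *format, ...)
{
	va_list args;
	int err;

	if (!__libbpf_pr)
		return 0;
	va_start(args, format);
	err = __libbpf_pr(level, format, args);
	va_end(args);
	return err;
}

/* The familiar pr_*() spellings become thin wrappers over one entry point. */
#define pr_warning(fmt, ...) libbpf_print(LIBBPF_WARN, "libbpf: " fmt, ##__VA_ARGS__)
#define pr_info(fmt, ...)    libbpf_print(LIBBPF_INFO, "libbpf: " fmt, ##__VA_ARGS__)
#define pr_debug(fmt, ...)   libbpf_print(LIBBPF_DEBUG, "libbpf: " fmt, ##__VA_ARGS__)
```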



* Re: [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file
  2019-01-31 18:52   ` Alexei Starovoitov
@ 2019-01-31 19:12     ` Yonghong Song
  2019-01-31 20:47       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2019-01-31 19:12 UTC (permalink / raw)
  To: Alexei Starovoitov, Magnus Karlsson
  Cc: bjorn.topel, ast, daniel, netdev, jakub.kicinski, bjorn.topel,
	qi.z.zhang, brouer, acme



On 1/31/19 10:52 AM, Alexei Starovoitov wrote:
> On Tue, Jan 29, 2019 at 04:12:15PM +0100, Magnus Karlsson wrote:
>> Move the pr_*() functions in libbpf.c to a common header file called
>> libbpf_internal.h. This is so that the later libbpf AF_XDP helper library
>> code in xsk.c can use these printing functions too.
>>
>> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
>> ---
>>   tools/lib/bpf/libbpf.c          | 30 +-----------------------------
>>   tools/lib/bpf/libbpf_internal.h | 41 +++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 42 insertions(+), 29 deletions(-)
>>   create mode 100644 tools/lib/bpf/libbpf_internal.h
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 2ccde17..1d7fe26 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -39,6 +39,7 @@
>>   #include <gelf.h>
>>   
>>   #include "libbpf.h"
>> +#include "libbpf_internal.h"
>>   #include "bpf.h"
>>   #include "btf.h"
>>   #include "str_error.h"
>> @@ -51,34 +52,6 @@
>>   #define BPF_FS_MAGIC		0xcafe4a11
>>   #endif
>>   
>> -#define __printf(a, b)	__attribute__((format(printf, a, b)))
>> -
>> -__printf(1, 2)
>> -static int __base_pr(const char *format, ...)
>> -{
>> -	va_list args;
>> -	int err;
>> -
>> -	va_start(args, format);
>> -	err = vfprintf(stderr, format, args);
>> -	va_end(args);
>> -	return err;
>> -}
>> -
>> -static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
>> -static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
>> -static __printf(1, 2) libbpf_print_fn_t __pr_debug;
>> -
>> -#define __pr(func, fmt, ...)	\
>> -do {				\
>> -	if ((func))		\
>> -		(func)("libbpf: " fmt, ##__VA_ARGS__); \
>> -} while (0)
>> -
>> -#define pr_warning(fmt, ...)	__pr(__pr_warning, fmt, ##__VA_ARGS__)
>> -#define pr_info(fmt, ...)	__pr(__pr_info, fmt, ##__VA_ARGS__)
>> -#define pr_debug(fmt, ...)	__pr(__pr_debug, fmt, ##__VA_ARGS__)
> 
> since these funcs are about to be used more widely
> let's clean this api while we still can.
> How about we convert it to single pr_log callback function
> with verbosity flag instead of three callbacks ?

Another possible change relates to the API function
   libbpf_set_print.

Currently the function takes three parameters,

LIBBPF_API void libbpf_set_print(libbpf_print_fn_t warn,
                                  libbpf_print_fn_t info,
                                  libbpf_print_fn_t debug);

So it currently supports three levels of debugging output.
Is it possible that in the future more debugging output levels
may be supported? If this is the case, maybe we could
change the API function libbpf_set_print to something like
the below before the library version bumps to 1.0.0?

  LIBBPF_API void libbpf_set_print(libbpf_print_fn_t dprint);
and
   typedef int (*libbpf_print_fn_t)(enum libbpf_debug_level level,
                                    const char *, ...)
         __attribute__((format(printf, 2, 3)));
   enum libbpf_debug_level {
     LIBBPF_WARN,
     LIBBPF_INFO,
     LIBBPF_DEBUG,
   };

Basically, the user-provided callback function must take
the level as its first parameter.

Any comments? Arnaldo?



* Re: [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file
  2019-01-29 15:12 ` [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file Magnus Karlsson
  2019-01-31 10:04   ` Daniel Borkmann
  2019-01-31 18:52   ` Alexei Starovoitov
@ 2019-01-31 19:35   ` Y Song
  2 siblings, 0 replies; 16+ messages in thread
From: Y Song @ 2019-01-31 19:35 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Björn Töpel, Alexei Starovoitov, Daniel Borkmann,
	netdev, Jakub Kicinski, Björn Töpel, qi.z.zhang,
	Jesper Dangaard Brouer

On Tue, Jan 29, 2019 at 7:13 AM Magnus Karlsson
<magnus.karlsson@intel.com> wrote:
>
> Move the pr_*() functions in libbpf.c to a common header file called
> libbpf_internal.h. This is so that the later libbpf AF_XDP helper library
> code in xsk.c can use these printing functions too.
>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>  tools/lib/bpf/libbpf.c          | 30 +-----------------------------
>  tools/lib/bpf/libbpf_internal.h | 41 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+), 29 deletions(-)
>  create mode 100644 tools/lib/bpf/libbpf_internal.h
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 2ccde17..1d7fe26 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -39,6 +39,7 @@
>  #include <gelf.h>
>
>  #include "libbpf.h"
> +#include "libbpf_internal.h"
>  #include "bpf.h"
>  #include "btf.h"
>  #include "str_error.h"
> @@ -51,34 +52,6 @@
>  #define BPF_FS_MAGIC           0xcafe4a11
>  #endif
>
> -#define __printf(a, b) __attribute__((format(printf, a, b)))
> -
> -__printf(1, 2)
> -static int __base_pr(const char *format, ...)
> -{
> -       va_list args;
> -       int err;
> -
> -       va_start(args, format);
> -       err = vfprintf(stderr, format, args);
> -       va_end(args);
> -       return err;
> -}
> -
> -static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> -static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> -static __printf(1, 2) libbpf_print_fn_t __pr_debug;
> -
> -#define __pr(func, fmt, ...)   \
> -do {                           \
> -       if ((func))             \
> -               (func)("libbpf: " fmt, ##__VA_ARGS__); \
> -} while (0)
> -
> -#define pr_warning(fmt, ...)   __pr(__pr_warning, fmt, ##__VA_ARGS__)
> -#define pr_info(fmt, ...)      __pr(__pr_info, fmt, ##__VA_ARGS__)
> -#define pr_debug(fmt, ...)     __pr(__pr_debug, fmt, ##__VA_ARGS__)
> -
>  void libbpf_set_print(libbpf_print_fn_t warn,
>                       libbpf_print_fn_t info,
>                       libbpf_print_fn_t debug)
> @@ -96,7 +69,6 @@ void libbpf_set_print(libbpf_print_fn_t warn,
>                 goto out;               \
>  } while(0)
>
> -
>  /* Copied from tools/perf/util/util.h */
>  #ifndef zfree
>  # define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
> diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> new file mode 100644
> index 0000000..951c078
> --- /dev/null
> +++ b/tools/lib/bpf/libbpf_internal.h
> @@ -0,0 +1,41 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +
> +/*
> + * Common internal libbpf functions and definitions.
> + *
> + * Copyright (C) 2013-2015 Alexei Starovoitov <ast@kernel.org>
> + * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
> + * Copyright (C) 2015 Huawei Inc.
> + */
> +#ifndef __LIBBPF_LIBBPF_INTERNAL_H
> +#define __LIBBPF_LIBBPF_INTERNAL_H
> +
> +#define __printf(a, b) __attribute__((format(printf, a, b)))
> +
> +__printf(1, 2)
> +static int __base_pr(const char *format, ...)
> +{
> +       va_list args;
> +       int err;
> +
> +       va_start(args, format);
> +       err = vfprintf(stderr, format, args);
> +       va_end(args);
> +       return err;
> +}

Hmm. This will introduce a separate copy of this static function into
every .c file that includes the header.

Maybe we could define a global function in libbpf.c like
  libbpf_debug_pr(level, ...)
This function will be internal to the library and implemented in libbpf.c.
All files can call this function to print out the debug information
based on level?
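Y Song's point — that a static function in a header is duplicated into every translation unit — can be sketched as a declaration in the internal header backed by a single definition in libbpf.c. The name and signature below are illustrative, not a settled API.

```c
#include <stdarg.h>
#include <stdio.h>

/* libbpf_internal.h: declaration only, so xsk.c, libbpf.c, etc. all
 * share one definition instead of each getting its own static copy. */
int libbpf_debug_print(int level, const char *format, ...);

/* libbpf.c: the single definition; level would select warn/info/debug. */
int libbpf_debug_print(int level, const char *format, ...)
{
	va_list args;
	int err;

	(void)level;	/* a real implementation would filter on level */
	va_start(args, format);
	err = vfprintf(stderr, format, args);
	va_end(args);
	return err;
}
```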

> +
> +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> +static __maybe_unused __printf(1, 2) libbpf_print_fn_t __pr_debug;
> +
> +#define __pr(func, fmt, ...)   \
> +do {                           \
> +       if ((func))             \
> +               (func)("libbpf: " fmt, ##__VA_ARGS__); \
> +} while (0)
> +
> +#define pr_warning(fmt, ...)   __pr(__pr_warning, fmt, ##__VA_ARGS__)
> +#define pr_info(fmt, ...)      __pr(__pr_info, fmt, ##__VA_ARGS__)
> +#define pr_debug(fmt, ...)     __pr(__pr_debug, fmt, ##__VA_ARGS__)
> +
> +#endif /* __LIBBPF_LIBBPF_INTERNAL_H */
> --
> 2.7.4
>


* Re: [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file
  2019-01-31 19:12     ` Yonghong Song
@ 2019-01-31 20:47       ` Arnaldo CArvalho de Melo
  0 siblings, 0 replies; 16+ messages in thread
From: Arnaldo CArvalho de Melo @ 2019-01-31 20:47 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Magnus Karlsson, bjorn.topel, ast, daniel,
	netdev, jakub.kicinski, bjorn.topel, qi.z.zhang, brouer

On Thu, Jan 31, 2019 at 07:12:33PM +0000, Yonghong Song wrote:
> 
> 
> On 1/31/19 10:52 AM, Alexei Starovoitov wrote:
> > On Tue, Jan 29, 2019 at 04:12:15PM +0100, Magnus Karlsson wrote:
> >> Move the pr_*() functions in libbpf.c to a common header file called
> >> libbpf_internal.h. This is so that the later libbpf AF_XDP helper library
> >> code in xsk.c can use these printing functions too.
> >>
> >> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> >> ---
> >>   tools/lib/bpf/libbpf.c          | 30 +-----------------------------
> >>   tools/lib/bpf/libbpf_internal.h | 41 +++++++++++++++++++++++++++++++++++++++++
> >>   2 files changed, 42 insertions(+), 29 deletions(-)
> >>   create mode 100644 tools/lib/bpf/libbpf_internal.h
> >>
> >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >> index 2ccde17..1d7fe26 100644
> >> --- a/tools/lib/bpf/libbpf.c
> >> +++ b/tools/lib/bpf/libbpf.c
> >> @@ -39,6 +39,7 @@
> >>   #include <gelf.h>
> >>   
> >>   #include "libbpf.h"
> >> +#include "libbpf_internal.h"
> >>   #include "bpf.h"
> >>   #include "btf.h"
> >>   #include "str_error.h"
> >> @@ -51,34 +52,6 @@
> >>   #define BPF_FS_MAGIC		0xcafe4a11
> >>   #endif
> >>   
> >> -#define __printf(a, b)	__attribute__((format(printf, a, b)))
> >> -
> >> -__printf(1, 2)
> >> -static int __base_pr(const char *format, ...)
> >> -{
> >> -	va_list args;
> >> -	int err;
> >> -
> >> -	va_start(args, format);
> >> -	err = vfprintf(stderr, format, args);
> >> -	va_end(args);
> >> -	return err;
> >> -}
> >> -
> >> -static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
> >> -static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
> >> -static __printf(1, 2) libbpf_print_fn_t __pr_debug;
> >> -
> >> -#define __pr(func, fmt, ...)	\
> >> -do {				\
> >> -	if ((func))		\
> >> -		(func)("libbpf: " fmt, ##__VA_ARGS__); \
> >> -} while (0)
> >> -
> >> -#define pr_warning(fmt, ...)	__pr(__pr_warning, fmt, ##__VA_ARGS__)
> >> -#define pr_info(fmt, ...)	__pr(__pr_info, fmt, ##__VA_ARGS__)
> >> -#define pr_debug(fmt, ...)	__pr(__pr_debug, fmt, ##__VA_ARGS__)
> > 
> > since these funcs are about to be used more widely
> > let's clean this api while we still can.
> > How about we convert it to single pr_log callback function
> > with verbosity flag instead of three callbacks ?

Probably. I just wanted to keep, in the source code, the
pr_{debug,warn,info} APIs, to reduce the learning curve for people used
to program in the kernel and in tools/perf/.

> Another possible change related to the API function
>    libbpf_set_print
> 
> Currently the function takes three parameters,
> 
> LIBBPF_API void libbpf_set_print(libbpf_print_fn_t warn,
>                                   libbpf_print_fn_t info,
>                                   libbpf_print_fn_t debug);
> 
> So it currently supports three levels of debugging output.
> Is it possible that in the future more debugging output levels
> may be supported? If this is the case, maybe we could
> change the API function libbpf_set_print to something like
> the below before the library version bumps to 1.0.0?
> 
>   LIBBPF_API void libbpf_set_print(libbpf_print_fn_t dprint);
> and
>    typedef int (*libbpf_print_fn_t)(enum libbpf_debug_level level,
>                                     const char *, ...)
>          __attribute__((format(printf, 2, 3)));
>    enum libbpf_debug_level {
>      LIBBPF_WARN,
>      LIBBPF_INFO,
>      LIBBPF_DEBUG,
>    };
> 
> Basically, the user-provided callback function must take
> the level as its first parameter.
> 
> Any comments? Arnaldo?

I think it is ok, as long as we can override the pr_log thing to allow
for use with TUI, stdio, and GUI output.

- Arnaldo


end of thread, other threads:[~2019-01-31 20:47 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-29 15:12 [PATCH bpf-next v3 0/3] libbpf: adding AF_XDP support Magnus Karlsson
2019-01-29 15:12 ` [PATCH bpf-next v3 1/3] libbpf: move pr_*() functions to common header file Magnus Karlsson
2019-01-31 10:04   ` Daniel Borkmann
2019-01-31 10:42     ` Magnus Karlsson
2019-01-31 18:52   ` Alexei Starovoitov
2019-01-31 19:12     ` Yonghong Song
2019-01-31 20:47       ` Arnaldo Carvalho de Melo
2019-01-31 19:35   ` Y Song
2019-01-29 15:12 ` [PATCH bpf-next v3 2/3] libbpf: add support for using AF_XDP sockets Magnus Karlsson
2019-01-31 10:05   ` Daniel Borkmann
2019-01-31 10:44     ` Magnus Karlsson
2019-01-31 10:29   ` Daniel Borkmann
2019-01-31 13:55     ` Magnus Karlsson
2019-01-31 14:05       ` Maciej Fijalkowski
2019-01-31 14:23         ` Magnus Karlsson
2019-01-29 15:12 ` [PATCH bpf-next v3 3/3] samples/bpf: convert xdpsock to use libbpf for AF_XDP access Magnus Karlsson
