* [PATCH bpf-next v2 0/2] libbpf: adding AF_XDP support
@ 2018-12-12 13:09 Magnus Karlsson
  2018-12-12 13:09 ` [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets Magnus Karlsson
  2018-12-12 13:09 ` [PATCH bpf-next v2 2/2] samples/bpf: convert xdpsock to use libbpf for AF_XDP access Magnus Karlsson
  0 siblings, 2 replies; 9+ messages in thread
From: Magnus Karlsson @ 2018-12-12 13:09 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jakub.kicinski, bjorn.topel, qi.z.zhang
  Cc: brouer

This patch set proposes adding AF_XDP support to libbpf. The main
reason is to make it easier to write applications that use AF_XDP by
offering higher-level APIs that hide many of the details of the AF_XDP
uapi, in the same vein as libbpf eases XDP adoption by offering
easy-to-use, higher-level interfaces to XDP functionality. Hopefully
this will speed up adoption of AF_XDP, make applications that use it
simpler and smaller, and also let applications benefit from future
optimizations to the AF_XDP user-space access code. Previously, people
simply copied and pasted the code from the sample application into
their own applications, which is not desirable.

The proposed interface is composed of two parts:

* Low-level access interface to the four rings and the packet data
* High-level control plane interface for creating and setting up umems
  and AF_XDP sockets.

The sample program has been updated to use this new interface, and in
the process it lost roughly 300 lines of code. I cannot detect any
performance degradation due to using this library instead of the
functions that were previously inlined in the sample application,
though I measured this on a slower machine rather than the Broadwell
we normally use.

One question:

* The libbpf-internal structs for the rings are exposed in the
  libbpf.h header file for performance reasons: it is important that
  the application writer has full control over where they are
  allocated, for example on the stack or in a struct together with
  frequently used counters. Is there some way to achieve this
  flexibility while still hiding the structs from the user? That would
  leave more room for future optimizations.

In a future release, I am planning to add a higher-level data plane
interface too. It will be based on recvmsg and sendmsg, using struct
iovec for batching, so that the user does not have to know anything
about the underlying four rings of an AF_XDP socket. There will be one
semantic difference from the standard recvmsg: the kernel fills in the
iovecs instead of the application. The rest should behave like the
libc versions so that application writers feel at home.

Patch 1: adds AF_XDP support in libbpf
Patch 2: updates the xdpsock sample application to use the libbpf functions.

Changes v1 to v2:
  * Fixed cleanup of library state on error.
  * Moved API to initial version
  * Prefixed all public functions by xsk__ instead of xsk_
  * Added comment about changed default ring sizes, batch size and umem
    size in the sample application commit message
  * The library now only creates an Rx or Tx ring if the respective
    parameter is != NULL

I based this patch set on bpf-next commit e5c504858a18
("selftests/bpf: skip verifier sockmap tests on kernels without support")

Thanks: Magnus

Magnus Karlsson (2):
  libbpf: add support for using AF_XDP sockets
  samples/bpf: convert xdpsock to use libbpf for AF_XDP access

 samples/bpf/xdpsock_user.c        | 585 +++++++++++---------------------------
 tools/include/uapi/linux/if_xdp.h |  78 +++++
 tools/lib/bpf/Build               |   2 +-
 tools/lib/bpf/Makefile            |   5 +-
 tools/lib/bpf/README.rst          |  11 +-
 tools/lib/bpf/libbpf.h            |  93 ++++++
 tools/lib/bpf/libbpf.map          |   9 +
 tools/lib/bpf/xsk.c               | 568 ++++++++++++++++++++++++++++++++++++
 8 files changed, 928 insertions(+), 423 deletions(-)
 create mode 100644 tools/include/uapi/linux/if_xdp.h
 create mode 100644 tools/lib/bpf/xsk.c


* [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets
  2018-12-12 13:09 [PATCH bpf-next v2 0/2] libbpf: adding AF_XDP support Magnus Karlsson
@ 2018-12-12 13:09 ` Magnus Karlsson
  2018-12-13  6:23   ` Alexei Starovoitov
  2018-12-14 20:23   ` Alexei Starovoitov
  2018-12-12 13:09 ` [PATCH bpf-next v2 2/2] samples/bpf: convert xdpsock to use libbpf for AF_XDP access Magnus Karlsson
  1 sibling, 2 replies; 9+ messages in thread
From: Magnus Karlsson @ 2018-12-12 13:09 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jakub.kicinski, bjorn.topel, qi.z.zhang
  Cc: brouer

This commit adds AF_XDP support to libbpf. The main reason is to make
it easier to write applications that use AF_XDP by offering
higher-level APIs that hide many of the details of the AF_XDP uapi, in
the same vein as libbpf eases XDP adoption by offering easy-to-use,
higher-level interfaces to XDP functionality. Hopefully this will
speed up adoption of AF_XDP, make applications that use it simpler and
smaller, and also let applications benefit from future optimizations
to the AF_XDP user-space access code. Previously, people simply copied
and pasted the code from the sample application into their own
applications, which is not desirable.

The interface is composed of two parts:

* Low-level access interface to the four rings and the packet data
* High-level control plane interface for creating and setting
  up umems and AF_XDP sockets.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 tools/include/uapi/linux/if_xdp.h |  78 ++++++
 tools/lib/bpf/Build               |   2 +-
 tools/lib/bpf/Makefile            |   5 +-
 tools/lib/bpf/README.rst          |  11 +-
 tools/lib/bpf/libbpf.h            |  93 +++++++
 tools/lib/bpf/libbpf.map          |   9 +
 tools/lib/bpf/xsk.c               | 568 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 763 insertions(+), 3 deletions(-)
 create mode 100644 tools/include/uapi/linux/if_xdp.h
 create mode 100644 tools/lib/bpf/xsk.c

diff --git a/tools/include/uapi/linux/if_xdp.h b/tools/include/uapi/linux/if_xdp.h
new file mode 100644
index 0000000..caed8b1
--- /dev/null
+++ b/tools/include/uapi/linux/if_xdp.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * if_xdp: XDP socket user-space interface
+ * Copyright(c) 2018 Intel Corporation.
+ *
+ * Author(s): Björn Töpel <bjorn.topel@intel.com>
+ *	      Magnus Karlsson <magnus.karlsson@intel.com>
+ */
+
+#ifndef _LINUX_IF_XDP_H
+#define _LINUX_IF_XDP_H
+
+#include <linux/types.h>
+
+/* Options for the sxdp_flags field */
+#define XDP_SHARED_UMEM	(1 << 0)
+#define XDP_COPY	(1 << 1) /* Force copy-mode */
+#define XDP_ZEROCOPY	(1 << 2) /* Force zero-copy mode */
+
+struct sockaddr_xdp {
+	__u16 sxdp_family;
+	__u16 sxdp_flags;
+	__u32 sxdp_ifindex;
+	__u32 sxdp_queue_id;
+	__u32 sxdp_shared_umem_fd;
+};
+
+struct xdp_ring_offset {
+	__u64 producer;
+	__u64 consumer;
+	__u64 desc;
+};
+
+struct xdp_mmap_offsets {
+	struct xdp_ring_offset rx;
+	struct xdp_ring_offset tx;
+	struct xdp_ring_offset fr; /* Fill */
+	struct xdp_ring_offset cr; /* Completion */
+};
+
+/* XDP socket options */
+#define XDP_MMAP_OFFSETS		1
+#define XDP_RX_RING			2
+#define XDP_TX_RING			3
+#define XDP_UMEM_REG			4
+#define XDP_UMEM_FILL_RING		5
+#define XDP_UMEM_COMPLETION_RING	6
+#define XDP_STATISTICS			7
+
+struct xdp_umem_reg {
+	__u64 addr; /* Start of packet data area */
+	__u64 len; /* Length of packet data area */
+	__u32 chunk_size;
+	__u32 headroom;
+};
+
+struct xdp_statistics {
+	__u64 rx_dropped; /* Dropped for reasons other than invalid desc */
+	__u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
+	__u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
+};
+
+/* Pgoff for mmaping the rings */
+#define XDP_PGOFF_RX_RING			  0
+#define XDP_PGOFF_TX_RING		 0x80000000
+#define XDP_UMEM_PGOFF_FILL_RING	0x100000000ULL
+#define XDP_UMEM_PGOFF_COMPLETION_RING	0x180000000ULL
+
+/* Rx/Tx descriptor */
+struct xdp_desc {
+	__u64 addr;
+	__u32 len;
+	__u32 options;
+};
+
+/* UMEM descriptor is __u64 */
+
+#endif /* _LINUX_IF_XDP_H */
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index 197b40f..91780e8 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o
+libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o netlink.o bpf_prog_linfo.o xsk.o
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 34d9c36..ddaa147 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -179,6 +179,9 @@ $(BPF_IN): force elfdep bpfdep
 	@(test -f ../../include/uapi/linux/if_link.h -a -f ../../../include/uapi/linux/if_link.h && ( \
 	(diff -B ../../include/uapi/linux/if_link.h ../../../include/uapi/linux/if_link.h >/dev/null) || \
 	echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_link.h' differs from latest version at 'include/uapi/linux/if_link.h'" >&2 )) || true
+	@(test -f ../../include/uapi/linux/if_xdp.h -a -f ../../../include/uapi/linux/if_xdp.h && ( \
+	(diff -B ../../include/uapi/linux/if_xdp.h ../../../include/uapi/linux/if_xdp.h >/dev/null) || \
+	echo "Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h'" >&2 )) || true
 	$(Q)$(MAKE) $(build)=libbpf
 
 $(OUTPUT)libbpf.so: $(BPF_IN)
@@ -189,7 +192,7 @@ $(OUTPUT)libbpf.a: $(BPF_IN)
 	$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
 
 $(OUTPUT)test_libbpf: test_libbpf.cpp $(OUTPUT)libbpf.a
-	$(QUIET_LINK)$(CXX) $^ -lelf -o $@
+	$(QUIET_LINK)$(CXX) $(INCLUDES) $^ -lelf -o $@
 
 check: check_abi
 
diff --git a/tools/lib/bpf/README.rst b/tools/lib/bpf/README.rst
index 056f383..20241cc 100644
--- a/tools/lib/bpf/README.rst
+++ b/tools/lib/bpf/README.rst
@@ -9,7 +9,7 @@ described here. It's recommended to follow these conventions whenever a
 new function or type is added to keep libbpf API clean and consistent.
 
 All types and functions provided by libbpf API should have one of the
-following prefixes: ``bpf_``, ``btf_``, ``libbpf_``.
+following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``xsk_``.
 
 System call wrappers
 --------------------
@@ -62,6 +62,15 @@ Auxiliary functions and types that don't fit well in any of categories
 described above should have ``libbpf_`` prefix, e.g.
 ``libbpf_get_error`` or ``libbpf_prog_type_by_name``.
 
+AF_XDP functions
+-------------------
+
+AF_XDP functions should have an ``xsk_`` prefix; public entry points
+use a double underscore, e.g. ``xsk__get_data`` or
+``xsk__create_umem``. The interface consists of both low-level ring
+access functions and high-level configuration functions. These can be
+mixed and matched. Note that these functions are not reentrant, for
+performance reasons.
+
 libbpf ABI
 ==========
 
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 5f68d7b..da99203 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -15,6 +15,7 @@
 #include <stdbool.h>
 #include <sys/types.h>  // for size_t
 #include <linux/bpf.h>
+#include <linux/if_xdp.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -355,6 +356,98 @@ LIBBPF_API const struct bpf_line_info *
 bpf_prog_linfo__lfind(const struct bpf_prog_linfo *prog_linfo,
 		      __u32 insn_off, __u32 nr_skip);
 
+/* Do not access these members directly. Use the functions below. */
+struct xsk_prod_ring {
+	__u32 cached_prod;
+	__u32 cached_cons;
+	__u32 mask;
+	__u32 size;
+	__u32 *producer;
+	__u32 *consumer;
+	void *ring;
+};
+
+/* Do not access these members directly. Use the functions below. */
+struct xsk_cons_ring {
+	__u32 cached_prod;
+	__u32 cached_cons;
+	__u32 mask;
+	__u32 size;
+	__u32 *producer;
+	__u32 *consumer;
+	void *ring;
+};
+
+static inline __u64 *xsk__get_fill_desc(struct xsk_prod_ring *fill,
+				       __u64 idx)
+{
+	__u64 *descs = (__u64 *)fill->ring;
+
+	return &descs[idx & fill->mask];
+}
+
+static inline __u64 *xsk__get_completion_desc(struct xsk_cons_ring *comp,
+					     __u64 idx)
+{
+	__u64 *descs = (__u64 *)comp->ring;
+
+	return &descs[idx & comp->mask];
+}
+
+static inline struct xdp_desc *xsk__get_tx_desc(struct xsk_prod_ring *tx,
+					       __u64 idx)
+{
+	struct xdp_desc *descs = (struct xdp_desc *)tx->ring;
+
+	return &descs[idx & tx->mask];
+}
+
+static inline struct xdp_desc *xsk__get_rx_desc(struct xsk_cons_ring *rx,
+					       __u64 idx)
+{
+	struct xdp_desc *descs = (struct xdp_desc *)rx->ring;
+
+	return &descs[idx & rx->mask];
+}
+
+LIBBPF_API size_t xsk__peek_cons(struct xsk_cons_ring *ring, size_t nb,
+				__u32 *idx);
+LIBBPF_API void xsk__release_cons(struct xsk_cons_ring *ring);
+LIBBPF_API size_t xsk__reserve_prod(struct xsk_prod_ring *ring, size_t nb,
+				   __u32 *idx);
+LIBBPF_API void xsk__submit_prod(struct xsk_prod_ring *ring);
+
+LIBBPF_API void *xsk__get_data(void *umem_area, __u64 addr);
+
+#define XSK__DEFAULT_NUM_DESCS      2048
+#define XSK__DEFAULT_FRAME_SHIFT    11 /* 2048 bytes */
+#define XSK__DEFAULT_FRAME_SIZE     (1 << XSK__DEFAULT_FRAME_SHIFT)
+#define XSK__DEFAULT_FRAME_HEADROOM 0
+
+struct xsk_umem_config {
+	__u32 fq_size;
+	__u32 cq_size;
+	__u32 frame_size;
+	__u32 frame_headroom;
+};
+
+struct xsk_xdp_socket_config {
+	__u32 rx_size;
+	__u32 tx_size;
+};
+
+/* Pass NULL for config to get the default configuration. */
+LIBBPF_API int xsk__create_umem(void *umem_area, __u64 size,
+			       struct xsk_prod_ring *fq,
+			       struct xsk_cons_ring *cq,
+			       struct xsk_umem_config *config);
+LIBBPF_API int xsk__create_xdp_socket(int umem_fd, struct xsk_cons_ring *rx,
+				     struct xsk_prod_ring *tx,
+				     struct xsk_xdp_socket_config *config);
+/* Returns 0 for success. */
+LIBBPF_API int xsk__delete_umem(int fd);
+LIBBPF_API int xsk__delete_xdp_socket(int fd);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index cd02cd4..ae4cc0d 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -121,6 +121,15 @@ LIBBPF_0.0.1 {
 		libbpf_prog_type_by_name;
 		libbpf_set_print;
 		libbpf_strerror;
+		xsk__peek_cons;
+		xsk__release_cons;
+		xsk__reserve_prod;
+		xsk__submit_prod;
+		xsk__get_data;
+		xsk__create_umem;
+		xsk__create_xdp_socket;
+		xsk__delete_umem;
+		xsk__delete_xdp_socket;
 	local:
 		*;
 };
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
new file mode 100644
index 0000000..60cd896
--- /dev/null
+++ b/tools/lib/bpf/xsk.c
@@ -0,0 +1,568 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+/*
+ * AF_XDP user-space access library.
+ *
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <asm/barrier.h>
+#include <linux/compiler.h>
+#include <linux/if_xdp.h>
+#include <linux/list.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/types.h>
+
+#include "libbpf.h"
+
+#ifndef SOL_XDP
+ #define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+ #define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+ #define PF_XDP AF_XDP
+#endif
+
+/* This has to be a power of 2 for performance reasons. */
+#define HASH_TABLE_ENTRIES 128
+
+struct xsk_umem_info {
+	struct xsk_prod_ring *fq;
+	struct xsk_cons_ring *cq;
+	char *umem_area;
+	struct list_head list;
+	struct xsk_umem_config config;
+	int fd;
+	int refcount;
+};
+
+struct xsk_xdp_socket_info {
+	struct xsk_cons_ring *rx;
+	struct xsk_prod_ring *tx;
+	__u64 outstanding_tx;
+	struct list_head list;
+	struct xsk_umem_info *umem;
+	struct xsk_xdp_socket_config config;
+	int fd;
+};
+
+static struct xsk_xdp_socket_info *xsk_hash_table[HASH_TABLE_ENTRIES];
+static struct xsk_umem_info *umem_hash_table[HASH_TABLE_ENTRIES];
+
+/* For 32-bit systems, we need to use mmap2 as the offsets are 64-bit.
+ * Unfortunately, it is not part of glibc.
+ */
+static inline void *xsk_mmap(void *addr, size_t length, int prot, int flags,
+			     int fd, __u64 offset)
+{
+#ifdef __NR_mmap2
+	unsigned int page_shift = __builtin_ffs(getpagesize()) - 1;
+	long ret = syscall(__NR_mmap2, addr, length, prot, flags, fd,
+			   (off_t)(offset >> page_shift));
+
+	return (void *)ret;
+#else
+	return mmap(addr, length, prot, flags, fd, offset);
+#endif
+}
+
+static __u32 xsk_prod_nb_free(struct xsk_prod_ring *r, __u32 nb)
+{
+	__u32 free_entries = r->cached_cons - r->cached_prod;
+
+	if (free_entries >= nb)
+		return free_entries;
+
+	/* Refresh the local tail pointer.
+	 * cached_cons is r->size bigger than the real consumer pointer so
+	 * that this addition can be avoided in the more frequently
+	 * executed code that computes free_entries at the beginning of
+	 * this function. Without this optimization it would have been
+	 * free_entries = r->cached_cons - r->cached_prod + r->size.
+	 */
+	r->cached_cons = *r->consumer + r->size;
+
+	return r->cached_cons - r->cached_prod;
+}
+
+static __u32 xsk_cons_nb_avail(struct xsk_cons_ring *r, __u32 nb)
+{
+	__u32 entries = r->cached_prod - r->cached_cons;
+
+	if (entries == 0) {
+		r->cached_prod = *r->producer;
+		entries = r->cached_prod - r->cached_cons;
+	}
+
+	return (entries > nb) ? nb : entries;
+}
+
+size_t xsk__reserve_prod(struct xsk_prod_ring *prod, size_t nb, __u32 *idx)
+{
+	if (unlikely(xsk_prod_nb_free(prod, nb) < nb))
+		return 0;
+
+	*idx = prod->cached_prod;
+	prod->cached_prod += nb;
+
+	return nb;
+}
+
+void xsk__submit_prod(struct xsk_prod_ring *prod)
+{
+	/* Make sure everything has been written to the ring before signalling
+	 * this to the kernel.
+	 */
+	smp_wmb();
+
+	*prod->producer = prod->cached_prod;
+}
+
+size_t xsk__peek_cons(struct xsk_cons_ring *cons, size_t nb, __u32 *idx)
+{
+	size_t entries = xsk_cons_nb_avail(cons, nb);
+
+	if (likely(entries > 0)) {
+		/* Make sure we do not speculatively read the data before
+		 * we have received the packet buffers from the ring.
+		 */
+		smp_rmb();
+
+		*idx = cons->cached_cons;
+		cons->cached_cons += entries;
+	}
+
+	return entries;
+}
+
+void xsk__release_cons(struct xsk_cons_ring *cons)
+{
+	*cons->consumer = cons->cached_cons;
+}
+
+void *xsk__get_data(void *umem_area, __u64 addr)
+{
+	return &((char *)umem_area)[addr];
+}
+
+static bool xsk_page_aligned(void *buffer)
+{
+	unsigned long addr = (unsigned long)buffer;
+
+	return !(addr & (getpagesize() - 1));
+}
+
+/* Since the file descriptors are generally allocated sequentially, and also
+ * for performance reasons, we pick the simplest possible hash function:
+ * just a single "and" operation (from the modulo operator).
+ */
+static void xsk_hash_insert_umem(int fd, struct xsk_umem_info *umem)
+{
+	struct xsk_umem_info *umem_in_hash =
+		umem_hash_table[fd % HASH_TABLE_ENTRIES];
+
+	if (umem_in_hash) {
+		list_add_tail(&umem->list, &umem_in_hash->list);
+		return;
+	}
+
+	INIT_LIST_HEAD(&umem->list);
+	umem_hash_table[fd % HASH_TABLE_ENTRIES] = umem;
+}
+
+static struct xsk_umem_info *xsk_hash_find_umem(int fd)
+{
+	struct xsk_umem_info *umem = umem_hash_table[fd % HASH_TABLE_ENTRIES];
+
+	while (umem && umem->fd != fd)
+		umem = list_next_entry(umem, list);
+
+	return umem;
+}
+
+static void xsk_hash_remove_umem(int fd)
+{
+	struct xsk_umem_info *umem = umem_hash_table[fd % HASH_TABLE_ENTRIES];
+
+	while (umem && umem->fd != fd)
+		umem = list_next_entry(umem, list);
+
+	if (umem) {
+		if (list_empty(&umem->list)) {
+			umem_hash_table[fd % HASH_TABLE_ENTRIES] = NULL;
+			return;
+		}
+
+		if (umem == umem_hash_table[fd % HASH_TABLE_ENTRIES])
+			umem_hash_table[fd % HASH_TABLE_ENTRIES] =
+				list_next_entry(umem, list);
+		list_del(&umem->list);
+	}
+}
+
+static void xsk_hash_insert_xdp_socket(int fd, struct xsk_xdp_socket_info *xsk)
+{
+	struct xsk_xdp_socket_info *xsk_in_hash =
+		xsk_hash_table[fd % HASH_TABLE_ENTRIES];
+
+	if (xsk_in_hash) {
+		list_add_tail(&xsk->list, &xsk_in_hash->list);
+		return;
+	}
+
+	INIT_LIST_HEAD(&xsk->list);
+	xsk_hash_table[fd % HASH_TABLE_ENTRIES] = xsk;
+}
+
+static struct xsk_xdp_socket_info *xsk_hash_find_xdp_socket(int fd)
+{
+	struct xsk_xdp_socket_info *xsk =
+		xsk_hash_table[fd % HASH_TABLE_ENTRIES];
+
+	while (xsk && xsk->fd != fd)
+		xsk = list_next_entry(xsk, list);
+
+	return xsk;
+}
+
+static void xsk_hash_remove_xdp_socket(int fd)
+{
+	struct xsk_xdp_socket_info *xsk =
+		xsk_hash_table[fd % HASH_TABLE_ENTRIES];
+
+	while (xsk && xsk->fd != fd)
+		xsk = list_next_entry(xsk, list);
+
+	if (xsk) {
+		if (list_empty(&xsk->list)) {
+			xsk_hash_table[fd % HASH_TABLE_ENTRIES] = NULL;
+			return;
+		}
+
+		if (xsk == xsk_hash_table[fd % HASH_TABLE_ENTRIES])
+			xsk_hash_table[fd % HASH_TABLE_ENTRIES] =
+				list_next_entry(xsk, list);
+		list_del(&xsk->list);
+	}
+}
+
+static void xsk_set_umem_config(struct xsk_umem_config *config,
+				struct xsk_umem_config *usr_config)
+{
+	if (!usr_config) {
+		config->fq_size = XSK__DEFAULT_NUM_DESCS;
+		config->cq_size = XSK__DEFAULT_NUM_DESCS;
+		config->frame_size = XSK__DEFAULT_FRAME_SIZE;
+		config->frame_headroom = XSK__DEFAULT_FRAME_HEADROOM;
+		return;
+	}
+
+	config->fq_size = usr_config->fq_size;
+	config->cq_size = usr_config->cq_size;
+	config->frame_size = usr_config->frame_size;
+	config->frame_headroom = usr_config->frame_headroom;
+}
+
+static void xsk_set_xdp_socket_config(struct xsk_xdp_socket_config *config,
+				      struct xsk_xdp_socket_config *usr_config)
+{
+	if (!usr_config) {
+		config->rx_size = XSK__DEFAULT_NUM_DESCS;
+		config->tx_size = XSK__DEFAULT_NUM_DESCS;
+		return;
+	}
+
+	config->rx_size = usr_config->rx_size;
+	config->tx_size = usr_config->tx_size;
+}
+
+int xsk__create_umem(void *umem_area, __u64 size, struct xsk_prod_ring *fq,
+		     struct xsk_cons_ring *cq,
+		     struct xsk_umem_config *usr_config)
+{
+	struct xdp_mmap_offsets off;
+	struct xsk_umem_info *umem;
+	struct xdp_umem_reg mr;
+	socklen_t optlen;
+	int err, fd;
+	void *map;
+
+	if (!umem_area)
+		return -EFAULT;
+	if (!size || !xsk_page_aligned(umem_area))
+		return -EINVAL;
+
+	fd = socket(AF_XDP, SOCK_RAW, 0);
+	if (fd < 0)
+		return -errno;
+
+	umem = calloc(1, sizeof(*umem));
+	if (!umem) {
+		err = -ENOMEM;
+		goto out_socket;
+	}
+
+	umem->umem_area = umem_area;
+	umem->fd = fd;
+	xsk_set_umem_config(&umem->config, usr_config);
+
+	mr.addr = (uintptr_t)umem_area;
+	mr.len = size;
+	mr.chunk_size = umem->config.frame_size;
+	mr.headroom = umem->config.frame_headroom;
+
+	err = setsockopt(fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr));
+	if (err) {
+		err = -errno;
+		goto out_umem_alloc;
+	}
+	err = setsockopt(fd, SOL_XDP, XDP_UMEM_FILL_RING,
+			 &umem->config.fq_size, sizeof(umem->config.fq_size));
+	if (err) {
+		err = -errno;
+		goto out_umem_alloc;
+	}
+	err = setsockopt(fd, SOL_XDP, XDP_UMEM_COMPLETION_RING,
+			 &umem->config.cq_size, sizeof(umem->config.cq_size));
+	if (err) {
+		err = -errno;
+		goto out_umem_alloc;
+	}
+
+	optlen = sizeof(off);
+	err = getsockopt(fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+	if (err) {
+		err = -errno;
+		goto out_umem_alloc;
+	}
+
+	map = xsk_mmap(NULL, off.fr.desc + umem->config.fq_size * sizeof(__u64),
+		       PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+		       fd, XDP_UMEM_PGOFF_FILL_RING);
+	if (map == MAP_FAILED) {
+		err = -errno;
+		goto out_umem_alloc;
+	}
+
+	umem->fq = fq;
+	fq->mask = umem->config.fq_size - 1;
+	fq->size = umem->config.fq_size;
+	fq->producer = map + off.fr.producer;
+	fq->consumer = map + off.fr.consumer;
+	fq->ring = map + off.fr.desc;
+	fq->cached_cons = umem->config.fq_size;
+
+	map = xsk_mmap(NULL, off.cr.desc + umem->config.cq_size * sizeof(__u64),
+		    PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
+		    fd, XDP_UMEM_PGOFF_COMPLETION_RING);
+	if (map == MAP_FAILED) {
+		err = -errno;
+		goto out_mmap;
+	}
+
+	umem->cq = cq;
+	cq->mask = umem->config.cq_size - 1;
+	cq->size = umem->config.cq_size;
+	cq->producer = map + off.cr.producer;
+	cq->consumer = map + off.cr.consumer;
+	cq->ring = map + off.cr.desc;
+
+	xsk_hash_insert_umem(fd, umem);
+	return fd;
+
+out_mmap:
+	munmap(fq->ring - off.fr.desc, off.fr.desc + umem->config.fq_size * sizeof(__u64));
+out_umem_alloc:
+	free(umem);
+out_socket:
+	close(fd);
+	return err;
+}
+
+int xsk__create_xdp_socket(int umem_fd, struct xsk_cons_ring *rx,
+			   struct xsk_prod_ring *tx,
+			   struct xsk_xdp_socket_config *usr_config)
+{
+	struct xsk_xdp_socket_info *xsk;
+	struct xdp_mmap_offsets off;
+	struct xsk_umem_info *umem;
+	socklen_t optlen;
+	int err, fd;
+	void *map;
+
+	umem = xsk_hash_find_umem(umem_fd);
+	if (!umem)
+		return -EBADF;
+
+	if (umem->refcount++ == 0) {
+		fd = umem_fd;
+	} else {
+		fd = socket(AF_XDP, SOCK_RAW, 0);
+		if (fd < 0)
+			return -errno;
+	}
+
+	xsk = calloc(1, sizeof(*xsk));
+	if (!xsk) {
+		err = -ENOMEM;
+		goto out_socket;
+	}
+
+	xsk->fd = fd;
+	xsk->outstanding_tx = 0;
+	xsk_set_xdp_socket_config(&xsk->config, usr_config);
+
+	if (rx) {
+		err = setsockopt(fd, SOL_XDP, XDP_RX_RING,
+				 &xsk->config.rx_size,
+				 sizeof(xsk->config.rx_size));
+		if (err) {
+			err = -errno;
+			goto out_xsk_alloc;
+		}
+	}
+	if (tx) {
+		err = setsockopt(fd, SOL_XDP, XDP_TX_RING,
+				 &xsk->config.tx_size,
+				 sizeof(xsk->config.tx_size));
+		if (err) {
+			err = -errno;
+			goto out_xsk_alloc;
+		}
+	}
+
+	optlen = sizeof(off);
+	err = getsockopt(fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+	if (err) {
+		err = -errno;
+		goto out_xsk_alloc;
+	}
+
+	if (rx) {
+		map = xsk_mmap(NULL, off.rx.desc +
+			       xsk->config.rx_size * sizeof(struct xdp_desc),
+			       PROT_READ | PROT_WRITE,
+			       MAP_SHARED | MAP_POPULATE,
+			       fd, XDP_PGOFF_RX_RING);
+		if (map == MAP_FAILED) {
+			err = -errno;
+			goto out_xsk_alloc;
+		}
+
+		rx->mask = xsk->config.rx_size - 1;
+		rx->size = xsk->config.rx_size;
+		rx->producer = map + off.rx.producer;
+		rx->consumer = map + off.rx.consumer;
+		rx->ring = map + off.rx.desc;
+	}
+	xsk->rx = rx;
+
+	if (tx) {
+		map = xsk_mmap(NULL, off.tx.desc +
+			       xsk->config.tx_size * sizeof(struct xdp_desc),
+			       PROT_READ | PROT_WRITE,
+			       MAP_SHARED | MAP_POPULATE,
+			       fd, XDP_PGOFF_TX_RING);
+		if (map == MAP_FAILED) {
+			err = -errno;
+			goto out_mmap;
+		}
+
+		tx->mask = xsk->config.tx_size - 1;
+		tx->size = xsk->config.tx_size;
+		tx->producer = map + off.tx.producer;
+		tx->consumer = map + off.tx.consumer;
+		tx->ring = map + off.tx.desc;
+		tx->cached_cons = xsk->config.tx_size;
+	}
+	xsk->tx = tx;
+
+	xsk_hash_insert_xdp_socket(fd, xsk);
+	return fd;
+
+out_mmap:
+	if (rx)
+		munmap(rx->ring - off.rx.desc,
+		       off.rx.desc +
+		       xsk->config.rx_size * sizeof(struct xdp_desc));
+out_xsk_alloc:
+	free(xsk);
+out_socket:
+	if (--umem->refcount)
+		close(fd);
+	return err;
+}
+
+int xsk__delete_umem(int fd)
+{
+	struct xdp_mmap_offsets off;
+	struct xsk_umem_info *umem;
+	socklen_t optlen;
+	int err;
+
+	umem = xsk_hash_find_umem(fd);
+	if (!umem)
+		return -EBADF;
+
+	if (umem->refcount)
+		return -EBUSY;
+
+	optlen = sizeof(off);
+	err = getsockopt(fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+	if (!err) {
+		munmap(umem->fq->ring - off.fr.desc,
+		       off.fr.desc + umem->config.fq_size * sizeof(__u64));
+		munmap(umem->cq->ring - off.cr.desc,
+		       off.cr.desc + umem->config.cq_size * sizeof(__u64));
+	}
+
+	xsk_hash_remove_umem(fd);
+	close(fd);
+	free(umem);
+
+	return 0;
+}
+
+int xsk__delete_xdp_socket(int fd)
+{
+	struct xsk_xdp_socket_info *xsk;
+	struct xdp_mmap_offsets off;
+	socklen_t optlen;
+	int err;
+
+	xsk = xsk_hash_find_xdp_socket(fd);
+	if (!xsk)
+		return -EBADF;
+
+	optlen = sizeof(off);
+	err = getsockopt(fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
+	if (!err) {
+		if (xsk->rx)
+			munmap(xsk->rx->ring - off.rx.desc,
+			       off.rx.desc +
+			       xsk->config.rx_size * sizeof(struct xdp_desc));
+		if (xsk->tx)
+			munmap(xsk->tx->ring - off.tx.desc,
+			       off.tx.desc +
+			       xsk->config.tx_size * sizeof(struct xdp_desc));
+	}
+
+	xsk->umem->refcount--;
+	xsk_hash_remove_xdp_socket(fd);
+	/* Do not close the fd that also has an associated umem connected
+	 * to it.
+	 */
+	if (xsk->fd != xsk->umem->fd)
+		close(fd);
+	free(xsk);
+
+	return 0;
+}
-- 
2.7.4


* [PATCH bpf-next v2 2/2] samples/bpf: convert xdpsock to use libbpf for AF_XDP access
  2018-12-12 13:09 [PATCH bpf-next v2 0/2] libbpf: adding AF_XDP support Magnus Karlsson
  2018-12-12 13:09 ` [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets Magnus Karlsson
@ 2018-12-12 13:09 ` Magnus Karlsson
  1 sibling, 0 replies; 9+ messages in thread
From: Magnus Karlsson @ 2018-12-12 13:09 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev,
	jakub.kicinski, bjorn.topel, qi.z.zhang
  Cc: brouer

This commit converts the xdpsock sample application to use the AF_XDP
functions present in libbpf. This cuts down the size of it by nearly
300 lines of code.

The default ring sizes and the batch size have been increased, and the
size of the umem area has been decreased, so that the sample
application provides higher throughput.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 samples/bpf/xdpsock_user.c | 585 +++++++++++++--------------------------------
 1 file changed, 165 insertions(+), 420 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 57ecadc..55dcc67 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -44,15 +44,8 @@
 #define PF_XDP AF_XDP
 #endif
 
-#define NUM_FRAMES 131072
-#define FRAME_HEADROOM 0
-#define FRAME_SHIFT 11
-#define FRAME_SIZE 2048
-#define NUM_DESCS 1024
-#define BATCH_SIZE 16
-
-#define FQ_NUM_DESCS 1024
-#define CQ_NUM_DESCS 1024
+#define NUM_FRAMES (4 * 1024)
+#define BATCH_SIZE 64
 
 #define DEBUG_HEXDUMP 0
 
@@ -77,49 +70,42 @@ static int opt_shared_packet_buffer;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags;
 
-struct xdp_umem_uqueue {
-	u32 cached_prod;
-	u32 cached_cons;
-	u32 mask;
-	u32 size;
-	u32 *producer;
-	u32 *consumer;
-	u64 *ring;
-	void *map;
-};
-
 struct xdp_umem {
-	char *frames;
-	struct xdp_umem_uqueue fq;
-	struct xdp_umem_uqueue cq;
+	struct xsk_prod_ring fq;
+	struct xsk_cons_ring cq;
+	char *umem_area;
 	int fd;
 };
 
-struct xdp_uqueue {
-	u32 cached_prod;
-	u32 cached_cons;
-	u32 mask;
-	u32 size;
-	u32 *producer;
-	u32 *consumer;
-	struct xdp_desc *ring;
-	void *map;
-};
-
-struct xdpsock {
-	struct xdp_uqueue rx;
-	struct xdp_uqueue tx;
-	int sfd;
+struct xsk_socket {
+	struct xsk_cons_ring rx;
+	struct xsk_prod_ring tx;
 	struct xdp_umem *umem;
 	u32 outstanding_tx;
 	unsigned long rx_npkts;
 	unsigned long tx_npkts;
 	unsigned long prev_rx_npkts;
 	unsigned long prev_tx_npkts;
+	int fd;
 };
 
 static int num_socks;
-struct xdpsock *xsks[MAX_SOCKS];
+struct xsk_socket *xsks[MAX_SOCKS];
+
+static void dump_stats(void);
+
+static void __exit_with_error(int error, const char *file, const char *func,
+			      int line)
+{
+	fprintf(stderr, "%s:%s:%i: errno: %d/\"%s\"\n", file, func,
+		line, error, strerror(error));
+	dump_stats();
+	bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
+	exit(EXIT_FAILURE);
+}
+
+#define exit_with_error(error) __exit_with_error(error, __FILE__, __func__, \
+						 __LINE__)
 
 static unsigned long get_nsecs(void)
 {
@@ -129,226 +115,12 @@ static unsigned long get_nsecs(void)
 	return ts.tv_sec * 1000000000UL + ts.tv_nsec;
 }
 
-static void dump_stats(void);
-
-#define lassert(expr)							\
-	do {								\
-		if (!(expr)) {						\
-			fprintf(stderr, "%s:%s:%i: Assertion failed: "	\
-				#expr ": errno: %d/\"%s\"\n",		\
-				__FILE__, __func__, __LINE__,		\
-				errno, strerror(errno));		\
-			dump_stats();					\
-			exit(EXIT_FAILURE);				\
-		}							\
-	} while (0)
-
-#define barrier() __asm__ __volatile__("": : :"memory")
-#ifdef __aarch64__
-#define u_smp_rmb() __asm__ __volatile__("dmb ishld": : :"memory")
-#define u_smp_wmb() __asm__ __volatile__("dmb ishst": : :"memory")
-#else
-#define u_smp_rmb() barrier()
-#define u_smp_wmb() barrier()
-#endif
-#define likely(x) __builtin_expect(!!(x), 1)
-#define unlikely(x) __builtin_expect(!!(x), 0)
-
 static const char pkt_data[] =
 	"\x3c\xfd\xfe\x9e\x7f\x71\xec\xb1\xd7\x98\x3a\xc0\x08\x00\x45\x00"
 	"\x00\x2e\x00\x00\x00\x00\x40\x11\x88\x97\x05\x08\x07\x08\xc8\x14"
 	"\x1e\x04\x10\x92\x10\x92\x00\x1a\x6d\xa3\x34\x33\x1f\x69\x40\x6b"
 	"\x54\x59\xb6\x14\x2d\x11\x44\xbf\xaf\xd9\xbe\xaa";
 
-static inline u32 umem_nb_free(struct xdp_umem_uqueue *q, u32 nb)
-{
-	u32 free_entries = q->cached_cons - q->cached_prod;
-
-	if (free_entries >= nb)
-		return free_entries;
-
-	/* Refresh the local tail pointer */
-	q->cached_cons = *q->consumer + q->size;
-
-	return q->cached_cons - q->cached_prod;
-}
-
-static inline u32 xq_nb_free(struct xdp_uqueue *q, u32 ndescs)
-{
-	u32 free_entries = q->cached_cons - q->cached_prod;
-
-	if (free_entries >= ndescs)
-		return free_entries;
-
-	/* Refresh the local tail pointer */
-	q->cached_cons = *q->consumer + q->size;
-	return q->cached_cons - q->cached_prod;
-}
-
-static inline u32 umem_nb_avail(struct xdp_umem_uqueue *q, u32 nb)
-{
-	u32 entries = q->cached_prod - q->cached_cons;
-
-	if (entries == 0) {
-		q->cached_prod = *q->producer;
-		entries = q->cached_prod - q->cached_cons;
-	}
-
-	return (entries > nb) ? nb : entries;
-}
-
-static inline u32 xq_nb_avail(struct xdp_uqueue *q, u32 ndescs)
-{
-	u32 entries = q->cached_prod - q->cached_cons;
-
-	if (entries == 0) {
-		q->cached_prod = *q->producer;
-		entries = q->cached_prod - q->cached_cons;
-	}
-
-	return (entries > ndescs) ? ndescs : entries;
-}
-
-static inline int umem_fill_to_kernel_ex(struct xdp_umem_uqueue *fq,
-					 struct xdp_desc *d,
-					 size_t nb)
-{
-	u32 i;
-
-	if (umem_nb_free(fq, nb) < nb)
-		return -ENOSPC;
-
-	for (i = 0; i < nb; i++) {
-		u32 idx = fq->cached_prod++ & fq->mask;
-
-		fq->ring[idx] = d[i].addr;
-	}
-
-	u_smp_wmb();
-
-	*fq->producer = fq->cached_prod;
-
-	return 0;
-}
-
-static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, u64 *d,
-				      size_t nb)
-{
-	u32 i;
-
-	if (umem_nb_free(fq, nb) < nb)
-		return -ENOSPC;
-
-	for (i = 0; i < nb; i++) {
-		u32 idx = fq->cached_prod++ & fq->mask;
-
-		fq->ring[idx] = d[i];
-	}
-
-	u_smp_wmb();
-
-	*fq->producer = fq->cached_prod;
-
-	return 0;
-}
-
-static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq,
-					       u64 *d, size_t nb)
-{
-	u32 idx, i, entries = umem_nb_avail(cq, nb);
-
-	u_smp_rmb();
-
-	for (i = 0; i < entries; i++) {
-		idx = cq->cached_cons++ & cq->mask;
-		d[i] = cq->ring[idx];
-	}
-
-	if (entries > 0) {
-		u_smp_wmb();
-
-		*cq->consumer = cq->cached_cons;
-	}
-
-	return entries;
-}
-
-static inline void *xq_get_data(struct xdpsock *xsk, u64 addr)
-{
-	return &xsk->umem->frames[addr];
-}
-
-static inline int xq_enq(struct xdp_uqueue *uq,
-			 const struct xdp_desc *descs,
-			 unsigned int ndescs)
-{
-	struct xdp_desc *r = uq->ring;
-	unsigned int i;
-
-	if (xq_nb_free(uq, ndescs) < ndescs)
-		return -ENOSPC;
-
-	for (i = 0; i < ndescs; i++) {
-		u32 idx = uq->cached_prod++ & uq->mask;
-
-		r[idx].addr = descs[i].addr;
-		r[idx].len = descs[i].len;
-	}
-
-	u_smp_wmb();
-
-	*uq->producer = uq->cached_prod;
-	return 0;
-}
-
-static inline int xq_enq_tx_only(struct xdp_uqueue *uq,
-				 unsigned int id, unsigned int ndescs)
-{
-	struct xdp_desc *r = uq->ring;
-	unsigned int i;
-
-	if (xq_nb_free(uq, ndescs) < ndescs)
-		return -ENOSPC;
-
-	for (i = 0; i < ndescs; i++) {
-		u32 idx = uq->cached_prod++ & uq->mask;
-
-		r[idx].addr	= (id + i) << FRAME_SHIFT;
-		r[idx].len	= sizeof(pkt_data) - 1;
-	}
-
-	u_smp_wmb();
-
-	*uq->producer = uq->cached_prod;
-	return 0;
-}
-
-static inline int xq_deq(struct xdp_uqueue *uq,
-			 struct xdp_desc *descs,
-			 int ndescs)
-{
-	struct xdp_desc *r = uq->ring;
-	unsigned int idx;
-	int i, entries;
-
-	entries = xq_nb_avail(uq, ndescs);
-
-	u_smp_rmb();
-
-	for (i = 0; i < entries; i++) {
-		idx = uq->cached_cons++ & uq->mask;
-		descs[i] = r[idx];
-	}
-
-	if (entries > 0) {
-		u_smp_wmb();
-
-		*uq->consumer = uq->cached_cons;
-	}
-
-	return entries;
-}
-
 static void swap_mac_addresses(void *data)
 {
 	struct ether_header *eth = (struct ether_header *)data;
@@ -402,146 +174,38 @@ static size_t gen_eth_frame(char *frame)
 	return sizeof(pkt_data) - 1;
 }
 
-static struct xdp_umem *xdp_umem_configure(int sfd)
+static struct xdp_umem *xsk_configure_umem(void *buffer, u64 size)
 {
-	int fq_size = FQ_NUM_DESCS, cq_size = CQ_NUM_DESCS;
-	struct xdp_mmap_offsets off;
-	struct xdp_umem_reg mr;
 	struct xdp_umem *umem;
-	socklen_t optlen;
-	void *bufs;
 
 	umem = calloc(1, sizeof(*umem));
-	lassert(umem);
-
-	lassert(posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
-			       NUM_FRAMES * FRAME_SIZE) == 0);
-
-	mr.addr = (__u64)bufs;
-	mr.len = NUM_FRAMES * FRAME_SIZE;
-	mr.chunk_size = FRAME_SIZE;
-	mr.headroom = FRAME_HEADROOM;
-
-	lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)) == 0);
-	lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_FILL_RING, &fq_size,
-			   sizeof(int)) == 0);
-	lassert(setsockopt(sfd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &cq_size,
-			   sizeof(int)) == 0);
-
-	optlen = sizeof(off);
-	lassert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off,
-			   &optlen) == 0);
-
-	umem->fq.map = mmap(0, off.fr.desc +
-			    FQ_NUM_DESCS * sizeof(u64),
-			    PROT_READ | PROT_WRITE,
-			    MAP_SHARED | MAP_POPULATE, sfd,
-			    XDP_UMEM_PGOFF_FILL_RING);
-	lassert(umem->fq.map != MAP_FAILED);
-
-	umem->fq.mask = FQ_NUM_DESCS - 1;
-	umem->fq.size = FQ_NUM_DESCS;
-	umem->fq.producer = umem->fq.map + off.fr.producer;
-	umem->fq.consumer = umem->fq.map + off.fr.consumer;
-	umem->fq.ring = umem->fq.map + off.fr.desc;
-	umem->fq.cached_cons = FQ_NUM_DESCS;
-
-	umem->cq.map = mmap(0, off.cr.desc +
-			     CQ_NUM_DESCS * sizeof(u64),
-			     PROT_READ | PROT_WRITE,
-			     MAP_SHARED | MAP_POPULATE, sfd,
-			     XDP_UMEM_PGOFF_COMPLETION_RING);
-	lassert(umem->cq.map != MAP_FAILED);
-
-	umem->cq.mask = CQ_NUM_DESCS - 1;
-	umem->cq.size = CQ_NUM_DESCS;
-	umem->cq.producer = umem->cq.map + off.cr.producer;
-	umem->cq.consumer = umem->cq.map + off.cr.consumer;
-	umem->cq.ring = umem->cq.map + off.cr.desc;
-
-	umem->frames = bufs;
-	umem->fd = sfd;
+	if (!umem)
+		exit_with_error(errno);
 
-	if (opt_bench == BENCH_TXONLY) {
-		int i;
+	umem->fd = xsk__create_umem(buffer, size, &umem->fq, &umem->cq, NULL);
+	if (umem->fd < 0)
+		exit_with_error(-umem->fd);
 
-		for (i = 0; i < NUM_FRAMES * FRAME_SIZE; i += FRAME_SIZE)
-			(void)gen_eth_frame(&umem->frames[i]);
-	}
+	umem->umem_area = buffer;
 
 	return umem;
 }
 
-static struct xdpsock *xsk_configure(struct xdp_umem *umem)
+static struct xsk_socket *xsk_configure_socket(struct xdp_umem *umem,
+					       bool shared)
 {
 	struct sockaddr_xdp sxdp = {};
-	struct xdp_mmap_offsets off;
-	int sfd, ndescs = NUM_DESCS;
-	struct xdpsock *xsk;
-	bool shared = true;
-	socklen_t optlen;
-	u64 i;
-
-	sfd = socket(PF_XDP, SOCK_RAW, 0);
-	lassert(sfd >= 0);
+	struct xsk_socket *xsk;
+	int ret;
 
 	xsk = calloc(1, sizeof(*xsk));
-	lassert(xsk);
-
-	xsk->sfd = sfd;
-	xsk->outstanding_tx = 0;
-
-	if (!umem) {
-		shared = false;
-		xsk->umem = xdp_umem_configure(sfd);
-	} else {
-		xsk->umem = umem;
-	}
-
-	lassert(setsockopt(sfd, SOL_XDP, XDP_RX_RING,
-			   &ndescs, sizeof(int)) == 0);
-	lassert(setsockopt(sfd, SOL_XDP, XDP_TX_RING,
-			   &ndescs, sizeof(int)) == 0);
-	optlen = sizeof(off);
-	lassert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off,
-			   &optlen) == 0);
-
-	/* Rx */
-	xsk->rx.map = mmap(NULL,
-			   off.rx.desc +
-			   NUM_DESCS * sizeof(struct xdp_desc),
-			   PROT_READ | PROT_WRITE,
-			   MAP_SHARED | MAP_POPULATE, sfd,
-			   XDP_PGOFF_RX_RING);
-	lassert(xsk->rx.map != MAP_FAILED);
-
-	if (!shared) {
-		for (i = 0; i < NUM_DESCS * FRAME_SIZE; i += FRAME_SIZE)
-			lassert(umem_fill_to_kernel(&xsk->umem->fq, &i, 1)
-				== 0);
-	}
+	if (!xsk)
+		exit_with_error(errno);
 
-	/* Tx */
-	xsk->tx.map = mmap(NULL,
-			   off.tx.desc +
-			   NUM_DESCS * sizeof(struct xdp_desc),
-			   PROT_READ | PROT_WRITE,
-			   MAP_SHARED | MAP_POPULATE, sfd,
-			   XDP_PGOFF_TX_RING);
-	lassert(xsk->tx.map != MAP_FAILED);
-
-	xsk->rx.mask = NUM_DESCS - 1;
-	xsk->rx.size = NUM_DESCS;
-	xsk->rx.producer = xsk->rx.map + off.rx.producer;
-	xsk->rx.consumer = xsk->rx.map + off.rx.consumer;
-	xsk->rx.ring = xsk->rx.map + off.rx.desc;
-
-	xsk->tx.mask = NUM_DESCS - 1;
-	xsk->tx.size = NUM_DESCS;
-	xsk->tx.producer = xsk->tx.map + off.tx.producer;
-	xsk->tx.consumer = xsk->tx.map + off.tx.consumer;
-	xsk->tx.ring = xsk->tx.map + off.tx.desc;
-	xsk->tx.cached_cons = NUM_DESCS;
+	xsk->umem = umem;
+	xsk->fd = xsk__create_xdp_socket(umem->fd, &xsk->rx, &xsk->tx, NULL);
+	if (xsk->fd < 0)
+		exit_with_error(-xsk->fd);
 
 	sxdp.sxdp_family = PF_XDP;
 	sxdp.sxdp_ifindex = opt_ifindex;
@@ -554,7 +218,24 @@ static struct xdpsock *xsk_configure(struct xdp_umem *umem)
 		sxdp.sxdp_flags = opt_xdp_bind_flags;
 	}
 
-	lassert(bind(sfd, (struct sockaddr *)&sxdp, sizeof(sxdp)) == 0);
+	if (!shared) {
+		u32 idx;
+		int i;
+
+		ret = xsk__reserve_prod(&xsk->umem->fq, XSK__DEFAULT_NUM_DESCS,
+				       &idx);
+		if (ret != XSK__DEFAULT_NUM_DESCS)
+			exit_with_error(-ret);
+		for (i = 0;
+		     i < XSK__DEFAULT_NUM_DESCS * XSK__DEFAULT_FRAME_SIZE;
+		     i += XSK__DEFAULT_FRAME_SIZE)
+			*xsk__get_fill_desc(&xsk->umem->fq, idx++) = i;
+		xsk__submit_prod(&xsk->umem->fq);
+	}
+
+	ret = bind(xsk->fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
+	if (ret)
+		exit_with_error(errno);
 
 	return xsk;
 }
@@ -745,66 +426,92 @@ static void kick_tx(int fd)
 	ret = sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
 	if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno == EBUSY)
 		return;
-	lassert(0);
+	exit_with_error(errno);
 }
 
-static inline void complete_tx_l2fwd(struct xdpsock *xsk)
+static inline void complete_tx_l2fwd(struct xsk_socket *xsk)
 {
-	u64 descs[BATCH_SIZE];
+	u32 idx_cq, idx_fq;
 	unsigned int rcvd;
 	size_t ndescs;
 
 	if (!xsk->outstanding_tx)
 		return;
 
-	kick_tx(xsk->sfd);
+	kick_tx(xsk->fd);
 	ndescs = (xsk->outstanding_tx > BATCH_SIZE) ? BATCH_SIZE :
-		 xsk->outstanding_tx;
+		xsk->outstanding_tx;
 
 	/* re-add completed Tx buffers */
-	rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, ndescs);
+	rcvd = xsk__peek_cons(&xsk->umem->cq, ndescs, &idx_cq);
 	if (rcvd > 0) {
-		umem_fill_to_kernel(&xsk->umem->fq, descs, rcvd);
+		unsigned int i;
+		int ret;
+
+		ret = xsk__reserve_prod(&xsk->umem->fq, rcvd, &idx_fq);
+		while (ret != rcvd) {
+			if (ret < 0)
+				exit_with_error(-ret);
+			ret = xsk__reserve_prod(&xsk->umem->fq, rcvd, &idx_fq);
+		}
+		for (i = 0; i < rcvd; i++)
+			*xsk__get_completion_desc(&xsk->umem->cq, idx_cq++) =
+				*xsk__get_fill_desc(&xsk->umem->fq, idx_fq++);
+
+		xsk__submit_prod(&xsk->umem->fq);
+		xsk__release_cons(&xsk->umem->cq);
 		xsk->outstanding_tx -= rcvd;
 		xsk->tx_npkts += rcvd;
 	}
 }
 
-static inline void complete_tx_only(struct xdpsock *xsk)
+static inline void complete_tx_only(struct xsk_socket *xsk)
 {
-	u64 descs[BATCH_SIZE];
 	unsigned int rcvd;
+	u32 idx;
 
 	if (!xsk->outstanding_tx)
 		return;
 
-	kick_tx(xsk->sfd);
+	kick_tx(xsk->fd);
 
-	rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, BATCH_SIZE);
+	rcvd = xsk__peek_cons(&xsk->umem->cq, BATCH_SIZE, &idx);
 	if (rcvd > 0) {
+		xsk__release_cons(&xsk->umem->cq);
 		xsk->outstanding_tx -= rcvd;
 		xsk->tx_npkts += rcvd;
 	}
 }
 
-static void rx_drop(struct xdpsock *xsk)
+static void rx_drop(struct xsk_socket *xsk)
 {
-	struct xdp_desc descs[BATCH_SIZE];
 	unsigned int rcvd, i;
+	u32 idx_rx, idx_fq;
+	int ret;
 
-	rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
+	rcvd = xsk__peek_cons(&xsk->rx, BATCH_SIZE, &idx_rx);
 	if (!rcvd)
 		return;
 
+	ret = xsk__reserve_prod(&xsk->umem->fq, rcvd, &idx_fq);
+	while (ret != rcvd) {
+		if (ret < 0)
+			exit_with_error(-ret);
+		ret = xsk__reserve_prod(&xsk->umem->fq, rcvd, &idx_fq);
+	}
+
 	for (i = 0; i < rcvd; i++) {
-		char *pkt = xq_get_data(xsk, descs[i].addr);
+		u64 addr = xsk__get_rx_desc(&xsk->rx, idx_rx)->addr;
+		u32 len = xsk__get_rx_desc(&xsk->rx, idx_rx++)->len;
+		char *pkt = xsk__get_data(xsk->umem->umem_area, addr);
 
-		hex_dump(pkt, descs[i].len, descs[i].addr);
+		hex_dump(pkt, len, addr);
+		*xsk__get_fill_desc(&xsk->umem->fq, idx_fq++) = addr;
 	}
 
+	xsk__submit_prod(&xsk->umem->fq);
+	xsk__release_cons(&xsk->rx);
 	xsk->rx_npkts += rcvd;
-
-	umem_fill_to_kernel_ex(&xsk->umem->fq, descs, rcvd);
 }
 
 static void rx_drop_all(void)
@@ -815,7 +522,7 @@ static void rx_drop_all(void)
 	memset(fds, 0, sizeof(fds));
 
 	for (i = 0; i < num_socks; i++) {
-		fds[i].fd = xsks[i]->sfd;
+		fds[i].fd = xsks[i]->fd;
 		fds[i].events = POLLIN;
 		timeout = 1000; /* 1sn */
 	}
@@ -832,14 +539,14 @@ static void rx_drop_all(void)
 	}
 }
 
-static void tx_only(struct xdpsock *xsk)
+static void tx_only(struct xsk_socket *xsk)
 {
 	int timeout, ret, nfds = 1;
 	struct pollfd fds[nfds + 1];
-	unsigned int idx = 0;
+	u32 idx, frame_nb = 0;
 
 	memset(fds, 0, sizeof(fds));
-	fds[0].fd = xsk->sfd;
+	fds[0].fd = xsk->fd;
 	fds[0].events = POLLOUT;
 	timeout = 1000; /* 1sn */
 
@@ -849,50 +556,71 @@ static void tx_only(struct xdpsock *xsk)
 			if (ret <= 0)
 				continue;
 
-			if (fds[0].fd != xsk->sfd ||
+			if (fds[0].fd != xsk->fd ||
 			    !(fds[0].revents & POLLOUT))
 				continue;
 		}
 
-		if (xq_nb_free(&xsk->tx, BATCH_SIZE) >= BATCH_SIZE) {
-			lassert(xq_enq_tx_only(&xsk->tx, idx, BATCH_SIZE) == 0);
+		if (xsk__reserve_prod(&xsk->tx, BATCH_SIZE, &idx) ==
+		    BATCH_SIZE) {
+			unsigned int i;
 
+			for (i = 0; i < BATCH_SIZE; i++) {
+				xsk__get_tx_desc(&xsk->tx, idx + i)->addr =
+					(frame_nb + i) <<
+					XSK__DEFAULT_FRAME_SHIFT;
+				xsk__get_tx_desc(&xsk->tx, idx + i)->len =
+					sizeof(pkt_data) - 1;
+			}
+
+			xsk__submit_prod(&xsk->tx);
 			xsk->outstanding_tx += BATCH_SIZE;
-			idx += BATCH_SIZE;
-			idx %= NUM_FRAMES;
+			frame_nb += BATCH_SIZE;
+			frame_nb %= NUM_FRAMES;
 		}
 
 		complete_tx_only(xsk);
 	}
 }
 
-static void l2fwd(struct xdpsock *xsk)
+static void l2fwd(struct xsk_socket *xsk)
 {
 	for (;;) {
-		struct xdp_desc descs[BATCH_SIZE];
 		unsigned int rcvd, i;
+		u32 idx_rx, idx_tx;
 		int ret;
 
 		for (;;) {
 			complete_tx_l2fwd(xsk);
 
-			rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
+			rcvd = xsk__peek_cons(&xsk->rx, BATCH_SIZE, &idx_rx);
 			if (rcvd > 0)
 				break;
 		}
 
+		ret = xsk__reserve_prod(&xsk->tx, rcvd, &idx_tx);
+		while (ret != rcvd) {
+			if (ret < 0)
+				exit_with_error(-ret);
+			ret = xsk__reserve_prod(&xsk->tx, rcvd, &idx_tx);
+		}
+
 		for (i = 0; i < rcvd; i++) {
-			char *pkt = xq_get_data(xsk, descs[i].addr);
+			u64 addr = xsk__get_rx_desc(&xsk->rx, idx_rx)->addr;
+			u32 len = xsk__get_rx_desc(&xsk->rx, idx_rx++)->len;
+			char *pkt = xsk__get_data(xsk->umem->umem_area, addr);
 
 			swap_mac_addresses(pkt);
 
-			hex_dump(pkt, descs[i].len, descs[i].addr);
+			hex_dump(pkt, len, addr);
+			xsk__get_tx_desc(&xsk->tx, idx_tx)->addr = addr;
+			xsk__get_tx_desc(&xsk->tx, idx_tx++)->len = len;
 		}
 
-		xsk->rx_npkts += rcvd;
+		xsk__submit_prod(&xsk->tx);
+		xsk__release_cons(&xsk->rx);
 
-		ret = xq_enq(&xsk->tx, descs, rcvd);
-		lassert(ret == 0);
+		xsk->rx_npkts += rcvd;
 		xsk->outstanding_tx += rcvd;
 	}
 }
@@ -906,9 +634,11 @@ int main(int argc, char **argv)
 	int prog_fd, qidconf_map, xsks_map;
 	struct bpf_object *obj;
 	char xdp_filename[256];
+	struct xdp_umem *umem;
 	struct bpf_map *map;
 	int i, ret, key = 0;
 	pthread_t pt;
+	void *bufs;
 
 	parse_command_line(argc, argv);
 
@@ -956,18 +686,32 @@ int main(int argc, char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	/* Create sockets... */
-	xsks[num_socks++] = xsk_configure(NULL);
+	ret = posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
+			     NUM_FRAMES * XSK__DEFAULT_FRAME_SIZE);
+	if (ret)
+		exit_with_error(ret);
+
+	/* Create sockets... */
+	umem = xsk_configure_umem(bufs, NUM_FRAMES * XSK__DEFAULT_FRAME_SIZE);
+	xsks[num_socks++] = xsk_configure_socket(umem, false);
+
+	if (opt_bench == BENCH_TXONLY) {
+		int i;
+
+		for (i = 0; i < NUM_FRAMES * XSK__DEFAULT_FRAME_SIZE;
+		     i += XSK__DEFAULT_FRAME_SIZE)
+			(void)gen_eth_frame(&umem->umem_area[i]);
+	}
 
 #if RR_LB
 	for (i = 0; i < MAX_SOCKS - 1; i++)
-		xsks[num_socks++] = xsk_configure(xsks[0]->umem);
+		xsks[num_socks++] = xsk_configure_socket(umem, true);
 #endif
 
 	/* ...and insert them into the map. */
 	for (i = 0; i < num_socks; i++) {
 		key = i;
-		ret = bpf_map_update_elem(xsks_map, &key, &xsks[i]->sfd, 0);
+		ret = bpf_map_update_elem(xsks_map, &key, &xsks[i]->fd, 0);
 		if (ret) {
 			fprintf(stderr, "ERROR: bpf_map_update_elem %d\n", i);
 			exit(EXIT_FAILURE);
@@ -981,7 +725,8 @@ int main(int argc, char **argv)
 	setlocale(LC_ALL, "");
 
 	ret = pthread_create(&pt, NULL, poller, NULL);
-	lassert(ret == 0);
+	if (ret)
+		exit_with_error(ret);
 
 	prev_time = get_nsecs();
 
-- 
2.7.4
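The open-coded queue helpers deleted above (umem_nb_free(), umem_nb_avail() and friends) rely on two invariants that the libbpf replacements inherit: ring sizes are powers of two, so `pos & (size - 1)` wraps a free-running position into a slot index, and the producer/consumer positions are u32 counters that are never masked, so a plain unsigned subtraction yields the fill level even across counter wraparound. A standalone sketch of just that accounting (hypothetical names, no AF_XDP dependency):

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the removed xdp_umem_uqueue: only the fields
 * that the free/avail accounting touches. */
struct uq {
	uint32_t cached_prod;
	uint32_t cached_cons;
	uint32_t size;		/* must be a power of two */
};

/* Producer view, as in umem_nb_free(): the producer caches the
 * consumer position with 'size' already added, so cached_cons -
 * cached_prod is directly the number of free slots; wraparound is
 * handled for free by unsigned arithmetic. */
static uint32_t uq_nb_free(const struct uq *q)
{
	return q->cached_cons - q->cached_prod;
}

/* Consumer view, as in umem_nb_avail(): both counters are raw here,
 * and the result is capped at the requested batch size. */
static uint32_t uq_nb_avail(const struct uq *q, uint32_t nb)
{
	uint32_t entries = q->cached_prod - q->cached_cons;

	return entries > nb ? nb : entries;
}

/* Slot index for a free-running position; correct only because the
 * ring size is a power of two. */
static uint32_t uq_idx(const struct uq *q, uint32_t pos)
{
	return pos & (q->size - 1);
}
```

The real helpers additionally re-read the shared *producer/*consumer pointers when the cached values run dry, with the appropriate memory barriers; that refresh step is omitted in this sketch.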

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets
  2018-12-12 13:09 ` [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets Magnus Karlsson
@ 2018-12-13  6:23   ` Alexei Starovoitov
  2018-12-13  9:06     ` Magnus Karlsson
  2018-12-14 20:23   ` Alexei Starovoitov
  1 sibling, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2018-12-13  6:23 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: bjorn.topel, ast, daniel, netdev, jakub.kicinski, bjorn.topel,
	qi.z.zhang, brouer

On Wed, Dec 12, 2018 at 02:09:48PM +0100, Magnus Karlsson wrote:
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index cd02cd4..ae4cc0d 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -121,6 +121,15 @@ LIBBPF_0.0.1 {
>  		libbpf_prog_type_by_name;
>  		libbpf_set_print;
>  		libbpf_strerror;
> +		xsk__peek_cons;
> +		xsk__release_cons;
> +		xsk__reserve_prod;
> +		xsk__submit_prod;
> +		xsk__get_data;
> +		xsk__create_umem;
> +		xsk__create_xdp_socket;
> +		xsk__delete_umem;
> +		xsk__delete_xdp_socket;
>  	local:

I fully support the idea to provide common library for AF_XDP
that is easily available in the distros.

The main question is whether AF_XDP warrants its own lib or
piggy back on libbpf effort is acceptable.

Do you think above set of APIs will be enough for foreseeable
future or this is just a beginning?

If above is enough then it falls into XDP category. libbpf
already has minimal support for XDP and AF_XDP fits right in.

But if AF_XDP will keep growing a lot then it would make
sense to keep the functionality in a separate library
that minimally depends on libbpf.


* Re: [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets
  2018-12-13  6:23   ` Alexei Starovoitov
@ 2018-12-13  9:06     ` Magnus Karlsson
  2018-12-13 15:48       ` Daniel Borkmann
  0 siblings, 1 reply; 9+ messages in thread
From: Magnus Karlsson @ 2018-12-13  9:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Magnus Karlsson, Björn Töpel, ast, Daniel Borkmann,
	Network Development, Jakub Kicinski, Björn Töpel,
	Zhang, Qi Z, Jesper Dangaard Brouer

On Thu, Dec 13, 2018 at 7:24 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Dec 12, 2018 at 02:09:48PM +0100, Magnus Karlsson wrote:
> > diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> > index cd02cd4..ae4cc0d 100644
> > --- a/tools/lib/bpf/libbpf.map
> > +++ b/tools/lib/bpf/libbpf.map
> > @@ -121,6 +121,15 @@ LIBBPF_0.0.1 {
> >               libbpf_prog_type_by_name;
> >               libbpf_set_print;
> >               libbpf_strerror;
> > +             xsk__peek_cons;
> > +             xsk__release_cons;
> > +             xsk__reserve_prod;
> > +             xsk__submit_prod;
> > +             xsk__get_data;
> > +             xsk__create_umem;
> > +             xsk__create_xdp_socket;
> > +             xsk__delete_umem;
> > +             xsk__delete_xdp_socket;
> >       local:
>
> I fully support the idea to provide common library for AF_XDP
> that is easily available in the distros.
>
> The main question is whether AF_XDP warrants its own lib or
> whether piggybacking on the libbpf effort is acceptable.
>
> Do you think above set of APIs will be enough for foreseeable
> future or this is just a beginning?

This should be enough for the foreseeable future, maybe with the
addition of the two higher-level data plane functions xsk__recvmsg and
xsk__sendmsg that were mentioned in the cover letter. My intention with
this functionality is not to create another DPDK (there is already
one, so no reason to reinvent it). I just want to lower the bar of
entry for using AF_XDP and to stop people copying the code in the
sample application. To use AF_XDP you need libbpf anyway, so I think
it is a good fit for it. The intention is to keep this functionality
lean and mean.

> If above is enough then it falls into XDP category. libbpf
> already has minimal support for XDP and AF_XDP fits right in.
>
> But if AF_XDP will keep growing a lot then it would make
> sense to keep the functionality in a separate library
> that minimally depends on libbpf.
>


* Re: [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets
  2018-12-13  9:06     ` Magnus Karlsson
@ 2018-12-13 15:48       ` Daniel Borkmann
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel Borkmann @ 2018-12-13 15:48 UTC (permalink / raw)
  To: Magnus Karlsson, Alexei Starovoitov
  Cc: Magnus Karlsson, Björn Töpel, ast, Network Development,
	Jakub Kicinski, Björn Töpel, Zhang, Qi Z,
	Jesper Dangaard Brouer

On 12/13/2018 10:06 AM, Magnus Karlsson wrote:
> On Thu, Dec 13, 2018 at 7:24 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Wed, Dec 12, 2018 at 02:09:48PM +0100, Magnus Karlsson wrote:
>>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>>> index cd02cd4..ae4cc0d 100644
>>> --- a/tools/lib/bpf/libbpf.map
>>> +++ b/tools/lib/bpf/libbpf.map
>>> @@ -121,6 +121,15 @@ LIBBPF_0.0.1 {
>>>               libbpf_prog_type_by_name;
>>>               libbpf_set_print;
>>>               libbpf_strerror;
>>> +             xsk__peek_cons;
>>> +             xsk__release_cons;
>>> +             xsk__reserve_prod;
>>> +             xsk__submit_prod;
>>> +             xsk__get_data;
>>> +             xsk__create_umem;
>>> +             xsk__create_xdp_socket;
>>> +             xsk__delete_umem;
>>> +             xsk__delete_xdp_socket;
>>>       local:
>>
>> I fully support the idea to provide common library for AF_XDP
>> that is easily available in the distros.

+1

>> The main question is whether AF_XDP warrants its own lib or
>> whether piggybacking on the libbpf effort is acceptable.
>>
>> Do you think above set of APIs will be enough for foreseeable
>> future or this is just a beginning?
> 
> This should be enough for the foreseeable future, maybe with the
> addition of the two higher level data plane functions xsk__recvmsg and
> xsk__sendmsg that were mentioned in the cover letter. My intention with
> this functionality is not to create another DPDK (there is already
> one, so no reason to reinvent it). I just want to lower the bar of
> entry for using AF_XDP and to stop people copying the code in the
> sample application. To use AF_XDP you need libbpf anyway, so I think
> it is a good fit for it. The intention is to keep this functionality
> lean and mean.

+1

>> If above is enough then it falls into XDP category. libbpf
>> already has minimal support for XDP and AF_XDP fits right in.

Agree, I think given we have XDP support in there already, it would
fit to complement the lib with AF_XDP helpers to set up and get raw
access to the pkt data.

Any framework on top of this providing helper functions to develop
applications should be out of scope here and subject to other libraries,
DPDK and whatnot.

>> But if AF_XDP will keep growing a lot then it would make
>> sense to keep the functionality in a separate library
>> that minimally depends on libbpf.


* Re: [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets
  2018-12-12 13:09 ` [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets Magnus Karlsson
  2018-12-13  6:23   ` Alexei Starovoitov
@ 2018-12-14 20:23   ` Alexei Starovoitov
  2018-12-17  9:12     ` Magnus Karlsson
  1 sibling, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2018-12-14 20:23 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: bjorn.topel, ast, daniel, netdev, jakub.kicinski, bjorn.topel,
	qi.z.zhang, brouer

On Wed, Dec 12, 2018 at 02:09:48PM +0100, Magnus Karlsson wrote:
> This commit adds AF_XDP support to libbpf. The main reason for
> this is to facilitate writing applications that use AF_XDP by offering
> higher-level APIs that hide many of the details of the AF_XDP
> uapi. This is in the same vein as libbpf facilitates XDP adoption by
> offering easy-to-use higher level interfaces of XDP
> functionality. Hopefully this will facilitate adoption of AF_XDP, make
> applications using it simpler and smaller, and finally also make it
> possible for applications to benefit from optimizations in the AF_XDP
> user space access code. Previously, people just copied and pasted the
> code from the sample application into their application, which is not
> desirable.
> 
> The interface is composed of two parts:
> 
> * Low-level access interface to the four rings and the packet
> * High-level control plane interface for creating and setting
>   up umems and af_xdp sockets.
> 
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
...
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 5f68d7b..da99203 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h

Maybe instead of lib/bpf/libbpf.h the xsk stuff should go to lib/bpf/xsk.h?

> @@ -15,6 +15,7 @@
>  #include <stdbool.h>
>  #include <sys/types.h>  // for size_t
>  #include <linux/bpf.h>
> +#include <linux/if_xdp.h>
>  
>  #ifdef __cplusplus
>  extern "C" {
> @@ -355,6 +356,98 @@ LIBBPF_API const struct bpf_line_info *
>  bpf_prog_linfo__lfind(const struct bpf_prog_linfo *prog_linfo,
>  		      __u32 insn_off, __u32 nr_skip);
>  
> +/* Do not access these members directly. Use the functions below. */
> +struct xsk_prod_ring {
> +	__u32 cached_prod;
> +	__u32 cached_cons;
> +	__u32 mask;
> +	__u32 size;
> +	__u32 *producer;
> +	__u32 *consumer;
> +	void *ring;
> +};
> +
> +/* Do not access these members directly. Use the functions below. */
> +struct xsk_cons_ring {
> +	__u32 cached_prod;
> +	__u32 cached_cons;
> +	__u32 mask;
> +	__u32 size;
> +	__u32 *producer;
> +	__u32 *consumer;
> +	void *ring;
> +};

xsk_prod_ring and xsk_cons_ring have exactly the same members,
but they're two different structs? why?
Maybe have one 'struct xsk_ring'?

> +
> +static inline __u64 *xsk__get_fill_desc(struct xsk_prod_ring *fill,
> +				       __u64 idx)

See tools/lib/bpf/README.rst:
the main idea is that for "objects" you use __ to separate class from method.
In this case 'struct xsk_ring' would be an object and
the name of the method would be:
static inline __u64 *xsk_ring__get_fill_desc(struct xsk_ring *fill, __u64 idx)


> +{
> +	__u64 *descs = (__u64 *)fill->ring;
> +
> +	return &descs[idx & fill->mask];
> +}
> +
> +static inline __u64 *xsk__get_completion_desc(struct xsk_cons_ring *comp,
> +					     __u64 idx)
> +{
> +	__u64 *descs = (__u64 *)comp->ring;
> +
> +	return &descs[idx & comp->mask];
> +}
> +
> +static inline struct xdp_desc *xsk__get_tx_desc(struct xsk_prod_ring *tx,
> +					       __u64 idx)
> +{
> +	struct xdp_desc *descs = (struct xdp_desc *)tx->ring;
> +
> +	return &descs[idx & tx->mask];
> +}
> +
> +static inline struct xdp_desc *xsk__get_rx_desc(struct xsk_cons_ring *rx,
> +					       __u64 idx)
> +{
> +	struct xdp_desc *descs = (struct xdp_desc *)rx->ring;
> +
> +	return &descs[idx & rx->mask];
> +}
> +
> +LIBBPF_API size_t xsk__peek_cons(struct xsk_cons_ring *ring, size_t nb,
> +				__u32 *idx);
> +LIBBPF_API void xsk__release_cons(struct xsk_cons_ring *ring);
> +LIBBPF_API size_t xsk__reserve_prod(struct xsk_prod_ring *ring, size_t nb,
> +				   __u32 *idx);
> +LIBBPF_API void xsk__submit_prod(struct xsk_prod_ring *ring);

If we combine the struct names, then the above could be:

LIBBPF_API size_t xsk_ring__reserve(struct xsk_ring *ring, size_t nb, __u32 *idx);
LIBBPF_API void xsk_ring__submit(struct xsk_ring *ring);

?

> +
> +LIBBPF_API void *xsk__get_data(void *umem_area, __u64 addr);
> +
> +#define XSK__DEFAULT_NUM_DESCS      2048
> +#define XSK__DEFAULT_FRAME_SHIFT    11 /* 2048 bytes */
> +#define XSK__DEFAULT_FRAME_SIZE     (1 << XSK__DEFAULT_FRAME_SHIFT)
> +#define XSK__DEFAULT_FRAME_HEADROOM 0
> +
> +struct xsk_umem_config {
> +	__u32 fq_size;
> +	__u32 cq_size;
> +	__u32 frame_size;
> +	__u32 frame_headroom;
> +};
> +
> +struct xsk_xdp_socket_config {
> +	__u32 rx_size;
> +	__u32 tx_size;
> +};
> +
> +/* Set config to XSK_DEFAULT_CONFIG to get the default configuration. */
> +LIBBPF_API int xsk__create_umem(void *umem_area, __u64 size,
> +			       struct xsk_prod_ring *fq,
> +			       struct xsk_cons_ring *cq,
> +			       struct xsk_umem_config *config);

This one looks too low-level,
especially considering its usage:
umem->fd = xsk__create_umem(buffer, size, &umem->fq, &umem->cq, NULL);

Maybe create an object "struct xsk_umem"?
Then the API will be:
err = xsk_umem__create(buffer, size, NULL) ?

> +LIBBPF_API int xsk__create_xdp_socket(int umem_fd, struct xsk_cons_ring *rx,
> +				     struct xsk_prod_ring *tx,
> +				     struct xsk_xdp_socket_config *config);

Similar concern here. It feels like implementation details are leaking into the API.
The usage of it is:
xsk->fd = xsk__create_xdp_socket(umem->fd, &xsk->rx, &xsk->tx, NULL);

Maybe create an object "struct xsk_socket"?
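For illustration only, a rough sketch of what such an opaque-object API could look like (the names, fields and signatures below are this sketch's guesses at the suggestion, not code from the patch or from libbpf): creation fills in an opaque handle and returns 0 or a negative errno, so callers never touch the fd or the ring internals directly.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical opaque umem object: the fill/completion rings and the
 * fd would live behind this handle instead of being passed in one by
 * one. */
struct xsk_umem {
	void *umem_area;
	unsigned long long size;
	int fd;
};

/* Sketch of a possible xsk_umem__create(): validate, allocate, hand
 * the handle back through an out-parameter.  A real implementation
 * would also create the socket, register the umem with the kernel and
 * mmap the fill/completion rings. */
static int xsk_umem__create(struct xsk_umem **out, void *area,
			    unsigned long long size)
{
	struct xsk_umem *umem;

	if (!out || !area || !size)
		return -EINVAL;
	umem = calloc(1, sizeof(*umem));
	if (!umem)
		return -ENOMEM;
	umem->umem_area = area;
	umem->size = size;
	umem->fd = -1;	/* placeholder for the real socket fd */
	*out = umem;
	return 0;
}

static void xsk_umem__delete(struct xsk_umem *umem)
{
	free(umem);
}
```

Callers would then write `err = xsk_umem__create(&umem, buffer, size);` and check err, rather than juggling a raw fd plus two ring structs.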


* Re: [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets
  2018-12-14 20:23   ` Alexei Starovoitov
@ 2018-12-17  9:12     ` Magnus Karlsson
  2018-12-18 18:53       ` Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: Magnus Karlsson @ 2018-12-17  9:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Magnus Karlsson, Björn Töpel, ast, Daniel Borkmann,
	Network Development, Jakub Kicinski, Björn Töpel,
	Zhang, Qi Z, Jesper Dangaard Brouer

On Fri, Dec 14, 2018 at 9:25 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Dec 12, 2018 at 02:09:48PM +0100, Magnus Karlsson wrote:
> > This commit adds AF_XDP support to libbpf. The main reason for
> > this is to facilitate writing applications that use AF_XDP by offering
> > higher-level APIs that hide many of the details of the AF_XDP
> > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > offering easy-to-use higher level interfaces of XDP
> > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > applications using it simpler and smaller, and finally also make it
> > possible for applications to benefit from optimizations in the AF_XDP
> > user space access code. Previously, people just copied and pasted the
> > code from the sample application into their application, which is not
> > desirable.
> >
> > The interface is composed of two parts:
> >
> > * Low-level access interface to the four rings and the packet
> > * High-level control plane interface for creating and setting
> >   up umems and af_xdp sockets.
> >
> > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ...
> > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > index 5f68d7b..da99203 100644
> > --- a/tools/lib/bpf/libbpf.h
> > +++ b/tools/lib/bpf/libbpf.h
>
> may be instead of lib/bpf/libbpf.h the xsk stuff should go to lib/bpf/xsk.h ?

Yes. Good idea.

> > @@ -15,6 +15,7 @@
> >  #include <stdbool.h>
> >  #include <sys/types.h>  // for size_t
> >  #include <linux/bpf.h>
> > +#include <linux/if_xdp.h>
> >
> >  #ifdef __cplusplus
> >  extern "C" {
> > @@ -355,6 +356,98 @@ LIBBPF_API const struct bpf_line_info *
> >  bpf_prog_linfo__lfind(const struct bpf_prog_linfo *prog_linfo,
> >                     __u32 insn_off, __u32 nr_skip);
> >
> > +/* Do not access these members directly. Use the functions below. */
> > +struct xsk_prod_ring {
> > +     __u32 cached_prod;
> > +     __u32 cached_cons;
> > +     __u32 mask;
> > +     __u32 size;
> > +     __u32 *producer;
> > +     __u32 *consumer;
> > +     void *ring;
> > +};
> > +
> > +/* Do not access these members directly. Use the functions below. */
> > +struct xsk_cons_ring {
> > +     __u32 cached_prod;
> > +     __u32 cached_cons;
> > +     __u32 mask;
> > +     __u32 size;
> > +     __u32 *producer;
> > +     __u32 *consumer;
> > +     void *ring;
> > +};
>
> xsk_prod_ring and xsk_cons_ring have exactly the same members,
> but they're two different structs? why?
> May be have one 'struct xsk_ring' ?

They operate on the same ring but represent either the producer or
the consumer side of that same ring. The reason for this is that I
want to make sure that the user gets a compile-time error if he or she
tries to use, for example, xsk__reserve_prod when user space is a
consumer of that ring (and the same kind of argument applies to
xsk__submit_prod, xsk__peek_cons, and xsk__release_cons). If we move
to a single xsk_ring, we lose this compile-time check. Or is there a
better way of doing this in C without double definitions or
hard-to-read #defines? The only benefit I see with going to xsk_ring
is that we get rid of two small inline functions and one struct
definition in the header file; the code size will be the same.
Personally, I prefer compile-time error checking, but let me know how
you would like me to proceed.

> > +
> > +static inline __u64 *xsk__get_fill_desc(struct xsk_prod_ring *fill,
> > +                                    __u64 idx)
>
> see tools/lib/bpf/README.rst
> the main idea is for "objects" use __ to separate class vs method.
> In this case 'struct xsk_ring' would be an object and
> the name of the method would be:
> static inline __u64 *xsk_ring__get_fill_desc(struct xsk_ring *fill, __u64 idx)

Got it. Will fix.

> > +{
> > +     __u64 *descs = (__u64 *)fill->ring;
> > +
> > +     return &descs[idx & fill->mask];
> > +}
> > +
> > +static inline __u64 *xsk__get_completion_desc(struct xsk_cons_ring *comp,
> > +                                          __u64 idx)
> > +{
> > +     __u64 *descs = (__u64 *)comp->ring;
> > +
> > +     return &descs[idx & comp->mask];
> > +}
> > +
> > +static inline struct xdp_desc *xsk__get_tx_desc(struct xsk_prod_ring *tx,
> > +                                            __u64 idx)
> > +{
> > +     struct xdp_desc *descs = (struct xdp_desc *)tx->ring;
> > +
> > +     return &descs[idx & tx->mask];
> > +}
> > +
> > +static inline struct xdp_desc *xsk__get_rx_desc(struct xsk_cons_ring *rx,
> > +                                            __u64 idx)
> > +{
> > +     struct xdp_desc *descs = (struct xdp_desc *)rx->ring;
> > +
> > +     return &descs[idx & rx->mask];
> > +}
> > +
> > +LIBBPF_API size_t xsk__peek_cons(struct xsk_cons_ring *ring, size_t nb,
> > +                             __u32 *idx);
> > +LIBBPF_API void xsk__release_cons(struct xsk_cons_ring *ring);
> > +LIBBPF_API size_t xsk__reserve_prod(struct xsk_prod_ring *ring, size_t nb,
> > +                                __u32 *idx);
> > +LIBBPF_API void xsk__submit_prod(struct xsk_prod_ring *ring);
>
> if we combine the struct names then above could be:
>
> LIBBPF_API size_t xsk_ring__reserve(struct xsk_ring *ring, size_t nb, __u32 *idx);
> LIBBPF_API void xsk_ring__submit(struct xsk_ring *ring);

The implementations of xsk__peek_cons and xsk__reserve_prod are
different because one performs producer operations and the other
consumer ones, so they cannot be combined (at least not without added
if statements and state that would hurt performance). The same goes
for xsk__release_cons and xsk__submit_prod. But the static inline
functions could be reduced from four to two with an xsk_ring struct.

>
> > +
> > +LIBBPF_API void *xsk__get_data(void *umem_area, __u64 addr);
> > +
> > +#define XSK__DEFAULT_NUM_DESCS      2048
> > +#define XSK__DEFAULT_FRAME_SHIFT    11 /* 2048 bytes */
> > +#define XSK__DEFAULT_FRAME_SIZE     (1 << XSK__DEFAULT_FRAME_SHIFT)
> > +#define XSK__DEFAULT_FRAME_HEADROOM 0
> > +
> > +struct xsk_umem_config {
> > +     __u32 fq_size;
> > +     __u32 cq_size;
> > +     __u32 frame_size;
> > +     __u32 frame_headroom;
> > +};
> > +
> > +struct xsk_xdp_socket_config {
> > +     __u32 rx_size;
> > +     __u32 tx_size;
> > +};
> > +
> > +/* Set config to XSK_DEFAULT_CONFIG to get the default configuration. */
> > +LIBBPF_API int xsk__create_umem(void *umem_area, __u64 size,
> > +                            struct xsk_prod_ring *fq,
> > +                            struct xsk_cons_ring *cq,
> > +                            struct xsk_umem_config *config);
>
> this one looks too low level.
> espcially considering it's usage:
> umem->fd = xsk__create_umem(buffer, size, &umem->fq, &umem->cq, NULL);
>
> may be create an object "struct xsk_umem" ?
> then api will be:
> err = xsk_umem__create(buffer, size, NULL) ?

Makes sense considering the other functions in the library. Will do.

> > +LIBBPF_API int xsk__create_xdp_socket(int umem_fd, struct xsk_cons_ring *rx,
> > +                                  struct xsk_prod_ring *tx,
> > +                                  struct xsk_xdp_socket_config *config);
>
> similar concern here. feels that implementation details are leaking into api.
> The usage of it is:
> xsk->fd = xsk__create_xdp_socket(umem->fd, &xsk->rx, &xsk->tx, NULL);
>
> may be create an object "struct xsk_socket" ?

Yes to this also. I will add a function xsk_socket__fd(struct
xsk_socket *xsk) that returns the fd when the user needs it (for bind,
poll, and others), in the same spirit as other functionality in the
library. Protest if you do not agree.

Thanks: Magnus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets
  2018-12-17  9:12     ` Magnus Karlsson
@ 2018-12-18 18:53       ` Alexei Starovoitov
  0 siblings, 0 replies; 9+ messages in thread
From: Alexei Starovoitov @ 2018-12-18 18:53 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Magnus Karlsson, Björn Töpel, ast, Daniel Borkmann,
	Network Development, Jakub Kicinski, Björn Töpel,
	Zhang, Qi Z, Jesper Dangaard Brouer

On Mon, Dec 17, 2018 at 10:12:33AM +0100, Magnus Karlsson wrote:
> On Fri, Dec 14, 2018 at 9:25 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Dec 12, 2018 at 02:09:48PM +0100, Magnus Karlsson wrote:
> > > This commit adds AF_XDP support to libbpf. The main reason for
> > > this is to facilitate writing applications that use AF_XDP by offering
> > > higher-level APIs that hide many of the details of the AF_XDP
> > > uapi. This is in the same vein as libbpf facilitates XDP adoption by
> > > offering easy-to-use higher level interfaces of XDP
> > > functionality. Hopefully this will facilitate adoption of AF_XDP, make
> > > applications using it simpler and smaller, and finally also make it
> > > possible for applications to benefit from optimizations in the AF_XDP
> > > user space access code. Previously, people just copied and pasted the
> > > code from the sample application into their application, which is not
> > > desirable.
> > >
> > > The interface is composed of two parts:
> > >
> > > * Low-level access interface to the four rings and the packet
> > > * High-level control plane interface for creating and setting
> > >   up umems and af_xdp sockets.
> > >
> > > Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> > ...
> > > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > > index 5f68d7b..da99203 100644
> > > --- a/tools/lib/bpf/libbpf.h
> > > +++ b/tools/lib/bpf/libbpf.h
> >
> > may be instead of lib/bpf/libbpf.h the xsk stuff should go to lib/bpf/xsk.h ?
> 
> Yes. Good idea.
> 
> > > @@ -15,6 +15,7 @@
> > >  #include <stdbool.h>
> > >  #include <sys/types.h>  // for size_t
> > >  #include <linux/bpf.h>
> > > +#include <linux/if_xdp.h>
> > >
> > >  #ifdef __cplusplus
> > >  extern "C" {
> > > @@ -355,6 +356,98 @@ LIBBPF_API const struct bpf_line_info *
> > >  bpf_prog_linfo__lfind(const struct bpf_prog_linfo *prog_linfo,
> > >                     __u32 insn_off, __u32 nr_skip);
> > >
> > > +/* Do not access these members directly. Use the functions below. */
> > > +struct xsk_prod_ring {
> > > +     __u32 cached_prod;
> > > +     __u32 cached_cons;
> > > +     __u32 mask;
> > > +     __u32 size;
> > > +     __u32 *producer;
> > > +     __u32 *consumer;
> > > +     void *ring;
> > > +};
> > > +
> > > +/* Do not access these members directly. Use the functions below. */
> > > +struct xsk_cons_ring {
> > > +     __u32 cached_prod;
> > > +     __u32 cached_cons;
> > > +     __u32 mask;
> > > +     __u32 size;
> > > +     __u32 *producer;
> > > +     __u32 *consumer;
> > > +     void *ring;
> > > +};
> >
> > xsk_prod_ring and xsk_cons_ring have exactly the same members,
> > but they're two different structs? why?
> > May be have one 'struct xsk_ring' ?
> 
> They operate on the same ring but represent either the producer or
> the consumer side of that same ring. The reason for this is that I
> want to make sure that the user gets a compile-time error if he or she
> tries to use, for example, xsk__reserve_prod when user space is a
> consumer of that ring (and the same kind of argument applies to
> xsk__submit_prod, xsk__peek_cons, and xsk__release_cons). If we move
> to a single xsk_ring, we lose this compile-time check. Or is there a
> better way of doing this in C without double definitions or
> hard-to-read #defines? The only benefit I see with going to xsk_ring
> is that we get rid of two small inline functions and one struct
> definition in the header file; the code size will be the same.
> Personally, I prefer compile-time error checking, but let me know how
> you would like me to proceed.

how about the following?
struct xsk_ring {...};
struct xsk_prod {
  struct xsk_ring r;
};
struct xsk_cons {
  struct xsk_ring r;
};
and the compiler will warn when an xsk_prod is used instead of an xsk_cons.
The methods can be called xsk_prod__* and xsk_cons__*.

Also, do the 'producer' and 'consumer' names really fit?
Typically, producer and consumer are the two sides of a single ring.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-12-18 18:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-12 13:09 [PATCH bpf-next v2 0/2] libbpf: adding AF_XDP support Magnus Karlsson
2018-12-12 13:09 ` [PATCH bpf-next v2 1/2] libbpf: add support for using AF_XDP sockets Magnus Karlsson
2018-12-13  6:23   ` Alexei Starovoitov
2018-12-13  9:06     ` Magnus Karlsson
2018-12-13 15:48       ` Daniel Borkmann
2018-12-14 20:23   ` Alexei Starovoitov
2018-12-17  9:12     ` Magnus Karlsson
2018-12-18 18:53       ` Alexei Starovoitov
2018-12-12 13:09 ` [PATCH bpf-next v2 2/2] samples/bpf: convert xdpsock to use libbpf for AF_XDP access Magnus Karlsson
