* [RFC PATCH bpf-next 0/4] libbpf: xsk improvements
From: Maciej Fijalkowski @ 2019-06-03 13:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev
  Cc: ast, daniel, jakub.kicinski, jonathan.lemon, songliubraving

Hello,
This set contains fixes for libbpf's AF_XDP support. For patches 1-3, please
have a look at the commit messages; the 4th patch needs a bit more discussion.

Patch 4 tries to address the issue of dangling xsk sockets in the case where
many instances were running on a particular interface and one of them got
removed. The problem is that we don't keep any information about how many
entries are currently in use in the eBPF maps, so we either need an external
counter, or we could traverse the map via the
bpf_map_get_next_key/bpf_map_lookup_elem syscall combination. IMHO the latter
is a no-go, since the maps used in libbpf's xsk code get preallocated entries.
The number of entries is equal to the number of HW queues present on the
interface, which means many unnecessary lookups, as CPUs are getting more and
more cores and NICs normally allocate one HW queue per CPU core.
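
For the record, the rejected traversal variant would look roughly like the
sketch below (illustration only; it walks qidconf_map, whose entries are all
preallocated, so it pays a get_next_key/lookup syscall pair per HW queue slot
whether a socket ever sat there or not):

	#include <bpf/bpf.h>	/* bpf_map_get_next_key, bpf_map_lookup_elem */

	static int count_active_xsks(int qidconf_map_fd)
	{
		int key = -1, next_key, value, count = 0;

		while (!bpf_map_get_next_key(qidconf_map_fd,
					     key < 0 ? NULL : &key, &next_key)) {
			/* Lookups on an ARRAY map always succeed; a non-zero
			 * value means an xsk is attached to that queue. */
			if (!bpf_map_lookup_elem(qidconf_map_fd, &next_key,
						 &value) && value)
				count++;
			key = next_key;
		}
		return count;
	}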

For the xsk counter we could, for example, have an additional entry in
qidconf_map, but that map is removed in Jonathan's patches, so that's not an
option. The approach I gave a shot is a pinned map with a single entry.
Further reasoning is included in the commit message of the fourth patch.

Thanks!

Maciej Fijalkowski (4):
  libbpf: fill the AF_XDP fill queue before bind() call
  libbpf: check for channels.max_{t,r}x in xsk_get_max_queues
  libbpf: move xdp program removal to libbpf
  libbpf: don't remove eBPF resources when other xsks are present

 samples/bpf/xdpsock_user.c |  48 ++++--------------
 tools/lib/bpf/xsk.c        | 124 +++++++++++++++++++++++++++++++++++++++------
 tools/lib/bpf/xsk.h        |   1 +
 3 files changed, 120 insertions(+), 53 deletions(-)

-- 
2.16.1



* [RFC PATCH bpf-next 1/4] libbpf: fill the AF_XDP fill queue before bind() call
From: Maciej Fijalkowski @ 2019-06-03 13:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev
  Cc: ast, daniel, jakub.kicinski, jonathan.lemon, songliubraving

Let's get into the driver via ndo_bpf with the command set to XDP_SETUP_UMEM
and a fill queue that already contains some available entries that can be
used by the Rx driver rings. Things worked this way in the old version of
xdpsock (which lacked libbpf support) and there's no particular reason for
doing this preparation after bind().

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
---
 samples/bpf/xdpsock_user.c | 15 ---------------
 tools/lib/bpf/xsk.c        | 19 ++++++++++++++++++-
 2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index d08ee1ab7bb4..e9dceb09b6d1 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -296,8 +296,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
 	struct xsk_socket_config cfg;
 	struct xsk_socket_info *xsk;
 	int ret;
-	u32 idx;
-	int i;
 
 	xsk = calloc(1, sizeof(*xsk));
 	if (!xsk)
@@ -318,19 +316,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
 	if (ret)
 		exit_with_error(-ret);
 
-	ret = xsk_ring_prod__reserve(&xsk->umem->fq,
-				     XSK_RING_PROD__DEFAULT_NUM_DESCS,
-				     &idx);
-	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
-		exit_with_error(-ret);
-	for (i = 0;
-	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
-		     XSK_UMEM__DEFAULT_FRAME_SIZE;
-	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
-		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
-	xsk_ring_prod__submit(&xsk->umem->fq,
-			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
-
 	return xsk;
 }
 
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 38667b62f1fe..57dda1389870 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -529,7 +529,8 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	struct xdp_mmap_offsets off;
 	struct xsk_socket *xsk;
 	socklen_t optlen;
-	int err;
+	int err, i;
+	u32 idx;
 
 	if (!umem || !xsk_ptr || !rx || !tx)
 		return -EFAULT;
@@ -632,6 +633,22 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 	}
 	xsk->tx = tx;
 
+	err = xsk_ring_prod__reserve(umem->fill,
+				     XSK_RING_PROD__DEFAULT_NUM_DESCS,
+				     &idx);
+	if (err != XSK_RING_PROD__DEFAULT_NUM_DESCS) {
+		err = -errno;
+		goto out_mmap_tx;
+	}
+
+	for (i = 0;
+	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
+		     XSK_UMEM__DEFAULT_FRAME_SIZE;
+	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
+		*xsk_ring_prod__fill_addr(umem->fill, idx++) = i;
+	xsk_ring_prod__submit(umem->fill,
+			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
+
 	sxdp.sxdp_family = PF_XDP;
 	sxdp.sxdp_ifindex = xsk->ifindex;
 	sxdp.sxdp_queue_id = xsk->queue_id;
-- 
2.16.1



* [RFC PATCH bpf-next 2/4] libbpf: check for channels.max_{t,r}x in xsk_get_max_queues
From: Maciej Fijalkowski @ 2019-06-03 13:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev
  Cc: ast, daniel, jakub.kicinski, jonathan.lemon, songliubraving

When it comes down to ethtool's get channels API, various drivers report the
queue count in two ways - they set either the max_combined or the
max_tx/max_rx fields. When creating the eBPF maps for an xsk socket, this
API is used so that the maps have an entry per queue.
In the case where a driver (mlx4, ice) reports queues in max_tx/max_rx, we
end up with eBPF maps with a single entry, so it's not possible to attach an
AF_XDP socket onto a queue other than 0 - xsk_set_bpf_maps() would try to
call bpf_map_update_elem() with the key set to xsk->queue_id.

To fix this, let's look for channels.max_{t,r}x as well in
xsk_get_max_queues.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 tools/lib/bpf/xsk.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 57dda1389870..514ab3fb06f4 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -339,21 +339,23 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
 	ifr.ifr_data = (void *)&channels;
 	strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ);
 	err = ioctl(fd, SIOCETHTOOL, &ifr);
-	if (err && errno != EOPNOTSUPP) {
-		ret = -errno;
-		goto out;
-	}
+	close(fd);
+
+	if (err && errno != EOPNOTSUPP)
+		return -errno;
 
-	if (channels.max_combined == 0 || errno == EOPNOTSUPP)
+	if (channels.max_combined)
+		ret = channels.max_combined;
+	else if (channels.max_rx && channels.max_tx)
+		ret = min(channels.max_rx, channels.max_tx);
+	else if (channels.max_combined == 0 || errno == EOPNOTSUPP)
 		/* If the device says it has no channels, then all traffic
 		 * is sent to a single stream, so max queues = 1.
 		 */
 		ret = 1;
 	else
-		ret = channels.max_combined;
+		ret = -1;
 
-out:
-	close(fd);
 	return ret;
 }
 
-- 
2.16.1



* [RFC PATCH bpf-next 3/4] libbpf: move xdp program removal to libbpf
From: Maciej Fijalkowski @ 2019-06-03 13:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev
  Cc: ast, daniel, jakub.kicinski, jonathan.lemon, songliubraving

Since the xsk support in libbpf loads the xdp program onto the interface,
make it also responsible for its removal. Store the prog id in
xsk_socket_config so that when removing the program we are still able to
compare the current program id with the id from attachment time and decide
whether the removal is safe.

While at it, remove the socket/umem in xdpsock's error path.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 samples/bpf/xdpsock_user.c | 33 ++++++++++-----------------------
 tools/lib/bpf/xsk.c        | 32 ++++++++++++++++++++++++++++++++
 tools/lib/bpf/xsk.h        |  1 +
 3 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index e9dceb09b6d1..123862b16dd4 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -68,7 +68,6 @@ static int opt_queue;
 static int opt_poll;
 static int opt_interval = 1;
 static u32 opt_xdp_bind_flags;
-static __u32 prog_id;
 
 struct xsk_umem_info {
 	struct xsk_ring_prod fq;
@@ -170,22 +169,6 @@ static void *poller(void *arg)
 	return NULL;
 }
 
-static void remove_xdp_program(void)
-{
-	__u32 curr_prog_id = 0;
-
-	if (bpf_get_link_xdp_id(opt_ifindex, &curr_prog_id, opt_xdp_flags)) {
-		printf("bpf_get_link_xdp_id failed\n");
-		exit(EXIT_FAILURE);
-	}
-	if (prog_id == curr_prog_id)
-		bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
-	else if (!curr_prog_id)
-		printf("couldn't find a prog id on a given interface\n");
-	else
-		printf("program on interface changed, not removing\n");
-}
-
 static void int_exit(int sig)
 {
 	struct xsk_umem *umem = xsks[0]->umem->umem;
@@ -195,7 +178,6 @@ static void int_exit(int sig)
 	dump_stats();
 	xsk_socket__delete(xsks[0]->xsk);
 	(void)xsk_umem__delete(umem);
-	remove_xdp_program();
 
 	exit(EXIT_SUCCESS);
 }
@@ -206,7 +188,16 @@ static void __exit_with_error(int error, const char *file, const char *func,
 	fprintf(stderr, "%s:%s:%i: errno: %d/\"%s\"\n", file, func,
 		line, error, strerror(error));
 	dump_stats();
-	remove_xdp_program();
+
+	if (xsks[0]->xsk)
+		xsk_socket__delete(xsks[0]->xsk);
+
+	if (xsks[0]->umem) {
+		struct xsk_umem *umem = xsks[0]->umem->umem;
+
+		(void)xsk_umem__delete(umem);
+	}
+
 	exit(EXIT_FAILURE);
 }
 
@@ -312,10 +303,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
 	if (ret)
 		exit_with_error(-ret);
 
-	ret = bpf_get_link_xdp_id(opt_ifindex, &prog_id, opt_xdp_flags);
-	if (ret)
-		exit_with_error(-ret);
-
 	return xsk;
 }
 
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 514ab3fb06f4..e28bedb0b078 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -259,6 +259,8 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
 static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 {
 	static const int log_buf_size = 16 * 1024;
+	struct bpf_prog_info info = {};
+	__u32 info_len = sizeof(info);
 	char log_buf[log_buf_size];
 	int err, prog_fd;
 
@@ -321,6 +323,14 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
 		return err;
 	}
 
+	err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
+	if (err) {
+		pr_warning("can't get prog info - %s\n", strerror(errno));
+		close(prog_fd);
+		return err;
+	}
+	xsk->config.prog_id = info.id;
+
 	xsk->prog_fd = prog_fd;
 	return 0;
 }
@@ -483,6 +493,25 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk)
 	return err;
 }
 
+static void xsk_remove_xdp_prog(struct xsk_socket *xsk)
+{
+	__u32 prog_id = xsk->config.prog_id;
+	__u32 curr_prog_id = 0;
+	int err;
+
+	err = bpf_get_link_xdp_id(xsk->ifindex, &curr_prog_id,
+				  xsk->config.xdp_flags);
+	if (err)
+		return;
+
+	if (prog_id == curr_prog_id)
+		bpf_set_link_xdp_fd(xsk->ifindex, -1, xsk->config.xdp_flags);
+	else if (!curr_prog_id)
+		pr_warning("couldn't find a prog id on a given interface\n");
+	else
+		pr_warning("program on interface changed, not removing\n");
+}
+
 static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
 {
 	__u32 prog_id = 0;
@@ -506,6 +535,7 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
 		err = xsk_lookup_bpf_maps(xsk);
 		if (err)
 			goto out_load;
+		xsk->config.prog_id = prog_id;
 	}
 
 	err = xsk_set_bpf_maps(xsk);
@@ -744,6 +774,8 @@ void xsk_socket__delete(struct xsk_socket *xsk)
 
 	}
 
+	xsk_remove_xdp_prog(xsk);
+
 	xsk->umem->refcount--;
 	/* Do not close an fd that also has an associated umem connected
 	 * to it.
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 82ea71a0f3ec..e1b23e9432c9 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -186,6 +186,7 @@ struct xsk_socket_config {
 	__u32 tx_size;
 	__u32 libbpf_flags;
 	__u32 xdp_flags;
+	__u32 prog_id;
 	__u16 bind_flags;
 };
 
-- 
2.16.1



* [RFC PATCH bpf-next 4/4] libbpf: don't remove eBPF resources when other xsks are present
From: Maciej Fijalkowski @ 2019-06-03 13:19 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, netdev
  Cc: ast, daniel, jakub.kicinski, jonathan.lemon, songliubraving

In the case where multiple xsk sockets are attached to a single interface
and one of them gets detached, the eBPF maps and program are removed.
This should not happen, as the remaining xsk sockets are still using these
resources.

In order to fix that, let's have an additional eBPF map with a single
entry that will be used as an xsk socket count. During xsk_socket__delete,
remove the resources only when this count is equal to 0. This map is
not accessed from the eBPF program, so the verifier does not associate
it with the prog, which in turn makes bpf_obj_get_info_by_fd not
report this map in the nr_map_ids field of struct bpf_prog_info. The
described behaviour means this map has to be pinned, so that when a
socket is being created and libbpf detects the presence of the bpf
resources, it is able to access that map.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 tools/lib/bpf/xsk.c | 59 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index e28bedb0b078..88d2c931ad14 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -44,6 +44,8 @@
  #define PF_XDP AF_XDP
 #endif
 
+#define XSKS_CNT_MAP_PATH "/sys/fs/bpf/xsks_cnt_map"
+
 struct xsk_umem {
 	struct xsk_ring_prod *fill;
 	struct xsk_ring_cons *comp;
@@ -65,6 +67,7 @@ struct xsk_socket {
 	int prog_fd;
 	int qidconf_map_fd;
 	int xsks_map_fd;
+	int xsks_cnt_map_fd;
 	__u32 queue_id;
 	char ifname[IFNAMSIZ];
 };
@@ -372,7 +375,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
 static int xsk_create_bpf_maps(struct xsk_socket *xsk)
 {
 	int max_queues;
-	int fd;
+	int fd, ret;
 
 	max_queues = xsk_get_max_queues(xsk);
 	if (max_queues < 0)
@@ -392,6 +395,24 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk)
 	}
 	xsk->xsks_map_fd = fd;
 
+	fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "xsks_cnt_map",
+				 sizeof(int), sizeof(int), 1, 0);
+	if (fd < 0) {
+		close(xsk->qidconf_map_fd);
+		close(xsk->xsks_map_fd);
+		return fd;
+	}
+
+	ret = bpf_obj_pin(fd, XSKS_CNT_MAP_PATH);
+	if (ret < 0) {
+		pr_warning("pinning map failed; is bpffs mounted?\n");
+		close(xsk->qidconf_map_fd);
+		close(xsk->xsks_map_fd);
+		close(fd);
+		return ret;
+	}
+	xsk->xsks_cnt_map_fd = fd;
+
 	return 0;
 }
 
@@ -456,8 +477,10 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 		close(fd);
 	}
 
+	xsk->xsks_cnt_map_fd = bpf_obj_get(XSKS_CNT_MAP_PATH);
 	err = 0;
-	if (xsk->qidconf_map_fd < 0 || xsk->xsks_map_fd < 0) {
+	if (xsk->qidconf_map_fd < 0 || xsk->xsks_map_fd < 0 ||
+	    xsk->xsks_cnt_map_fd < 0) {
 		err = -ENOENT;
 		xsk_delete_bpf_maps(xsk);
 	}
@@ -467,17 +490,25 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
 	return err;
 }
 
-static void xsk_clear_bpf_maps(struct xsk_socket *xsk)
+static void xsk_clear_bpf_maps(struct xsk_socket *xsk, long *xsks_cnt_ptr)
 {
+	long xsks_cnt, key = 0;
 	int qid = false;
 
 	bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id, &qid, 0);
 	bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id);
+	bpf_map_lookup_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt);
+	if (xsks_cnt)
+		xsks_cnt--;
+	bpf_map_update_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt, 0);
+	if (xsks_cnt_ptr)
+		*xsks_cnt_ptr = xsks_cnt;
 }
 
 static int xsk_set_bpf_maps(struct xsk_socket *xsk)
 {
 	int qid = true, fd = xsk->fd, err;
+	long xsks_cnt, key = 0;
 
 	err = bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id, &qid, 0);
 	if (err)
@@ -487,9 +518,18 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk)
 	if (err)
 		goto out;
 
+	err = bpf_map_lookup_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt);
+	if (err)
+		goto out;
+
+	xsks_cnt++;
+	err = bpf_map_update_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt, 0);
+	if (err)
+		goto out;
+
 	return 0;
 out:
-	xsk_clear_bpf_maps(xsk);
+	xsk_clear_bpf_maps(xsk, NULL);
 	return err;
 }
 
@@ -752,13 +792,18 @@ void xsk_socket__delete(struct xsk_socket *xsk)
 	size_t desc_sz = sizeof(struct xdp_desc);
 	struct xdp_mmap_offsets off;
 	socklen_t optlen;
+	long xsks_cnt;
 	int err;
 
 	if (!xsk)
 		return;
 
-	xsk_clear_bpf_maps(xsk);
-	xsk_delete_bpf_maps(xsk);
+	xsk_clear_bpf_maps(xsk, &xsks_cnt);
+	unlink(XSKS_CNT_MAP_PATH);
+	if (!xsks_cnt) {
+		xsk_delete_bpf_maps(xsk);
+		xsk_remove_xdp_prog(xsk);
+	}
 
 	optlen = sizeof(off);
 	err = getsockopt(xsk->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
@@ -774,8 +819,6 @@ void xsk_socket__delete(struct xsk_socket *xsk)
 
 	}
 
-	xsk_remove_xdp_prog(xsk);
-
 	xsk->umem->refcount--;
 	/* Do not close an fd that also has an associated umem connected
 	 * to it.
-- 
2.16.1



* Re: [RFC PATCH bpf-next 4/4] libbpf: don't remove eBPF resources when other xsks are present
From: Jonathan Lemon @ 2019-06-03 18:26 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: magnus.karlsson, bjorn.topel, netdev, ast, daniel,
	jakub.kicinski, songliubraving

On 3 Jun 2019, at 6:19, Maciej Fijalkowski wrote:

> In the case where multiple xsk sockets are attached to a single interface
> and one of them gets detached, the eBPF maps and program are removed.
> This should not happen, as the remaining xsk sockets are still using these
> resources.

I'm not seeing that behavior - each xsk holds its own reference to
the xsks_map, so when the map descriptor is closed, it doesn't necessarily
delete the map.

There's no refcount on the bpf program though, so the socket should not
be trying to remove the program - that should be done by the application.
-- 
Jonathan


* Re: [RFC PATCH bpf-next 1/4] libbpf: fill the AF_XDP fill queue before bind() call
From: Björn Töpel @ 2019-06-04  8:06 UTC (permalink / raw)
  To: Maciej Fijalkowski, magnus.karlsson, netdev
  Cc: ast, daniel, jakub.kicinski, jonathan.lemon, songliubraving, bpf

On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> Let's get into the driver via ndo_bpf with the command set to XDP_SETUP_UMEM
> and a fill queue that already contains some available entries that can be
> used by the Rx driver rings. Things worked this way in the old version of
> xdpsock (which lacked libbpf support) and there's no particular reason for
> doing this preparation after bind().
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
> ---
>   samples/bpf/xdpsock_user.c | 15 ---------------
>   tools/lib/bpf/xsk.c        | 19 ++++++++++++++++++-
>   2 files changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index d08ee1ab7bb4..e9dceb09b6d1 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -296,8 +296,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
>   	struct xsk_socket_config cfg;
>   	struct xsk_socket_info *xsk;
>   	int ret;
> -	u32 idx;
> -	int i;
>   
>   	xsk = calloc(1, sizeof(*xsk));
>   	if (!xsk)
> @@ -318,19 +316,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
>   	if (ret)
>   		exit_with_error(-ret);
>   
> -	ret = xsk_ring_prod__reserve(&xsk->umem->fq,
> -				     XSK_RING_PROD__DEFAULT_NUM_DESCS,
> -				     &idx);
> -	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
> -		exit_with_error(-ret);
> -	for (i = 0;
> -	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
> -		     XSK_UMEM__DEFAULT_FRAME_SIZE;
> -	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> -		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
> -	xsk_ring_prod__submit(&xsk->umem->fq,
> -			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
> -
>   	return xsk;
>   }
>   
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 38667b62f1fe..57dda1389870 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -529,7 +529,8 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
>   	struct xdp_mmap_offsets off;
>   	struct xsk_socket *xsk;
>   	socklen_t optlen;
> -	int err;
> +	int err, i;
> +	u32 idx;
>   
>   	if (!umem || !xsk_ptr || !rx || !tx)
>   		return -EFAULT;
> @@ -632,6 +633,22 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
>   	}
>   	xsk->tx = tx;
>   
> +	err = xsk_ring_prod__reserve(umem->fill,
> +				     XSK_RING_PROD__DEFAULT_NUM_DESCS,
> +				     &idx);
> +	if (err != XSK_RING_PROD__DEFAULT_NUM_DESCS) {
> +		err = -errno;
> +		goto out_mmap_tx;
> +	}
> +
> +	for (i = 0;
> +	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
> +		     XSK_UMEM__DEFAULT_FRAME_SIZE;
> +	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> +		*xsk_ring_prod__fill_addr(umem->fill, idx++) = i;
> +	xsk_ring_prod__submit(umem->fill,
> +			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
> +

Here, entries are added to the umem fill ring regardless of whether Rx is
being used or not. For a Tx-only setup, this is not what we want, right?

Thinking out loud here; now libbpf is making the decision about which umem
entries are added to the fill ring. The sample application has this
(naive) scheme. I'm not sure that all applications would like that
policy. What do you think?

>   	sxdp.sxdp_family = PF_XDP;
>   	sxdp.sxdp_ifindex = xsk->ifindex;
>   	sxdp.sxdp_queue_id = xsk->queue_id;
> 


* Re: [RFC PATCH bpf-next 2/4] libbpf: check for channels.max_{t,r}x in xsk_get_max_queues
From: Björn Töpel @ 2019-06-04  8:06 UTC (permalink / raw)
  To: Maciej Fijalkowski, magnus.karlsson, netdev
  Cc: ast, daniel, jakub.kicinski, jonathan.lemon, songliubraving, bpf

On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> When it comes down to ethtool's get channels API, various drivers report the
> queue count in two ways - they set either the max_combined or the
> max_tx/max_rx fields. When creating the eBPF maps for an xsk socket, this
> API is used so that the maps have an entry per queue.
> In the case where a driver (mlx4, ice) reports queues in max_tx/max_rx, we
> end up with eBPF maps with a single entry, so it's not possible to attach an
> AF_XDP socket onto a queue other than 0 - xsk_set_bpf_maps() would try to
> call bpf_map_update_elem() with the key set to xsk->queue_id.
> 
> To fix this, let's look for channels.max_{t,r}x as well in
> xsk_get_max_queues.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>   tools/lib/bpf/xsk.c | 18 ++++++++++--------
>   1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 57dda1389870..514ab3fb06f4 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -339,21 +339,23 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
>   	ifr.ifr_data = (void *)&channels;
>   	strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ);
>   	err = ioctl(fd, SIOCETHTOOL, &ifr);
> -	if (err && errno != EOPNOTSUPP) {
> -		ret = -errno;
> -		goto out;
> -	}
> +	close(fd);
> +
> +	if (err && errno != EOPNOTSUPP)
> +		return -errno;
>   
> -	if (channels.max_combined == 0 || errno == EOPNOTSUPP)
> +	if (channels.max_combined)
> +		ret = channels.max_combined;
> +	else if (channels.max_rx && channels.max_tx)
> +		ret = min(channels.max_rx, channels.max_tx);

Hmm, do we really need to look at max_tx? For each Rx, there's (usually)
an XDP ring.

OTOH, when AF_XDP ZC is not implemented, it uses the skb path...

> +	else if (channels.max_combined == 0 || errno == EOPNOTSUPP)
>   		/* If the device says it has no channels, then all traffic
>   		 * is sent to a single stream, so max queues = 1.
>   		 */
>   		ret = 1;
>   	else
> -		ret = channels.max_combined;
> +		ret = -1;
>   
> -out:
> -	close(fd);
>   	return ret;
>   }
>   
> 


* Re: [RFC PATCH bpf-next 3/4] libbpf: move xdp program removal to libbpf
From: Björn Töpel @ 2019-06-04  8:07 UTC (permalink / raw)
  To: Maciej Fijalkowski, magnus.karlsson, netdev
  Cc: ast, daniel, jakub.kicinski, jonathan.lemon, songliubraving, bpf


On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> Since the xsk support in libbpf loads the xdp program onto the interface,
> make it also responsible for its removal. Store the prog id in
> xsk_socket_config so that when removing the program we are still able to
> compare the current program id with the id from attachment time and decide
> whether the removal is safe.
> 
> While at it, remove the socket/umem in xdpsock's error path.
>

We're loading a new, or reusing an existing, XDP program at socket
creation, but tearing it down at *socket delete* is explicitly left to
the application.

For a per-queue XDP program (tied to the socket), this kind of cleanup
would make sense.

The intention with the libbpf AF_XDP support was to leave the XDP
handling to whatever XDP orchestration process is available. It's not
part of libbpf. For convenience, *loading/lookup of the XDP program* was
added even though this was an asymmetry.

For the sample application, this makes sense, but for larger/real
applications?

OTOH I like the idea of a scoped cleanup: when all sockets are gone,
the XDP program + maps are removed.

> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>   samples/bpf/xdpsock_user.c | 33 ++++++++++-----------------------
>   tools/lib/bpf/xsk.c        | 32 ++++++++++++++++++++++++++++++++
>   tools/lib/bpf/xsk.h        |  1 +
>   3 files changed, 43 insertions(+), 23 deletions(-)
> 
> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> index e9dceb09b6d1..123862b16dd4 100644
> --- a/samples/bpf/xdpsock_user.c
> +++ b/samples/bpf/xdpsock_user.c
> @@ -68,7 +68,6 @@ static int opt_queue;
>   static int opt_poll;
>   static int opt_interval = 1;
>   static u32 opt_xdp_bind_flags;
> -static __u32 prog_id;
>   
>   struct xsk_umem_info {
>   	struct xsk_ring_prod fq;
> @@ -170,22 +169,6 @@ static void *poller(void *arg)
>   	return NULL;
>   }
>   
> -static void remove_xdp_program(void)
> -{
> -	__u32 curr_prog_id = 0;
> -
> -	if (bpf_get_link_xdp_id(opt_ifindex, &curr_prog_id, opt_xdp_flags)) {
> -		printf("bpf_get_link_xdp_id failed\n");
> -		exit(EXIT_FAILURE);
> -	}
> -	if (prog_id == curr_prog_id)
> -		bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
> -	else if (!curr_prog_id)
> -		printf("couldn't find a prog id on a given interface\n");
> -	else
> -		printf("program on interface changed, not removing\n");
> -}
> -
>   static void int_exit(int sig)
>   {
>   	struct xsk_umem *umem = xsks[0]->umem->umem;
> @@ -195,7 +178,6 @@ static void int_exit(int sig)
>   	dump_stats();
>   	xsk_socket__delete(xsks[0]->xsk);
>   	(void)xsk_umem__delete(umem);
> -	remove_xdp_program();
>   
>   	exit(EXIT_SUCCESS);
>   }
> @@ -206,7 +188,16 @@ static void __exit_with_error(int error, const char *file, const char *func,
>   	fprintf(stderr, "%s:%s:%i: errno: %d/\"%s\"\n", file, func,
>   		line, error, strerror(error));
>   	dump_stats();
> -	remove_xdp_program();
> +
> +	if (xsks[0]->xsk)
> +		xsk_socket__delete(xsks[0]->xsk);
> +
> +	if (xsks[0]->umem) {
> +		struct xsk_umem *umem = xsks[0]->umem->umem;
> +
> +		(void)xsk_umem__delete(umem);
> +	}
> +
>   	exit(EXIT_FAILURE);
>   }
>   
> @@ -312,10 +303,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
>   	if (ret)
>   		exit_with_error(-ret);
>   
> -	ret = bpf_get_link_xdp_id(opt_ifindex, &prog_id, opt_xdp_flags);
> -	if (ret)
> -		exit_with_error(-ret);
> -
>   	return xsk;
>   }
>   
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 514ab3fb06f4..e28bedb0b078 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -259,6 +259,8 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
>   static int xsk_load_xdp_prog(struct xsk_socket *xsk)
>   {
>   	static const int log_buf_size = 16 * 1024;
> +	struct bpf_prog_info info = {};
> +	__u32 info_len = sizeof(info);
>   	char log_buf[log_buf_size];
>   	int err, prog_fd;
>   
> @@ -321,6 +323,14 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
>   		return err;
>   	}
>   
> +	err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
> +	if (err) {
> +		pr_warning("can't get prog info - %s\n", strerror(errno));
> +		close(prog_fd);
> +		return err;
> +	}
> +	xsk->config.prog_id = info.id;
> +
>   	xsk->prog_fd = prog_fd;
>   	return 0;
>   }
> @@ -483,6 +493,25 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk)
>   	return err;
>   }
>   
> +static void xsk_remove_xdp_prog(struct xsk_socket *xsk)
> +{
> +	__u32 prog_id = xsk->config.prog_id;
> +	__u32 curr_prog_id = 0;
> +	int err;
> +
> +	err = bpf_get_link_xdp_id(xsk->ifindex, &curr_prog_id,
> +				  xsk->config.xdp_flags);
> +	if (err)
> +		return;
> +
> +	if (prog_id == curr_prog_id)
> +		bpf_set_link_xdp_fd(xsk->ifindex, -1, xsk->config.xdp_flags);
> +	else if (!curr_prog_id)
> +		pr_warning("couldn't find a prog id on a given interface\n");
> +	else
> +		pr_warning("program on interface changed, not removing\n");
> +}
> +
>   static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
>   {
>   	__u32 prog_id = 0;
> @@ -506,6 +535,7 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
>   		err = xsk_lookup_bpf_maps(xsk);
>   		if (err)
>   			goto out_load;
> +		xsk->config.prog_id = prog_id;
>   	}
>   
>   	err = xsk_set_bpf_maps(xsk);
> @@ -744,6 +774,8 @@ void xsk_socket__delete(struct xsk_socket *xsk)
>   
>   	}
>   
> +	xsk_remove_xdp_prog(xsk);
> +
>   	xsk->umem->refcount--;
>   	/* Do not close an fd that also has an associated umem connected
>   	 * to it.
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 82ea71a0f3ec..e1b23e9432c9 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -186,6 +186,7 @@ struct xsk_socket_config {
>   	__u32 tx_size;
>   	__u32 libbpf_flags;
>   	__u32 xdp_flags;
> +	__u32 prog_id;
>   	__u16 bind_flags;
>   };
>   
> 


* Re: [RFC PATCH bpf-next 4/4] libbpf: don't remove eBPF resources when other xsks are present
From: Björn Töpel @ 2019-06-04  8:08 UTC (permalink / raw)
  To: Maciej Fijalkowski, magnus.karlsson, netdev
  Cc: ast, daniel, jakub.kicinski, jonathan.lemon, songliubraving, bpf

On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> In the case where multiple xsk sockets are attached to a single interface
> and one of them gets detached, the eBPF maps and program are removed.
> This should not happen, as the remaining xsk sockets are still using these
> resources.
> 
> In order to fix that, let's have an additional eBPF map with a single
> entry that will be used as an xsk socket count. During xsk_socket__delete,
> remove the resources only when this count is equal to 0. This map is
> not accessed from the eBPF program, so the verifier does not associate
> it with the prog, which in turn makes bpf_obj_get_info_by_fd not
> report this map in the nr_map_ids field of struct bpf_prog_info. The
> described behaviour means this map has to be pinned, so that when a
> socket is being created and libbpf detects the presence of the bpf
> resources, it is able to access that map.
>

This commit is only needed after #3 is applied, right? So, this is a way 
of refcounting XDP socks?


> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>   tools/lib/bpf/xsk.c | 59 +++++++++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 51 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index e28bedb0b078..88d2c931ad14 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -44,6 +44,8 @@
>    #define PF_XDP AF_XDP
>   #endif
>   
> +#define XSKS_CNT_MAP_PATH "/sys/fs/bpf/xsks_cnt_map"
> +
>   struct xsk_umem {
>   	struct xsk_ring_prod *fill;
>   	struct xsk_ring_cons *comp;
> @@ -65,6 +67,7 @@ struct xsk_socket {
>   	int prog_fd;
>   	int qidconf_map_fd;
>   	int xsks_map_fd;
> +	int xsks_cnt_map_fd;
>   	__u32 queue_id;
>   	char ifname[IFNAMSIZ];
>   };
> @@ -372,7 +375,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
>   static int xsk_create_bpf_maps(struct xsk_socket *xsk)
>   {
>   	int max_queues;
> -	int fd;
> +	int fd, ret;
>   
>   	max_queues = xsk_get_max_queues(xsk);
>   	if (max_queues < 0)
> @@ -392,6 +395,24 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk)
>   	}
>   	xsk->xsks_map_fd = fd;
>   
> +	fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "xsks_cnt_map",
> +				 sizeof(int), sizeof(int), 1, 0);
> +	if (fd < 0) {
> +		close(xsk->qidconf_map_fd);
> +		close(xsk->xsks_map_fd);
> +		return fd;
> +	}
> +
> +	ret = bpf_obj_pin(fd, XSKS_CNT_MAP_PATH);
> +	if (ret < 0) {
> +		pr_warning("pinning map failed; is bpffs mounted?\n");
> +		close(xsk->qidconf_map_fd);
> +		close(xsk->xsks_map_fd);
> +		close(fd);
> +		return ret;
> +	}
> +	xsk->xsks_cnt_map_fd = fd;
> +
>   	return 0;
>   }
>   
> @@ -456,8 +477,10 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
>   		close(fd);
>   	}
>   
> +	xsk->xsks_cnt_map_fd = bpf_obj_get(XSKS_CNT_MAP_PATH);
>   	err = 0;
> -	if (xsk->qidconf_map_fd < 0 || xsk->xsks_map_fd < 0) {
> +	if (xsk->qidconf_map_fd < 0 || xsk->xsks_map_fd < 0 ||
> +	    xsk->xsks_cnt_map_fd < 0) {
>   		err = -ENOENT;
>   		xsk_delete_bpf_maps(xsk);
>   	}
> @@ -467,17 +490,25 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
>   	return err;
>   }
>   
> -static void xsk_clear_bpf_maps(struct xsk_socket *xsk)
> +static void xsk_clear_bpf_maps(struct xsk_socket *xsk, long *xsks_cnt_ptr)
>   {
> +	long xsks_cnt, key = 0;
>   	int qid = false;
>   
>   	bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id, &qid, 0);
>   	bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id);
> +	bpf_map_lookup_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt);
> +	if (xsks_cnt)
> +		xsks_cnt--;
> +	bpf_map_update_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt, 0);
> +	if (xsks_cnt_ptr)
> +		*xsks_cnt_ptr = xsks_cnt;

This refcount scheme will not work; there's no synchronization between
the updates (cross-process)!

>   }
>   
>   static int xsk_set_bpf_maps(struct xsk_socket *xsk)
>   {
>   	int qid = true, fd = xsk->fd, err;
> +	long xsks_cnt, key = 0;
>   
>   	err = bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id, &qid, 0);
>   	if (err)
> @@ -487,9 +518,18 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk)
>   	if (err)
>   		goto out;
>   
> +	err = bpf_map_lookup_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt);
> +	if (err)
> +		goto out;
> +
> +	xsks_cnt++;
> +	err = bpf_map_update_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt, 0);
> +	if (err)
> +		goto out;
> +

Dito.

>   	return 0;
>   out:
> -	xsk_clear_bpf_maps(xsk);
> +	xsk_clear_bpf_maps(xsk, NULL);
>   	return err;
>   }
>   
> @@ -752,13 +792,18 @@ void xsk_socket__delete(struct xsk_socket *xsk)
>   	size_t desc_sz = sizeof(struct xdp_desc);
>   	struct xdp_mmap_offsets off;
>   	socklen_t optlen;
> +	long xsks_cnt;
>   	int err;
>   
>   	if (!xsk)
>   		return;
>   
> -	xsk_clear_bpf_maps(xsk);
> -	xsk_delete_bpf_maps(xsk);
> +	xsk_clear_bpf_maps(xsk, &xsks_cnt);
> +	unlink(XSKS_CNT_MAP_PATH);
> +	if (!xsks_cnt) {
> +		xsk_delete_bpf_maps(xsk);
> +		xsk_remove_xdp_prog(xsk);
> +	}
>   
>   	optlen = sizeof(off);
>   	err = getsockopt(xsk->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
> @@ -774,8 +819,6 @@ void xsk_socket__delete(struct xsk_socket *xsk)
>   
>   	}
>   
> -	xsk_remove_xdp_prog(xsk);
> -
>   	xsk->umem->refcount--;
>   	/* Do not close an fd that also has an associated umem connected
>   	 * to it.
> 


* Re: [RFC PATCH bpf-next 1/4] libbpf: fill the AF_XDP fill queue before bind() call
From: Maciej Fijalkowski @ 2019-06-04 15:04 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, netdev, ast, daniel, jakub.kicinski,
	jonathan.lemon, songliubraving, bpf

On Tue, 4 Jun 2019 10:06:36 +0200
Björn Töpel <bjorn.topel@intel.com> wrote:

> On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> > Let's get into the driver via ndo_bpf with the command set to XDP_SETUP_UMEM
> > and a fill queue that already contains some available entries that can be
> > used by the Rx driver rings. Things worked this way in the old version of
> > xdpsock (which lacked libbpf support) and there's no particular reason for
> > doing this preparation after bind().
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
> > ---
> >   samples/bpf/xdpsock_user.c | 15 ---------------
> >   tools/lib/bpf/xsk.c        | 19 ++++++++++++++++++-
> >   2 files changed, 18 insertions(+), 16 deletions(-)
> > 
> > diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> > index d08ee1ab7bb4..e9dceb09b6d1 100644
> > --- a/samples/bpf/xdpsock_user.c
> > +++ b/samples/bpf/xdpsock_user.c
> > @@ -296,8 +296,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
> >   	struct xsk_socket_config cfg;
> >   	struct xsk_socket_info *xsk;
> >   	int ret;
> > -	u32 idx;
> > -	int i;
> >   
> >   	xsk = calloc(1, sizeof(*xsk));
> >   	if (!xsk)
> > @@ -318,19 +316,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
> >   	if (ret)
> >   		exit_with_error(-ret);
> >   
> > -	ret = xsk_ring_prod__reserve(&xsk->umem->fq,
> > -				     XSK_RING_PROD__DEFAULT_NUM_DESCS,
> > -				     &idx);
> > -	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
> > -		exit_with_error(-ret);
> > -	for (i = 0;
> > -	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
> > -		     XSK_UMEM__DEFAULT_FRAME_SIZE;
> > -	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> > -		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
> > -	xsk_ring_prod__submit(&xsk->umem->fq,
> > -			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
> > -
> >   	return xsk;
> >   }
> >   
> > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> > index 38667b62f1fe..57dda1389870 100644
> > --- a/tools/lib/bpf/xsk.c
> > +++ b/tools/lib/bpf/xsk.c
> > @@ -529,7 +529,8 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> >   	struct xdp_mmap_offsets off;
> >   	struct xsk_socket *xsk;
> >   	socklen_t optlen;
> > -	int err;
> > +	int err, i;
> > +	u32 idx;
> >   
> >   	if (!umem || !xsk_ptr || !rx || !tx)
> >   		return -EFAULT;
> > @@ -632,6 +633,22 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> >   	}
> >   	xsk->tx = tx;
> >   
> > +	err = xsk_ring_prod__reserve(umem->fill,
> > +				     XSK_RING_PROD__DEFAULT_NUM_DESCS,
> > +				     &idx);
> > +	if (err != XSK_RING_PROD__DEFAULT_NUM_DESCS) {
> > +		err = -errno;
> > +		goto out_mmap_tx;
> > +	}
> > +
> > +	for (i = 0;
> > +	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
> > +		     XSK_UMEM__DEFAULT_FRAME_SIZE;
> > +	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> > +		*xsk_ring_prod__fill_addr(umem->fill, idx++) = i;
> > +	xsk_ring_prod__submit(umem->fill,
> > +			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
> > +
> 
> Here, entries are added to the umem fill ring regardless of whether Rx is
> being used or not. For a Tx-only setup, this is not what we want, right?

Right, but we have such behavior even without this patch. So I see two options
here:
- if you agree with this patch, then I guess we would need to pass the info to
  libbpf about what exactly we are setting up (txonly, rxdrop, l2fwd)? (a
  rough sketch of this follows below)
- otherwise, we should pass opt_bench down to xsk_configure_socket and based
  on that decide whether we fill the fq or not?
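
To make the first option concrete, a minimal sketch of the caller side -
note that XSK_LIBBPF_FLAGS__TX_ONLY is a flag name made up here for
illustration, it does not exist in libbpf today:

	struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		/* Hypothetical bit telling libbpf that Rx won't be used, so
		 * xsk_socket__create() should skip pre-filling the fill
		 * ring on our behalf. */
		.libbpf_flags = XSK_LIBBPF_FLAGS__TX_ONLY,
		.xdp_flags = opt_xdp_flags,
		.bind_flags = opt_xdp_bind_flags,
	};

	ret = xsk_socket__create(&xsk->xsk, opt_if, opt_queue, umem->umem,
				 &xsk->rx, &xsk->tx, &cfg);

xsk_socket__create() would then wrap the fill ring reserve/submit sequence
added by this patch in a check of that bit.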

> 
> Thinking out loud here; now libbpf is making the decision about which umem
> entries are added to the fill ring. The sample application has this
> (naive) scheme. I'm not sure that all applications would like that
> policy. What do you think?
>

I find it convenient to have the fill queue in an "initialized" state if I am
making use of it, especially in the ZC case, where I must give the buffers to
the driver via the fill queue. So why would we make every application provide
that itself? I must admit that I haven't used AF_XDP with apps other than the
example one, so I might not be able to elaborate further. Maybe other people
have different feelings about it.

> >   	sxdp.sxdp_family = PF_XDP;
> >   	sxdp.sxdp_ifindex = xsk->ifindex;
> >   	sxdp.sxdp_queue_id = xsk->queue_id;
> > 



* Re: [RFC PATCH bpf-next 2/4] libbpf: check for channels.max_{t,r}x in xsk_get_max_queues
From: Maciej Fijalkowski @ 2019-06-04 15:05 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, netdev, ast, daniel, jakub.kicinski,
	jonathan.lemon, songliubraving, bpf

On Tue, 4 Jun 2019 10:06:57 +0200
Björn Töpel <bjorn.topel@intel.com> wrote:

> On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> > When it comes down to ethtool's get channels API, various drivers report the
> > queue count in two ways - they set either the max_combined or the
> > max_tx/max_rx fields. When creating the eBPF maps for an xsk socket, this
> > API is used so that the maps have an entry per queue.
> > In the case where a driver (mlx4, ice) reports queues in max_tx/max_rx, we
> > end up with eBPF maps with a single entry, so it's not possible to attach an
> > AF_XDP socket onto a queue other than 0 - xsk_set_bpf_maps() would try to
> > call bpf_map_update_elem() with the key set to xsk->queue_id.
> > 
> > To fix this, let's look for channels.max_{t,r}x as well in
> > xsk_get_max_queues.
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >   tools/lib/bpf/xsk.c | 18 ++++++++++--------
> >   1 file changed, 10 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> > index 57dda1389870..514ab3fb06f4 100644
> > --- a/tools/lib/bpf/xsk.c
> > +++ b/tools/lib/bpf/xsk.c
> > @@ -339,21 +339,23 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
> >   	ifr.ifr_data = (void *)&channels;
> >   	strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ);
> >   	err = ioctl(fd, SIOCETHTOOL, &ifr);
> > -	if (err && errno != EOPNOTSUPP) {
> > -		ret = -errno;
> > -		goto out;
> > -	}
> > +	close(fd);
> > +
> > +	if (err && errno != EOPNOTSUPP)
> > +		return -errno;
> >   
> > -	if (channels.max_combined == 0 || errno == EOPNOTSUPP)
> > +	if (channels.max_combined)
> > +		ret = channels.max_combined;
> > +	else if (channels.max_rx && channels.max_tx)
> > +		ret = min(channels.max_rx, channels.max_tx);
> 
> Hmm, do we really need to look at max_tx? For each Rx, there's (usually)
> an XDP ring.

Probably we would be good to go with only max_rx, but in the drivers, during
umem setup, we also compare the queue id provided by the user against the
number of tx queues... so in theory we could allocate max_rx entries, but if
the current txq count is lower than the reported max_rx, a bunch of map
entries would never be used, no?

>
> OTOH, when AF_XDP ZC is not implemented, it uses the skb path...
> 
> > +	else if (channels.max_combined == 0 || errno == EOPNOTSUPP)
> >   		/* If the device says it has no channels, then all traffic
> >   		 * is sent to a single stream, so max queues = 1.
> >   		 */
> >   		ret = 1;
> >   	else
> > -		ret = channels.max_combined;
> > +		ret = -1;
> >   
> > -out:
> > -	close(fd);
> >   	return ret;
> >   }
> >   
> > 



* Re: [RFC PATCH bpf-next 3/4] libbpf: move xdp program removal to libbpf
From: Maciej Fijalkowski @ 2019-06-04 15:06 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, netdev, ast, daniel, jakub.kicinski,
	jonathan.lemon, songliubraving, bpf

On Tue, 4 Jun 2019 10:07:25 +0200
Björn Töpel <bjorn.topel@intel.com> wrote:

> 
> On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> > Since the xsk support in libbpf loads the xdp program onto the interface,
> > make it also responsible for its removal. Store the prog id in
> > xsk_socket_config so that when removing the program we are still able to
> > compare the current program id with the id from attachment time and decide
> > whether the removal is safe.
> > 
> > While at it, remove the socket/umem in xdpsock's error path.
> >
> 
> We're loading a new, or reusing an existing, XDP program at socket
> creation, but tearing it down at *socket delete* is explicitly left to
> the application.

Are you describing the old behavior here?

> 
> For a per-queue XDP program (tied to the socket), this kind cleanup would
> make sense.
> 
> The intention with the libbpf AF_XDP support was to leave the XDP
> handling to whatever XDP orchestration process availble. It's not part
> of libbpf. For convenience, *loading/lookup of the XDP program* was
> added even though this was an asymmetry.

Hmmm, ok, and I tried to make it symmetric :p

> 
> For the sample application, this makes sense, but for larger/real
> applications?
>

Tough questions on those real apps!


> OTOH I like the idea of a scoped cleanup: when all sockets are gone,
> the XDP program + maps are removed.

That happens with patch 4 from this set included (once it gets fixed :))

> 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >   samples/bpf/xdpsock_user.c | 33 ++++++++++-----------------------
> >   tools/lib/bpf/xsk.c        | 32 ++++++++++++++++++++++++++++++++
> >   tools/lib/bpf/xsk.h        |  1 +
> >   3 files changed, 43 insertions(+), 23 deletions(-)
> > 
> > diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> > index e9dceb09b6d1..123862b16dd4 100644
> > --- a/samples/bpf/xdpsock_user.c
> > +++ b/samples/bpf/xdpsock_user.c
> > @@ -68,7 +68,6 @@ static int opt_queue;
> >   static int opt_poll;
> >   static int opt_interval = 1;
> >   static u32 opt_xdp_bind_flags;
> > -static __u32 prog_id;
> >   
> >   struct xsk_umem_info {
> >   	struct xsk_ring_prod fq;
> > @@ -170,22 +169,6 @@ static void *poller(void *arg)
> >   	return NULL;
> >   }
> >   
> > -static void remove_xdp_program(void)
> > -{
> > -	__u32 curr_prog_id = 0;
> > -
> > -	if (bpf_get_link_xdp_id(opt_ifindex, &curr_prog_id, opt_xdp_flags)) {
> > -		printf("bpf_get_link_xdp_id failed\n");
> > -		exit(EXIT_FAILURE);
> > -	}
> > -	if (prog_id == curr_prog_id)
> > -		bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
> > -	else if (!curr_prog_id)
> > -		printf("couldn't find a prog id on a given interface\n");
> > -	else
> > -		printf("program on interface changed, not removing\n");
> > -}
> > -
> >   static void int_exit(int sig)
> >   {
> >   	struct xsk_umem *umem = xsks[0]->umem->umem;
> > @@ -195,7 +178,6 @@ static void int_exit(int sig)
> >   	dump_stats();
> >   	xsk_socket__delete(xsks[0]->xsk);
> >   	(void)xsk_umem__delete(umem);
> > -	remove_xdp_program();
> >   
> >   	exit(EXIT_SUCCESS);
> >   }
> > @@ -206,7 +188,16 @@ static void __exit_with_error(int error, const char *file, const char *func,
> >   	fprintf(stderr, "%s:%s:%i: errno: %d/\"%s\"\n", file, func,
> >   		line, error, strerror(error));
> >   	dump_stats();
> > -	remove_xdp_program();
> > +
> > +	if (xsks[0]->xsk)
> > +		xsk_socket__delete(xsks[0]->xsk);
> > +
> > +	if (xsks[0]->umem) {
> > +		struct xsk_umem *umem = xsks[0]->umem->umem;
> > +
> > +		(void)xsk_umem__delete(umem);
> > +	}
> > +
> >   	exit(EXIT_FAILURE);
> >   }
> >   
> > @@ -312,10 +303,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
> >   	if (ret)
> >   		exit_with_error(-ret);
> >   
> > -	ret = bpf_get_link_xdp_id(opt_ifindex, &prog_id, opt_xdp_flags);
> > -	if (ret)
> > -		exit_with_error(-ret);
> > -
> >   	return xsk;
> >   }
> >   
> > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> > index 514ab3fb06f4..e28bedb0b078 100644
> > --- a/tools/lib/bpf/xsk.c
> > +++ b/tools/lib/bpf/xsk.c
> > @@ -259,6 +259,8 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
> >   static int xsk_load_xdp_prog(struct xsk_socket *xsk)
> >   {
> >   	static const int log_buf_size = 16 * 1024;
> > +	struct bpf_prog_info info = {};
> > +	__u32 info_len = sizeof(info);
> >   	char log_buf[log_buf_size];
> >   	int err, prog_fd;
> >   
> > @@ -321,6 +323,14 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
> >   		return err;
> >   	}
> >   
> > +	err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
> > +	if (err) {
> > +		pr_warning("can't get prog info - %s\n", strerror(errno));
> > +		close(prog_fd);
> > +		return err;
> > +	}
> > +	xsk->config.prog_id = info.id;
> > +
> >   	xsk->prog_fd = prog_fd;
> >   	return 0;
> >   }
> > @@ -483,6 +493,25 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk)
> >   	return err;
> >   }
> >   
> > +static void xsk_remove_xdp_prog(struct xsk_socket *xsk)
> > +{
> > +	__u32 prog_id = xsk->config.prog_id;
> > +	__u32 curr_prog_id = 0;
> > +	int err;
> > +
> > +	err = bpf_get_link_xdp_id(xsk->ifindex, &curr_prog_id,
> > +				  xsk->config.xdp_flags);
> > +	if (err)
> > +		return;
> > +
> > +	if (prog_id == curr_prog_id)
> > +		bpf_set_link_xdp_fd(xsk->ifindex, -1, xsk->config.xdp_flags);
> > +	else if (!curr_prog_id)
> > +		pr_warning("couldn't find a prog id on a given interface\n");
> > +	else
> > +		pr_warning("program on interface changed, not removing\n");
> > +}
> > +
> >   static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
> >   {
> >   	__u32 prog_id = 0;
> > @@ -506,6 +535,7 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
> >   		err = xsk_lookup_bpf_maps(xsk);
> >   		if (err)
> >   			goto out_load;
> > +		xsk->config.prog_id = prog_id;
> >   	}
> >   
> >   	err = xsk_set_bpf_maps(xsk);
> > @@ -744,6 +774,8 @@ void xsk_socket__delete(struct xsk_socket *xsk)
> >   
> >   	}
> >   
> > +	xsk_remove_xdp_prog(xsk);
> > +
> >   	xsk->umem->refcount--;
> >   	/* Do not close an fd that also has an associated umem connected
> >   	 * to it.
> > diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> > index 82ea71a0f3ec..e1b23e9432c9 100644
> > --- a/tools/lib/bpf/xsk.h
> > +++ b/tools/lib/bpf/xsk.h
> > @@ -186,6 +186,7 @@ struct xsk_socket_config {
> >   	__u32 tx_size;
> >   	__u32 libbpf_flags;
> >   	__u32 xdp_flags;
> > +	__u32 prog_id;
> >   	__u16 bind_flags;
> >   };
> >   
> > 



* Re: [RFC PATCH bpf-next 4/4] libbpf: don't remove eBPF resources when other xsks are present
From: Maciej Fijalkowski @ 2019-06-04 15:07 UTC (permalink / raw)
  To: Björn Töpel
  Cc: magnus.karlsson, netdev, ast, daniel, jakub.kicinski,
	jonathan.lemon, songliubraving, bpf

On Tue, 4 Jun 2019 10:08:03 +0200
Björn Töpel <bjorn.topel@intel.com> wrote:

> On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> > In the case where multiple xsk sockets are attached to a single interface
> > and one of them gets detached, the eBPF maps and program are removed.
> > This should not happen, as the remaining xsk sockets are still using these
> > resources.
> > 
> > In order to fix that, let's have an additional eBPF map with a single
> > entry that will be used as an xsk socket count. During xsk_socket__delete,
> > remove the resources only when this count is equal to 0. This map is
> > not accessed from the eBPF program, so the verifier does not associate
> > it with the prog, which in turn makes bpf_obj_get_info_by_fd not
> > report this map in the nr_map_ids field of struct bpf_prog_info. The
> > described behaviour means this map has to be pinned, so that when a
> > socket is being created and libbpf detects the presence of the bpf
> > resources, it is able to access that map.
> >
> 
> This commit is only needed after #3 is applied, right? So, this is a way 
> of refcounting XDP socks?

Yes, but as you pointed out it needs synchronization.

> 
> 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >   tools/lib/bpf/xsk.c | 59 +++++++++++++++++++++++++++++++++++++++++++++--------
> >   1 file changed, 51 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> > index e28bedb0b078..88d2c931ad14 100644
> > --- a/tools/lib/bpf/xsk.c
> > +++ b/tools/lib/bpf/xsk.c
> > @@ -44,6 +44,8 @@
> >    #define PF_XDP AF_XDP
> >   #endif
> >   
> > +#define XSKS_CNT_MAP_PATH "/sys/fs/bpf/xsks_cnt_map"
> > +
> >   struct xsk_umem {
> >   	struct xsk_ring_prod *fill;
> >   	struct xsk_ring_cons *comp;
> > @@ -65,6 +67,7 @@ struct xsk_socket {
> >   	int prog_fd;
> >   	int qidconf_map_fd;
> >   	int xsks_map_fd;
> > +	int xsks_cnt_map_fd;
> >   	__u32 queue_id;
> >   	char ifname[IFNAMSIZ];
> >   };
> > @@ -372,7 +375,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
> >   static int xsk_create_bpf_maps(struct xsk_socket *xsk)
> >   {
> >   	int max_queues;
> > -	int fd;
> > +	int fd, ret;
> >   
> >   	max_queues = xsk_get_max_queues(xsk);
> >   	if (max_queues < 0)
> > @@ -392,6 +395,24 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk)
> >   	}
> >   	xsk->xsks_map_fd = fd;
> >   
> > +	fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "xsks_cnt_map",
> > +				 sizeof(int), sizeof(int), 1, 0);
> > +	if (fd < 0) {
> > +		close(xsk->qidconf_map_fd);
> > +		close(xsk->xsks_map_fd);
> > +		return fd;
> > +	}
> > +
> > +	ret = bpf_obj_pin(fd, XSKS_CNT_MAP_PATH);
> > +	if (ret < 0) {
> > +		pr_warning("pinning map failed; is bpffs mounted?\n");
> > +		close(xsk->qidconf_map_fd);
> > +		close(xsk->xsks_map_fd);
> > +		close(fd);
> > +		return ret;
> > +	}
> > +	xsk->xsks_cnt_map_fd = fd;
> > +
> >   	return 0;
> >   }
> >   
> > @@ -456,8 +477,10 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
> >   		close(fd);
> >   	}
> >   
> > +	xsk->xsks_cnt_map_fd = bpf_obj_get(XSKS_CNT_MAP_PATH);
> >   	err = 0;
> > -	if (xsk->qidconf_map_fd < 0 || xsk->xsks_map_fd < 0) {
> > +	if (xsk->qidconf_map_fd < 0 || xsk->xsks_map_fd < 0 ||
> > +	    xsk->xsks_cnt_map_fd < 0) {
> >   		err = -ENOENT;
> >   		xsk_delete_bpf_maps(xsk);
> >   	}
> > @@ -467,17 +490,25 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
> >   	return err;
> >   }
> >   
> > -static void xsk_clear_bpf_maps(struct xsk_socket *xsk)
> > +static void xsk_clear_bpf_maps(struct xsk_socket *xsk, long *xsks_cnt_ptr)
> >   {
> > +	long xsks_cnt, key = 0;
> >   	int qid = false;
> >   
> >   	bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id, &qid, 0);
> >   	bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id);
> > +	bpf_map_lookup_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt);
> > +	if (xsks_cnt)
> > +		xsks_cnt--;
> > +	bpf_map_update_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt, 0);
> > +	if (xsks_cnt_ptr)
> > +		*xsks_cnt_ptr = xsks_cnt;
> 
> This refcount scheme will not work; there's no synchronization between 
> the updates (across processes)!

Ugh, shame. Let me fix this in v2 :(
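
To spell the race out: two processes tearing down their sockets
concurrently can both observe the same count, because the
lookup/decrement/update sequence is not atomic. One bad interleaving,
with the count starting at 2 (illustration only):

	A: bpf_map_lookup_elem(cnt_fd, &key, &xsks_cnt);    /* A sees 2 */
	B: bpf_map_lookup_elem(cnt_fd, &key, &xsks_cnt);    /* B sees 2 */
	A: xsks_cnt--;                                      /* A has 1 */
	B: xsks_cnt--;                                      /* B has 1 */
	A: bpf_map_update_elem(cnt_fd, &key, &xsks_cnt, 0); /* writes 1 */
	B: bpf_map_update_elem(cnt_fd, &key, &xsks_cnt, 0); /* writes 1 */

	/* the count ends up as 1 instead of 0, so the last
	 * xsk_socket__delete() never removes the program and maps
	 */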

> 
> >   }
> >   
> >   static int xsk_set_bpf_maps(struct xsk_socket *xsk)
> >   {
> >   	int qid = true, fd = xsk->fd, err;
> > +	long xsks_cnt, key = 0;
> >   
> >   	err = bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id, &qid, 0);
> >   	if (err)
> > @@ -487,9 +518,18 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk)
> >   	if (err)
> >   		goto out;
> >   
> > +	err = bpf_map_lookup_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt);
> > +	if (err)
> > +		goto out;
> > +
> > +	xsks_cnt++;
> > +	err = bpf_map_update_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt, 0);
> > +	if (err)
> > +		goto out;
> > +
> 
> > Ditto.
> 
> >   	return 0;
> >   out:
> > -	xsk_clear_bpf_maps(xsk);
> > +	xsk_clear_bpf_maps(xsk, NULL);
> >   	return err;
> >   }
> >   
> > @@ -752,13 +792,18 @@ void xsk_socket__delete(struct xsk_socket *xsk)
> >   	size_t desc_sz = sizeof(struct xdp_desc);
> >   	struct xdp_mmap_offsets off;
> >   	socklen_t optlen;
> > +	long xsks_cnt;
> >   	int err;
> >   
> >   	if (!xsk)
> >   		return;
> >   
> > -	xsk_clear_bpf_maps(xsk);
> > -	xsk_delete_bpf_maps(xsk);
> > +	xsk_clear_bpf_maps(xsk, &xsks_cnt);
> > +	unlink(XSKS_CNT_MAP_PATH);
> > +	if (!xsks_cnt) {
> > +		xsk_delete_bpf_maps(xsk);
> > +		xsk_remove_xdp_prog(xsk);
> > +	}
> >   
> >   	optlen = sizeof(off);
> >   	err = getsockopt(xsk->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
> > @@ -774,8 +819,6 @@ void xsk_socket__delete(struct xsk_socket *xsk)
> >   
> >   	}
> >   
> > -	xsk_remove_xdp_prog(xsk);
> > -
> >   	xsk->umem->refcount--;
> >   	/* Do not close an fd that also has an associated umem connected
> >   	 * to it.
> > 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH bpf-next 1/4] libbpf: fill the AF_XDP fill queue before bind() call
  2019-06-04 15:04     ` Maciej Fijalkowski
@ 2019-06-04 15:54       ` Jonathan Lemon
  2019-06-05  9:00       ` Björn Töpel
  1 sibling, 0 replies; 18+ messages in thread
From: Jonathan Lemon @ 2019-06-04 15:54 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Björn Töpel, magnus.karlsson, netdev, ast, daniel,
	jakub.kicinski, songliubraving, bpf

On 4 Jun 2019, at 8:04, Maciej Fijalkowski wrote:

> On Tue, 4 Jun 2019 10:06:36 +0200
> Björn Töpel <bjorn.topel@intel.com> wrote:
>
>> On 2019-06-03 15:19, Maciej Fijalkowski wrote:
>>> Let's get into the driver via ndo_bpf with command set to 
>>> XDP_SETUP_UMEM
>>> with fill queue that already contains some available entries that 
>>> can be
>>> used by Rx driver rings. Things worked in such way on old version of
>>> xdpsock (that lacked libbpf support) and there's no particular 
>>> reason
>>> for having this preparation done after bind().
>>>
>>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
>>> Signed-off-by: Krzysztof Kazimierczak 
>>> <krzysztof.kazimierczak@intel.com>
>>> ---
>>>   samples/bpf/xdpsock_user.c | 15 ---------------
>>>   tools/lib/bpf/xsk.c        | 19 ++++++++++++++++++-
>>>   2 files changed, 18 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
>>> index d08ee1ab7bb4..e9dceb09b6d1 100644
>>> --- a/samples/bpf/xdpsock_user.c
>>> +++ b/samples/bpf/xdpsock_user.c
>>> @@ -296,8 +296,6 @@ static struct xsk_socket_info 
>>> *xsk_configure_socket(struct xsk_umem_info *umem)
>>>   	struct xsk_socket_config cfg;
>>>   	struct xsk_socket_info *xsk;
>>>   	int ret;
>>> -	u32 idx;
>>> -	int i;
>>>
>>>   	xsk = calloc(1, sizeof(*xsk));
>>>   	if (!xsk)
>>> @@ -318,19 +316,6 @@ static struct xsk_socket_info 
>>> *xsk_configure_socket(struct xsk_umem_info *umem)
>>>   	if (ret)
>>>   		exit_with_error(-ret);
>>>
>>> -	ret = xsk_ring_prod__reserve(&xsk->umem->fq,
>>> -				     XSK_RING_PROD__DEFAULT_NUM_DESCS,
>>> -				     &idx);
>>> -	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
>>> -		exit_with_error(-ret);
>>> -	for (i = 0;
>>> -	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
>>> -		     XSK_UMEM__DEFAULT_FRAME_SIZE;
>>> -	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
>>> -		*xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
>>> -	xsk_ring_prod__submit(&xsk->umem->fq,
>>> -			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
>>> -
>>>   	return xsk;
>>>   }
>>>
>>> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
>>> index 38667b62f1fe..57dda1389870 100644
>>> --- a/tools/lib/bpf/xsk.c
>>> +++ b/tools/lib/bpf/xsk.c
>>> @@ -529,7 +529,8 @@ int xsk_socket__create(struct xsk_socket 
>>> **xsk_ptr, const char *ifname,
>>>   	struct xdp_mmap_offsets off;
>>>   	struct xsk_socket *xsk;
>>>   	socklen_t optlen;
>>> -	int err;
>>> +	int err, i;
>>> +	u32 idx;
>>>
>>>   	if (!umem || !xsk_ptr || !rx || !tx)
>>>   		return -EFAULT;
>>> @@ -632,6 +633,22 @@ int xsk_socket__create(struct xsk_socket 
>>> **xsk_ptr, const char *ifname,
>>>   	}
>>>   	xsk->tx = tx;
>>>
>>> +	err = xsk_ring_prod__reserve(umem->fill,
>>> +				     XSK_RING_PROD__DEFAULT_NUM_DESCS,
>>> +				     &idx);
>>> +	if (err != XSK_RING_PROD__DEFAULT_NUM_DESCS) {
>>> +		err = -errno;
>>> +		goto out_mmap_tx;
>>> +	}
>>> +
>>> +	for (i = 0;
>>> +	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
>>> +		     XSK_UMEM__DEFAULT_FRAME_SIZE;
>>> +	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
>>> +		*xsk_ring_prod__fill_addr(umem->fill, idx++) = i;
>>> +	xsk_ring_prod__submit(umem->fill,
>>> +			      XSK_RING_PROD__DEFAULT_NUM_DESCS);
>>> +
>>
>> Here, entries are added to the umem fill ring regardless of whether Rx
>> is being used or not. For a Tx-only setup, this is not what we want,
>> right?
>
> Right, but we have such behavior even without this patch. So I see two
> options here:
> - if you agree with this patch, then I guess we would need to pass the
>   info to libbpf about what exactly we are setting up (txonly, rxdrop,
>   l2fwd)?
> - otherwise, we should pass opt_bench on to xsk_configure_socket and
>   based on that decide whether we fill the fq or not?
>
>>
>> Thinking out loud here; now libbpf is making the decision about which
>> umem entries are added to the fill ring. The sample application has
>> this (naive) scheme. I'm not sure that all applications would like
>> that policy. What do you think?
>>
>
> I find it convenient to have the fill queue in an "initialized" state if
> I am making use of it, especially in the case when I am doing ZC, since
> I must give the buffers to the driver via the fill queue. So why should
> we burden other applications with providing it? I must admit that I
> haven't used AF_XDP with apps other than the example one, so I might not
> be able to elaborate further. Maybe other people have different feelings
> about it.

I use the library for setting up all the ring bookkeeping, but use my own
application (and xdp program). So I'd prefer not having the library do
this - as Björn notes, in some cases the fill ring may not be populated.
I also have some other use cases where I may not want to populate the
fill ring until later.

While convenient, I think there's a limit to what the library should be
doing for the user.
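
One way to reconcile this (purely illustrative; the flag below is made
up and not part of this series) would be to make the fill-ring
population opt-in through the existing libbpf_flags field of struct
xsk_socket_config:

	/* hypothetical flag, for illustration only */
	#define XSK_LIBBPF_FLAGS__POPULATE_FILL_RING (1 << 1)

	/* in xsk_socket__create(), guard the code added by patch 1 */
	if (xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__POPULATE_FILL_RING) {
		err = xsk_ring_prod__reserve(umem->fill,
					     XSK_RING_PROD__DEFAULT_NUM_DESCS,
					     &idx);
		if (err != XSK_RING_PROD__DEFAULT_NUM_DESCS)
			goto out_mmap_tx;
		for (i = 0;
		     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
			     XSK_UMEM__DEFAULT_FRAME_SIZE;
		     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
			*xsk_ring_prod__fill_addr(umem->fill, idx++) = i;
		xsk_ring_prod__submit(umem->fill,
				      XSK_RING_PROD__DEFAULT_NUM_DESCS);
	}

Applications that want full control of the fill ring would simply leave
the flag unset.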
-- 
Jonathan

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH bpf-next 1/4] libbpf: fill the AF_XDP fill queue before bind() call
  2019-06-04 15:04     ` Maciej Fijalkowski
  2019-06-04 15:54       ` Jonathan Lemon
@ 2019-06-05  9:00       ` Björn Töpel
  1 sibling, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2019-06-05  9:00 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Björn Töpel, Karlsson, Magnus, Netdev,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jonathan Lemon, Song Liu, bpf

On Tue, 4 Jun 2019 at 17:06, Maciej Fijalkowski
<maciejromanfijalkowski@gmail.com> wrote:
>
> On Tue, 4 Jun 2019 10:06:36 +0200
> Björn Töpel <bjorn.topel@intel.com> wrote:
>
> > On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> > > Let's get into the driver via ndo_bpf with command set to XDP_SETUP_UMEM
> > > with fill queue that already contains some available entries that can be
> > > used by Rx driver rings. Things worked in such way on old version of
> > > xdpsock (that lacked libbpf support) and there's no particular reason
> > > for having this preparation done after bind().
> > >
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
> > > ---
> > >   samples/bpf/xdpsock_user.c | 15 ---------------
> > >   tools/lib/bpf/xsk.c        | 19 ++++++++++++++++++-
> > >   2 files changed, 18 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> > > index d08ee1ab7bb4..e9dceb09b6d1 100644
> > > --- a/samples/bpf/xdpsock_user.c
> > > +++ b/samples/bpf/xdpsock_user.c
> > > @@ -296,8 +296,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
> > >     struct xsk_socket_config cfg;
> > >     struct xsk_socket_info *xsk;
> > >     int ret;
> > > -   u32 idx;
> > > -   int i;
> > >
> > >     xsk = calloc(1, sizeof(*xsk));
> > >     if (!xsk)
> > > @@ -318,19 +316,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
> > >     if (ret)
> > >             exit_with_error(-ret);
> > >
> > > -   ret = xsk_ring_prod__reserve(&xsk->umem->fq,
> > > -                                XSK_RING_PROD__DEFAULT_NUM_DESCS,
> > > -                                &idx);
> > > -   if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
> > > -           exit_with_error(-ret);
> > > -   for (i = 0;
> > > -        i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
> > > -                XSK_UMEM__DEFAULT_FRAME_SIZE;
> > > -        i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> > > -           *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = i;
> > > -   xsk_ring_prod__submit(&xsk->umem->fq,
> > > -                         XSK_RING_PROD__DEFAULT_NUM_DESCS);
> > > -
> > >     return xsk;
> > >   }
> > >
> > > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> > > index 38667b62f1fe..57dda1389870 100644
> > > --- a/tools/lib/bpf/xsk.c
> > > +++ b/tools/lib/bpf/xsk.c
> > > @@ -529,7 +529,8 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> > >     struct xdp_mmap_offsets off;
> > >     struct xsk_socket *xsk;
> > >     socklen_t optlen;
> > > -   int err;
> > > +   int err, i;
> > > +   u32 idx;
> > >
> > >     if (!umem || !xsk_ptr || !rx || !tx)
> > >             return -EFAULT;
> > > @@ -632,6 +633,22 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
> > >     }
> > >     xsk->tx = tx;
> > >
> > > +   err = xsk_ring_prod__reserve(umem->fill,
> > > +                                XSK_RING_PROD__DEFAULT_NUM_DESCS,
> > > +                                &idx);
> > > +   if (err != XSK_RING_PROD__DEFAULT_NUM_DESCS) {
> > > +           err = -errno;
> > > +           goto out_mmap_tx;
> > > +   }
> > > +
> > > +   for (i = 0;
> > > +        i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
> > > +                XSK_UMEM__DEFAULT_FRAME_SIZE;
> > > +        i += XSK_UMEM__DEFAULT_FRAME_SIZE)
> > > +           *xsk_ring_prod__fill_addr(umem->fill, idx++) = i;
> > > +   xsk_ring_prod__submit(umem->fill,
> > > +                         XSK_RING_PROD__DEFAULT_NUM_DESCS);
> > > +
> >
> > Here, entries are added to the umem fill ring regardless of whether Rx
> > is being used or not. For a Tx-only setup, this is not what we want,
> > right?
>
> Right, but we have such behavior even without this patch. So I see two
> options here:
> - if you agree with this patch, then I guess we would need to pass the
>   info to libbpf about what exactly we are setting up (txonly, rxdrop,
>   l2fwd)?
> - otherwise, we should pass opt_bench on to xsk_configure_socket and
>   based on that decide whether we fill the fq or not?
>
> >
> > Thinking out loud here; now libbpf is making the decision about which
> > umem entries are added to the fill ring. The sample application has
> > this (naive) scheme. I'm not sure that all applications would like
> > that policy. What do you think?
> >
>
> I find it convenient to have the fill queue in an "initialized" state if
> I am making use of it, especially in the case when I am doing ZC, since
> I must give the buffers to the driver via the fill queue. So why should
> we burden other applications with providing it? I must admit that I
> haven't used AF_XDP with apps other than the example one, so I might not
> be able to elaborate further. Maybe other people have different feelings
> about it.
>

Personally, I think this scheme is not worth pursuing. I'd just leave
the fill ring work to the application. E.g. DPDK would definitely not
use a scheme like this.
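
If the fill-ring work stays in the application, an Rx user just keeps
the pattern that patch 1 moves out of xdpsock, e.g. (sketch using the
defaults from xsk.h; a real application would pick its own buffer
addresses):

	__u32 idx;
	int i, ret;

	ret = xsk_ring_prod__reserve(&umem->fq,
				     XSK_RING_PROD__DEFAULT_NUM_DESCS, &idx);
	if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
		exit_with_error(-ret);
	for (i = 0;
	     i < XSK_RING_PROD__DEFAULT_NUM_DESCS *
		     XSK_UMEM__DEFAULT_FRAME_SIZE;
	     i += XSK_UMEM__DEFAULT_FRAME_SIZE)
		*xsk_ring_prod__fill_addr(&umem->fq, idx++) = i;
	xsk_ring_prod__submit(&umem->fq, XSK_RING_PROD__DEFAULT_NUM_DESCS);

and a Tx-only user simply skips it.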

Björn

> > >     sxdp.sxdp_family = PF_XDP;
> > >     sxdp.sxdp_ifindex = xsk->ifindex;
> > >     sxdp.sxdp_queue_id = xsk->queue_id;
> > >
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH bpf-next 3/4] libbpf: move xdp program removal to libbpf
  2019-06-04 15:06     ` Maciej Fijalkowski
@ 2019-06-05  9:03       ` Björn Töpel
  0 siblings, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2019-06-05  9:03 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Björn Töpel, Karlsson, Magnus, Netdev,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jonathan Lemon, Song Liu, bpf

On Tue, 4 Jun 2019 at 17:07, Maciej Fijalkowski
<maciejromanfijalkowski@gmail.com> wrote:
>
> On Tue, 4 Jun 2019 10:07:25 +0200
> Björn Töpel <bjorn.topel@intel.com> wrote:
>
> >
> > On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> > > Since xsk support in libbpf loads the xdp program onto the interface,
> > > make it also responsible for its removal. Store the prog id in
> > > xsk_socket_config so that when removing the program we are still able
> > > to compare the current program id with the id from attachment time
> > > and decide how to proceed.
> > >
> > > While at it, remove the socket/umem in xdpsock's error path.
> > >
> >
> > We're loading a new, or reusing an existing, XDP program at socket
> > creation, but tearing it down at *socket delete* is explicitly left
> > to the application.
>
> Are you describing the old behavior here?
>
> >
> > For a per-queue XDP program (tied to the socket), this kind of cleanup
> > would make sense.
> >
> > The intention with the libbpf AF_XDP support was to leave the XDP
> > handling to whatever XDP orchestration process is available. It's not
> > part of libbpf. For convenience, *loading/lookup of the XDP program*
> > was added even though this was an asymmetry.
>
> Hmmm ok and I tried to make it symmetric :p
>

Thought a bit more about this, and I think you're right here. It
should be symmetric! Please continue this work! (But keep in mind that
it might go away if/once per-queue programs appear. :-P)

> >
> > For the sample application, this makes sense, but for larger/real
> > applications?
> >
>
> Tough questions on those real apps!
>

:-D

>
> > OTOH I like the idea of a scoped cleanup: when all sockets are gone,
> > the XDP program + maps are removed.
>
> That's happening with patch 4 from this set included (assuming it gets fixed :))
>

Ok!

Björn

> >
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > ---
> > >   samples/bpf/xdpsock_user.c | 33 ++++++++++-----------------------
> > >   tools/lib/bpf/xsk.c        | 32 ++++++++++++++++++++++++++++++++
> > >   tools/lib/bpf/xsk.h        |  1 +
> > >   3 files changed, 43 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> > > index e9dceb09b6d1..123862b16dd4 100644
> > > --- a/samples/bpf/xdpsock_user.c
> > > +++ b/samples/bpf/xdpsock_user.c
> > > @@ -68,7 +68,6 @@ static int opt_queue;
> > >   static int opt_poll;
> > >   static int opt_interval = 1;
> > >   static u32 opt_xdp_bind_flags;
> > > -static __u32 prog_id;
> > >
> > >   struct xsk_umem_info {
> > >     struct xsk_ring_prod fq;
> > > @@ -170,22 +169,6 @@ static void *poller(void *arg)
> > >     return NULL;
> > >   }
> > >
> > > -static void remove_xdp_program(void)
> > > -{
> > > -   __u32 curr_prog_id = 0;
> > > -
> > > -   if (bpf_get_link_xdp_id(opt_ifindex, &curr_prog_id, opt_xdp_flags)) {
> > > -           printf("bpf_get_link_xdp_id failed\n");
> > > -           exit(EXIT_FAILURE);
> > > -   }
> > > -   if (prog_id == curr_prog_id)
> > > -           bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
> > > -   else if (!curr_prog_id)
> > > -           printf("couldn't find a prog id on a given interface\n");
> > > -   else
> > > -           printf("program on interface changed, not removing\n");
> > > -}
> > > -
> > >   static void int_exit(int sig)
> > >   {
> > >     struct xsk_umem *umem = xsks[0]->umem->umem;
> > > @@ -195,7 +178,6 @@ static void int_exit(int sig)
> > >     dump_stats();
> > >     xsk_socket__delete(xsks[0]->xsk);
> > >     (void)xsk_umem__delete(umem);
> > > -   remove_xdp_program();
> > >
> > >     exit(EXIT_SUCCESS);
> > >   }
> > > @@ -206,7 +188,16 @@ static void __exit_with_error(int error, const char *file, const char *func,
> > >     fprintf(stderr, "%s:%s:%i: errno: %d/\"%s\"\n", file, func,
> > >             line, error, strerror(error));
> > >     dump_stats();
> > > -   remove_xdp_program();
> > > +
> > > +   if (xsks[0]->xsk)
> > > +           xsk_socket__delete(xsks[0]->xsk);
> > > +
> > > +   if (xsks[0]->umem) {
> > > +           struct xsk_umem *umem = xsks[0]->umem->umem;
> > > +
> > > +           (void)xsk_umem__delete(umem);
> > > +   }
> > > +
> > >     exit(EXIT_FAILURE);
> > >   }
> > >
> > > @@ -312,10 +303,6 @@ static struct xsk_socket_info *xsk_configure_socket(struct xsk_umem_info *umem)
> > >     if (ret)
> > >             exit_with_error(-ret);
> > >
> > > -   ret = bpf_get_link_xdp_id(opt_ifindex, &prog_id, opt_xdp_flags);
> > > -   if (ret)
> > > -           exit_with_error(-ret);
> > > -
> > >     return xsk;
> > >   }
> > >
> > > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> > > index 514ab3fb06f4..e28bedb0b078 100644
> > > --- a/tools/lib/bpf/xsk.c
> > > +++ b/tools/lib/bpf/xsk.c
> > > @@ -259,6 +259,8 @@ int xsk_umem__create(struct xsk_umem **umem_ptr, void *umem_area, __u64 size,
> > >   static int xsk_load_xdp_prog(struct xsk_socket *xsk)
> > >   {
> > >     static const int log_buf_size = 16 * 1024;
> > > +   struct bpf_prog_info info = {};
> > > +   __u32 info_len = sizeof(info);
> > >     char log_buf[log_buf_size];
> > >     int err, prog_fd;
> > >
> > > @@ -321,6 +323,14 @@ static int xsk_load_xdp_prog(struct xsk_socket *xsk)
> > >             return err;
> > >     }
> > >
> > > +   err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len);
> > > +   if (err) {
> > > +           pr_warning("can't get prog info - %s\n", strerror(errno));
> > > +           close(prog_fd);
> > > +           return err;
> > > +   }
> > > +   xsk->config.prog_id = info.id;
> > > +
> > >     xsk->prog_fd = prog_fd;
> > >     return 0;
> > >   }
> > > @@ -483,6 +493,25 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk)
> > >     return err;
> > >   }
> > >
> > > +static void xsk_remove_xdp_prog(struct xsk_socket *xsk)
> > > +{
> > > +   __u32 prog_id = xsk->config.prog_id;
> > > +   __u32 curr_prog_id = 0;
> > > +   int err;
> > > +
> > > +   err = bpf_get_link_xdp_id(xsk->ifindex, &curr_prog_id,
> > > +                             xsk->config.xdp_flags);
> > > +   if (err)
> > > +           return;
> > > +
> > > +   if (prog_id == curr_prog_id)
> > > +           bpf_set_link_xdp_fd(xsk->ifindex, -1, xsk->config.xdp_flags);
> > > +   else if (!curr_prog_id)
> > > +           pr_warning("couldn't find a prog id on a given interface\n");
> > > +   else
> > > +           pr_warning("program on interface changed, not removing\n");
> > > +}
> > > +
> > >   static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
> > >   {
> > >     __u32 prog_id = 0;
> > > @@ -506,6 +535,7 @@ static int xsk_setup_xdp_prog(struct xsk_socket *xsk)
> > >             err = xsk_lookup_bpf_maps(xsk);
> > >             if (err)
> > >                     goto out_load;
> > > +           xsk->config.prog_id = prog_id;
> > >     }
> > >
> > >     err = xsk_set_bpf_maps(xsk);
> > > @@ -744,6 +774,8 @@ void xsk_socket__delete(struct xsk_socket *xsk)
> > >
> > >     }
> > >
> > > +   xsk_remove_xdp_prog(xsk);
> > > +
> > >     xsk->umem->refcount--;
> > >     /* Do not close an fd that also has an associated umem connected
> > >      * to it.
> > > diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> > > index 82ea71a0f3ec..e1b23e9432c9 100644
> > > --- a/tools/lib/bpf/xsk.h
> > > +++ b/tools/lib/bpf/xsk.h
> > > @@ -186,6 +186,7 @@ struct xsk_socket_config {
> > >     __u32 tx_size;
> > >     __u32 libbpf_flags;
> > >     __u32 xdp_flags;
> > > +   __u32 prog_id;
> > >     __u16 bind_flags;
> > >   };
> > >
> > >
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH bpf-next 4/4] libbpf: don't remove eBPF resources when other xsks are present
  2019-06-04 15:07     ` Maciej Fijalkowski
@ 2019-06-05  9:26       ` Björn Töpel
  0 siblings, 0 replies; 18+ messages in thread
From: Björn Töpel @ 2019-06-05  9:26 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Björn Töpel, Karlsson, Magnus, Netdev,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jonathan Lemon, Song Liu, bpf

On Tue, 4 Jun 2019 at 17:07, Maciej Fijalkowski
<maciejromanfijalkowski@gmail.com> wrote:
>
> On Tue, 4 Jun 2019 10:08:03 +0200
> Björn Töpel <bjorn.topel@intel.com> wrote:
>
> > On 2019-06-03 15:19, Maciej Fijalkowski wrote:
> > > In the case where multiple xsk sockets are attached to a single
> > > interface and one of them gets detached, the eBPF maps and program
> > > are removed. This should not happen, as the remaining xsk sockets
> > > are still using these resources.
> > >
> > > In order to fix that, let's have an additional eBPF map with a single
> > > entry that will be used as an xsks count. During xsk_socket__delete,
> > > remove the resources only when this count drops to 0. This map is not
> > > accessed from the eBPF program, so the verifier does not associate it
> > > with the prog, which in turn means bpf_obj_get_info_by_fd does not
> > > report this map in the nr_map_ids field of struct bpf_prog_info. This
> > > behaviour brings the need to have the map pinned, so that when a
> > > socket is being created and libbpf detects the presence of existing
> > > bpf resources, it is still able to access that map.
> > >
> >
> > This commit is only needed after #3 is applied, right? So, this is a way
> > of refcounting XDP socks?
>
> Yes, but as you pointed out it needs synchronization.
>
> >
> >
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > ---
> > >   tools/lib/bpf/xsk.c | 59 +++++++++++++++++++++++++++++++++++++++++++++--------
> > >   1 file changed, 51 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> > > index e28bedb0b078..88d2c931ad14 100644
> > > --- a/tools/lib/bpf/xsk.c
> > > +++ b/tools/lib/bpf/xsk.c
> > > @@ -44,6 +44,8 @@
> > >    #define PF_XDP AF_XDP
> > >   #endif
> > >
> > > +#define XSKS_CNT_MAP_PATH "/sys/fs/bpf/xsks_cnt_map"
> > > +
> > >   struct xsk_umem {
> > >     struct xsk_ring_prod *fill;
> > >     struct xsk_ring_cons *comp;
> > > @@ -65,6 +67,7 @@ struct xsk_socket {
> > >     int prog_fd;
> > >     int qidconf_map_fd;
> > >     int xsks_map_fd;
> > > +   int xsks_cnt_map_fd;
> > >     __u32 queue_id;
> > >     char ifname[IFNAMSIZ];
> > >   };
> > > @@ -372,7 +375,7 @@ static int xsk_get_max_queues(struct xsk_socket *xsk)
> > >   static int xsk_create_bpf_maps(struct xsk_socket *xsk)
> > >   {
> > >     int max_queues;
> > > -   int fd;
> > > +   int fd, ret;
> > >
> > >     max_queues = xsk_get_max_queues(xsk);
> > >     if (max_queues < 0)
> > > @@ -392,6 +395,24 @@ static int xsk_create_bpf_maps(struct xsk_socket *xsk)
> > >     }
> > >     xsk->xsks_map_fd = fd;
> > >
> > > +   fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "xsks_cnt_map",
> > > +                            sizeof(int), sizeof(int), 1, 0);
> > > +   if (fd < 0) {
> > > +           close(xsk->qidconf_map_fd);
> > > +           close(xsk->xsks_map_fd);
> > > +           return fd;
> > > +   }
> > > +
> > > +   ret = bpf_obj_pin(fd, XSKS_CNT_MAP_PATH);
> > > +   if (ret < 0) {
> > > +           pr_warning("pinning map failed; is bpffs mounted?\n");
> > > +           close(xsk->qidconf_map_fd);
> > > +           close(xsk->xsks_map_fd);
> > > +           close(fd);
> > > +           return ret;
> > > +   }
> > > +   xsk->xsks_cnt_map_fd = fd;
> > > +
> > >     return 0;
> > >   }
> > >
> > > @@ -456,8 +477,10 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
> > >             close(fd);
> > >     }
> > >
> > > +   xsk->xsks_cnt_map_fd = bpf_obj_get(XSKS_CNT_MAP_PATH);
> > >     err = 0;
> > > -   if (xsk->qidconf_map_fd < 0 || xsk->xsks_map_fd < 0) {
> > > +   if (xsk->qidconf_map_fd < 0 || xsk->xsks_map_fd < 0 ||
> > > +       xsk->xsks_cnt_map_fd < 0) {
> > >             err = -ENOENT;
> > >             xsk_delete_bpf_maps(xsk);
> > >     }
> > > @@ -467,17 +490,25 @@ static int xsk_lookup_bpf_maps(struct xsk_socket *xsk)
> > >     return err;
> > >   }
> > >
> > > -static void xsk_clear_bpf_maps(struct xsk_socket *xsk)
> > > +static void xsk_clear_bpf_maps(struct xsk_socket *xsk, long *xsks_cnt_ptr)
> > >   {
> > > +   long xsks_cnt, key = 0;
> > >     int qid = false;
> > >
> > >     bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id, &qid, 0);
> > >     bpf_map_delete_elem(xsk->xsks_map_fd, &xsk->queue_id);
> > > +   bpf_map_lookup_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt);
> > > +   if (xsks_cnt)
> > > +           xsks_cnt--;
> > > +   bpf_map_update_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt, 0);
> > > +   if (xsks_cnt_ptr)
> > > +           *xsks_cnt_ptr = xsks_cnt;
> >
> > This refcount scheme will not work; there's no synchronization between
> > the updates (across processes)!
>
> Ugh, shame. Let me fix this in v2 :(
>

Ok!

Hmm, there's no way of doing a lock_xadd from the bpf syscall, right?
Any thoughts on how to do this? Maybe there's a need for a sock_ops prog
for XDP sockets...
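
One untested user-space idea, assuming all writers of the count map are
cooperating libbpf users: serialize the lookup/update pair with flock()
on a side file (bpffs doesn't allow creating regular files, so the lock
file has to live elsewhere; the path below is made up):

	#include <errno.h>
	#include <fcntl.h>
	#include <sys/file.h>

	static int xsks_cnt_lock(void)
	{
		int fd = open("/tmp/xsks_cnt_map.lock",
			      O_RDONLY | O_CREAT, 0600);

		if (fd < 0)
			return -errno;
		if (flock(fd, LOCK_EX) < 0) {
			int err = -errno;

			close(fd);
			return err;
		}
		return fd;
	}

	static void xsks_cnt_unlock(int lock_fd)
	{
		flock(lock_fd, LOCK_UN);
		close(lock_fd);
	}

with the bpf_map_lookup_elem()/bpf_map_update_elem() pair done between
the two calls. That still doesn't protect against non-cooperating
processes, of course.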


Björn



> >
> > >   }
> > >
> > >   static int xsk_set_bpf_maps(struct xsk_socket *xsk)
> > >   {
> > >     int qid = true, fd = xsk->fd, err;
> > > +   long xsks_cnt, key = 0;
> > >
> > >     err = bpf_map_update_elem(xsk->qidconf_map_fd, &xsk->queue_id, &qid, 0);
> > >     if (err)
> > > @@ -487,9 +518,18 @@ static int xsk_set_bpf_maps(struct xsk_socket *xsk)
> > >     if (err)
> > >             goto out;
> > >
> > > +   err = bpf_map_lookup_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt);
> > > +   if (err)
> > > +           goto out;
> > > +
> > > +   xsks_cnt++;
> > > +   err = bpf_map_update_elem(xsk->xsks_cnt_map_fd, &key, &xsks_cnt, 0);
> > > +   if (err)
> > > +           goto out;
> > > +
> >
> > Ditto.
> >
> > >     return 0;
> > >   out:
> > > -   xsk_clear_bpf_maps(xsk);
> > > +   xsk_clear_bpf_maps(xsk, NULL);
> > >     return err;
> > >   }
> > >
> > > @@ -752,13 +792,18 @@ void xsk_socket__delete(struct xsk_socket *xsk)
> > >     size_t desc_sz = sizeof(struct xdp_desc);
> > >     struct xdp_mmap_offsets off;
> > >     socklen_t optlen;
> > > +   long xsks_cnt;
> > >     int err;
> > >
> > >     if (!xsk)
> > >             return;
> > >
> > > -   xsk_clear_bpf_maps(xsk);
> > > -   xsk_delete_bpf_maps(xsk);
> > > +   xsk_clear_bpf_maps(xsk, &xsks_cnt);
> > > +   unlink(XSKS_CNT_MAP_PATH);
> > > +   if (!xsks_cnt) {
> > > +           xsk_delete_bpf_maps(xsk);
> > > +           xsk_remove_xdp_prog(xsk);
> > > +   }
> > >
> > >     optlen = sizeof(off);
> > >     err = getsockopt(xsk->fd, SOL_XDP, XDP_MMAP_OFFSETS, &off, &optlen);
> > > @@ -774,8 +819,6 @@ void xsk_socket__delete(struct xsk_socket *xsk)
> > >
> > >     }
> > >
> > > -   xsk_remove_xdp_prog(xsk);
> > > -
> > >     xsk->umem->refcount--;
> > >     /* Do not close an fd that also has an associated umem connected
> > >      * to it.
> > >
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-06-05  9:26 UTC | newest]

Thread overview: 18+ messages
2019-06-03 13:19 [RFC PATCH bpf-next 0/4] libbpf: xsk improvements Maciej Fijalkowski
2019-06-03 13:19 ` [RFC PATCH bpf-next 1/4] libbpf: fill the AF_XDP fill queue before bind() call Maciej Fijalkowski
2019-06-04  8:06   ` Björn Töpel
2019-06-04 15:04     ` Maciej Fijalkowski
2019-06-04 15:54       ` Jonathan Lemon
2019-06-05  9:00       ` Björn Töpel
2019-06-03 13:19 ` [RFC PATCH bpf-next 2/4] libbpf: check for channels.max_{t,r}x in xsk_get_max_queues Maciej Fijalkowski
2019-06-04  8:06   ` Björn Töpel
2019-06-04 15:05     ` Maciej Fijalkowski
2019-06-03 13:19 ` [RFC PATCH bpf-next 3/4] libbpf: move xdp program removal to libbpf Maciej Fijalkowski
2019-06-04  8:07   ` Björn Töpel
2019-06-04 15:06     ` Maciej Fijalkowski
2019-06-05  9:03       ` Björn Töpel
2019-06-03 13:19 ` [RFC PATCH bpf-next 4/4] libbpf: don't remove eBPF resources when other xsks are present Maciej Fijalkowski
2019-06-03 18:26   ` Jonathan Lemon
2019-06-04  8:08   ` Björn Töpel
2019-06-04 15:07     ` Maciej Fijalkowski
2019-06-05  9:26       ` Björn Töpel
