* [PATCH bpf-next v6 0/3] Add support for transmitting packets using XDP in bpf_prog_run()
From: Toke Høiland-Jørgensen @ 2022-01-06 19:59 UTC
  To: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh
  Cc: Toke Høiland-Jørgensen, netdev, bpf

This series adds support for transmitting packets using XDP in
bpf_prog_run(), by enabling a new "live packet" mode that handles the XDP
program return codes and redirects the packets to the networking stack or
to other devices.

The primary use case for this is testing the redirect map types and the
ndo_xdp_xmit driver operation without an external traffic generator. But it
turns out to also be useful for creating a programmable traffic generator
in XDP, as well as for injecting frames into the stack. A sample traffic
generator, which was included in previous versions of this series but has
since moved to xdp-tools, transmits up to 11.5 Mpps/core on my test machine.

To transmit the frames, the new mode instantiates a page_pool structure in
bpf_prog_run() and initialises the pages to contain XDP frames with the
data passed in by userspace. These frames can then be handled as though
they came from the hardware XDP path, and the existing page_pool code takes
care of returning and recycling them. The setup is optimised for high
performance with a high number of repetitions to support stress testing and
the traffic generator use case; see patch 1 for details.
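
From userspace the new mode is enabled through the regular
bpf_prog_test_run_opts() libbpf API by setting the new flag. A minimal
sketch (prog_fd, pkt, pkt_len and ifindex are placeholders and error
handling is elided; see the selftest in patch 3 for a complete example):

        struct xdp_md ctx_in = {
                .data_end = pkt_len,
                .ingress_ifindex = ifindex, /* device the frames appear to arrive on */
        };
        DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
                            .data_in = pkt,          /* prebuilt Ethernet frame */
                            .data_size_in = pkt_len,
                            .ctx_in = &ctx_in,
                            .ctx_size_in = sizeof(ctx_in),
                            .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
                            .repeat = 1 << 20,       /* run (and transmit) ~1M frames */
        );

        err = bpf_prog_test_run_opts(prog_fd, &opts);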

v6:
- Fix meta vs data pointer setting and add a selftest for it
- Add local_bh_disable() around code passing packets up the stack
- Create a new netns for the selftest and use a TC program instead of the
  forwarding hack to count packets being XDP_PASS'ed from the test prog.
- Check for the correct ingress ifindex in the selftest
- Rebase and drop patches 1-5 that were already merged

v5:
- Rebase to current bpf-next

v4:
- Fix a few code style issues (Alexei)
- Also handle the other return codes: XDP_PASS builds skbs and injects them
  into the stack, and XDP_TX is turned into a redirect out the same
  interface (Alexei).
- Drop the last patch adding an xdp_trafficgen program to samples/bpf; this
  will live in xdp-tools instead (Alexei).
- Add a separate bpf_test_run_xdp_live() function to test_run.c instead of
  entangling the new mode in the existing bpf_test_run().

v3:
- Reorder patches to make sure they all build individually (Patchwork)
- Remove a couple of unused variables (Patchwork)
- Remove unlikely() annotation in slow path and add back John's ACK that I
  accidentally dropped for v2 (John)

v2:
- Split up __xdp_do_redirect to avoid passing two pointers to it (John)
- Always reset context pointers before each test run (John)
- Use get_mac_addr() from xdp_sample_user.h instead of rolling our own (Kumar)
- Fix wrong offset for metadata pointer

Toke Høiland-Jørgensen (3):
  bpf: Add "live packet" mode for XDP in bpf_prog_run()
  selftests/bpf: Move open_netns() and close_netns() into
    network_helpers.c
  selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run()

 include/uapi/linux/bpf.h                      |   2 +
 kernel/bpf/Kconfig                            |   1 +
 net/bpf/test_run.c                            | 290 +++++++++++++++++-
 tools/include/uapi/linux/bpf.h                |   2 +
 tools/testing/selftests/bpf/network_helpers.c |  86 ++++++
 tools/testing/selftests/bpf/network_helpers.h |   9 +
 .../selftests/bpf/prog_tests/tc_redirect.c    |  87 ------
 .../bpf/prog_tests/xdp_do_redirect.c          | 151 +++++++++
 .../bpf/progs/test_xdp_do_redirect.c          |  78 +++++
 9 files changed, 611 insertions(+), 95 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c

-- 
2.34.1


* [PATCH bpf-next v6 1/3] bpf: Add "live packet" mode for XDP in bpf_prog_run()
From: Toke Høiland-Jørgensen @ 2022-01-06 19:59 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, netdev, bpf

This adds support for running XDP programs through bpf_prog_run() in a mode
that enables live packet processing of the resulting frames. Previous uses
of bpf_prog_run() for XDP returned the XDP program return code and the
modified packet data to userspace, which is useful for unit testing of XDP
programs.

This patch adds a new mode with different semantics. When enabled through
the new BPF_F_TEST_XDP_LIVE_FRAMES flag, the XDP program return codes will
be honoured: returning XDP_PASS will result in the frame being injected
into the networking stack as if it came from the selected networking
interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
being transmitted out an egress interface. XDP_TX is translated into an
XDP_REDIRECT operation, since the real XDP_TX action is only possible from
within the network drivers themselves, not from the process context where
bpf_prog_run() is executed.

To achieve this new mode of operation, we create a page pool instance while
setting up the test run, and feed pages from that into the XDP program. The
setup cost of this is amortised over the number of repetitions specified by
userspace.

To support the performance testing use case, we further optimise the setup step
so that all pages in the pool are pre-initialised with the packet data, and
pre-computed context and xdp_frame objects stored at the start of each
page. This makes it possible to entirely avoid touching the page content on
each XDP program invocation, and enables sending up to 11.5 Mpps/core on my
test box.

Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about the assumptions it
makes regarding the packet content, but that is only an issue for the most
naively written programs.
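
As a hypothetical illustration (assuming the usual vmlinux.h and
bpf_helpers.h includes), a program that unconditionally swaps the Ethernet
source and destination MACs is not idempotent: every second run that lands
on the same recycled page sees the addresses in swapped order:

SEC("xdp")
int xdp_swap_macs(struct xdp_md *xdp)
{
        void *data = (void *)(long)xdp->data;
        void *data_end = (void *)(long)xdp->data_end;
        struct ethhdr *eth = data;
        __u8 tmp[6];

        if (data + sizeof(*eth) > data_end)
                return XDP_ABORTED;

        /* Not idempotent: on a recycled page the next run undoes this
         * swap, so the frame alternates between the two states.
         */
        __builtin_memcpy(tmp, eth->h_source, sizeof(tmp));
        __builtin_memcpy(eth->h_source, eth->h_dest, sizeof(tmp));
        __builtin_memcpy(eth->h_dest, tmp, sizeof(tmp));

        return XDP_TX;
}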

Enabling the new flag is only allowed when ctx_out and data_out are left
unset in the test specification: since the frames are redirected somewhere
else, they can't be returned.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/uapi/linux/bpf.h       |   2 +
 kernel/bpf/Kconfig             |   1 +
 net/bpf/test_run.c             | 290 ++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h |   2 +
 4 files changed, 287 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b0383d371b9a..5ef20deaf49f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1225,6 +1225,8 @@ enum {
 
 /* If set, run the test on the cpu specified by bpf_attr.test.cpu */
 #define BPF_F_TEST_RUN_ON_CPU	(1U << 0)
+/* If set, XDP frames will be transmitted after processing */
+#define BPF_F_TEST_XDP_LIVE_FRAMES	(1U << 1)
 
 /* type for BPF_ENABLE_STATS */
 enum bpf_stats_type {
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index d24d518ddd63..c8c920020d11 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -30,6 +30,7 @@ config BPF_SYSCALL
 	select TASKS_TRACE_RCU
 	select BINARY_PRINTF
 	select NET_SOCK_MSG if NET
+	select PAGE_POOL if NET
 	default n
 	help
 	  Enable the bpf() system call that allows to manipulate BPF programs
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 46dd95755967..4a9af169e2dd 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -14,6 +14,7 @@
 #include <net/sock.h>
 #include <net/tcp.h>
 #include <net/net_namespace.h>
+#include <net/page_pool.h>
 #include <linux/error-injection.h>
 #include <linux/smp.h>
 #include <linux/sock_diag.h>
@@ -52,10 +53,11 @@ static void bpf_test_timer_leave(struct bpf_test_timer *t)
 	rcu_read_unlock();
 }
 
-static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *err, u32 *duration)
+static bool bpf_test_timer_continue(struct bpf_test_timer *t, int iterations,
+				    u32 repeat, int *err, u32 *duration)
 	__must_hold(rcu)
 {
-	t->i++;
+	t->i += iterations;
 	if (t->i >= repeat) {
 		/* We're done. */
 		t->time_spent += ktime_get_ns() - t->time_start;
@@ -87,6 +89,270 @@ static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *e
 	return false;
 }
 
+/* We put this struct at the head of each page with a context and frame
+ * initialised when the page is allocated, so we don't have to do this on each
+ * repetition of the test run.
+ */
+struct xdp_page_head {
+	struct xdp_buff orig_ctx;
+	struct xdp_buff ctx;
+	struct xdp_frame frm;
+	u8 data[];
+};
+
+struct xdp_test_data {
+	struct xdp_buff *orig_ctx;
+	struct xdp_rxq_info rxq;
+	struct net_device *dev;
+	struct page_pool *pp;
+	u32 frame_cnt;
+};
+
+#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head)	\
+			     - sizeof(struct skb_shared_info))
+#define TEST_XDP_BATCH 64
+
+static void xdp_test_run_init_page(struct page *page, void *arg)
+{
+	struct xdp_page_head *head = phys_to_virt(page_to_phys(page));
+	struct xdp_buff *new_ctx, *orig_ctx;
+	u32 headroom = XDP_PACKET_HEADROOM;
+	struct xdp_test_data *xdp = arg;
+	size_t frm_len, meta_len;
+	struct xdp_frame *frm;
+	void *data;
+
+	orig_ctx = xdp->orig_ctx;
+	frm_len = orig_ctx->data_end - orig_ctx->data_meta;
+	meta_len = orig_ctx->data - orig_ctx->data_meta;
+	headroom -= meta_len;
+
+	new_ctx = &head->ctx;
+	frm = &head->frm;
+	data = &head->data;
+	memcpy(data + headroom, orig_ctx->data_meta, frm_len);
+
+	xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &xdp->rxq);
+	xdp_prepare_buff(new_ctx, data, headroom, frm_len, true);
+	new_ctx->data = new_ctx->data_meta + meta_len;
+
+	xdp_update_frame_from_buff(new_ctx, frm);
+	frm->mem = new_ctx->rxq->mem;
+
+	memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx));
+}
+
+static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_ctx)
+{
+	struct xdp_mem_info mem = {};
+	struct page_pool *pp;
+	int err;
+	struct page_pool_params pp_params = {
+		.order = 0,
+		.flags = 0,
+		.pool_size = NAPI_POLL_WEIGHT * 2,
+		.nid = NUMA_NO_NODE,
+		.max_len = TEST_XDP_FRAME_SIZE,
+		.init_callback = xdp_test_run_init_page,
+		.init_arg = xdp,
+	};
+
+	pp = page_pool_create(&pp_params);
+	if (IS_ERR(pp))
+		return PTR_ERR(pp);
+
+	/* will copy 'mem.id' into pp->xdp_mem_id */
+	err = xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pp);
+	if (err) {
+		page_pool_destroy(pp);
+		return err;
+	}
+	xdp->pp = pp;
+
+	/* We create a 'fake' RXQ referencing the original dev, but with an
+	 * xdp_mem_info pointing to our page_pool
+	 */
+	xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0);
+	xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL;
+	xdp->rxq.mem.id = pp->xdp_mem_id;
+	xdp->dev = orig_ctx->rxq->dev;
+	xdp->orig_ctx = orig_ctx;
+
+	return 0;
+}
+
+static void xdp_test_run_teardown(struct xdp_test_data *xdp)
+{
+	struct xdp_mem_info mem = {
+		.id = xdp->pp->xdp_mem_id,
+		.type = MEM_TYPE_PAGE_POOL,
+	};
+
+	xdp_unreg_mem_model(&mem);
+}
+
+static bool ctx_was_changed(struct xdp_page_head *head)
+{
+	return head->orig_ctx.data != head->ctx.data ||
+		head->orig_ctx.data_meta != head->ctx.data_meta ||
+		head->orig_ctx.data_end != head->ctx.data_end;
+}
+
+static void reset_ctx(struct xdp_page_head *head)
+{
+	if (likely(!ctx_was_changed(head)))
+		return;
+
+	head->ctx.data = head->orig_ctx.data;
+	head->ctx.data_meta = head->orig_ctx.data_meta;
+	head->ctx.data_end = head->orig_ctx.data_end;
+	xdp_update_frame_from_buff(&head->ctx, &head->frm);
+}
+
+static int xdp_recv_frames(struct xdp_frame **frames, int nframes,
+			   struct net_device *dev)
+{
+	gfp_t gfp = __GFP_ZERO | GFP_ATOMIC;
+	void *skbs[TEST_XDP_BATCH];
+	int i, n;
+	LIST_HEAD(list);
+
+	n = kmem_cache_alloc_bulk(skbuff_head_cache, gfp, nframes, skbs);
+	if (unlikely(n == 0)) {
+		for (i = 0; i < nframes; i++)
+			xdp_return_frame(frames[i]);
+		return -ENOMEM;
+	}
+
+	local_bh_disable();
+	for (i = 0; i < nframes; i++) {
+		struct xdp_frame *xdpf = frames[i];
+		struct sk_buff *skb = skbs[i];
+
+		skb = __xdp_build_skb_from_frame(xdpf, skb, dev);
+		if (!skb) {
+			xdp_return_frame(xdpf);
+			continue;
+		}
+
+		list_add_tail(&skb->list, &list);
+	}
+	netif_receive_skb_list(&list);
+	local_bh_enable();
+
+	return 0;
+}
+
+static int xdp_test_run_batch(struct xdp_test_data *xdp, struct bpf_prog *prog,
+			      u32 repeat)
+{
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	int err = 0, act, ret, i, nframes = 0, batch_sz;
+	struct xdp_frame *frames[TEST_XDP_BATCH];
+	struct xdp_page_head *head;
+	struct xdp_frame *frm;
+	bool redirect = false;
+	struct xdp_buff *ctx;
+	struct page *page;
+
+	batch_sz = min_t(u32, repeat, TEST_XDP_BATCH);
+	xdp_set_return_frame_no_direct();
+
+	for (i = 0; i < batch_sz; i++) {
+		page = page_pool_dev_alloc_pages(xdp->pp);
+		if (!page) {
+			err = -ENOMEM;
+			goto out;
+		}
+
+		head = phys_to_virt(page_to_phys(page));
+		reset_ctx(head);
+		ctx = &head->ctx;
+		frm = &head->frm;
+		xdp->frame_cnt++;
+
+		act = bpf_prog_run_xdp(prog, ctx);
+
+		/* if program changed pkt bounds we need to update the xdp_frame */
+		if (unlikely(ctx_was_changed(head))) {
+			err = xdp_update_frame_from_buff(ctx, frm);
+			if (err) {
+				xdp_return_buff(ctx);
+				goto out;
+			}
+		}
+
+		switch (act) {
+		case XDP_TX:
+			/* we can't do a real XDP_TX since we're not in the
+			 * driver, so turn it into a REDIRECT back to the same
+			 * index
+			 */
+			ri->tgt_index = xdp->dev->ifindex;
+			ri->map_id = INT_MAX;
+			ri->map_type = BPF_MAP_TYPE_UNSPEC;
+			fallthrough;
+		case XDP_REDIRECT:
+			redirect = true;
+			err = xdp_do_redirect_frame(xdp->dev, ctx, frm, prog);
+			if (err) {
+				xdp_return_buff(ctx);
+				goto out;
+			}
+			break;
+		case XDP_PASS:
+			frames[nframes++] = frm;
+			break;
+		default:
+			bpf_warn_invalid_xdp_action(NULL, prog, act);
+			fallthrough;
+		case XDP_DROP:
+			xdp_return_buff(ctx);
+			break;
+		}
+	}
+
+out:
+	if (redirect)
+		xdp_do_flush();
+	if (nframes) {
+		ret = xdp_recv_frames(frames, nframes, xdp->dev);
+		if (ret)
+			err = ret;
+	}
+
+	xdp_clear_return_frame_no_direct();
+	return err;
+}
+
+static int bpf_test_run_xdp_live(struct bpf_prog *prog, struct xdp_buff *ctx,
+				 u32 repeat, u32 *time)
+
+{
+	struct bpf_test_timer t = { .mode = NO_MIGRATE };
+	struct xdp_test_data xdp = {};
+	int ret;
+
+	if (!repeat)
+		repeat = 1;
+
+	ret = xdp_test_run_setup(&xdp, ctx);
+	if (ret)
+		return ret;
+
+	bpf_test_timer_enter(&t);
+	do {
+		xdp.frame_cnt = 0;
+		ret = xdp_test_run_batch(&xdp, prog, repeat - t.i);
+		if (unlikely(ret < 0))
+			break;
+	} while (bpf_test_timer_continue(&t, xdp.frame_cnt, repeat, &ret, time));
+	bpf_test_timer_leave(&t);
+
+	xdp_test_run_teardown(&xdp);
+	return ret;
+}
+
 static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 			u32 *retval, u32 *time, bool xdp)
 {
@@ -118,7 +384,7 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 			*retval = bpf_prog_run_xdp(prog, ctx);
 		else
 			*retval = bpf_prog_run(prog, ctx);
-	} while (bpf_test_timer_continue(&t, repeat, &ret, time));
+	} while (bpf_test_timer_continue(&t, 1, repeat, &ret, time));
 	bpf_reset_run_ctx(old_ctx);
 	bpf_test_timer_leave(&t);
 
@@ -757,13 +1023,14 @@ static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md)
 int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
+	bool do_live = (kattr->test.flags & BPF_F_TEST_XDP_LIVE_FRAMES);
 	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	u32 headroom = XDP_PACKET_HEADROOM;
 	u32 size = kattr->test.data_size_in;
 	u32 repeat = kattr->test.repeat;
 	struct netdev_rx_queue *rxqueue;
 	struct xdp_buff xdp = {};
-	u32 retval, duration;
+	u32 retval = 0, duration;
 	struct xdp_md *ctx;
 	u32 max_data_sz;
 	void *data;
@@ -773,6 +1040,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	    prog->expected_attach_type == BPF_XDP_CPUMAP)
 		return -EINVAL;
 
+	if (kattr->test.flags & ~BPF_F_TEST_XDP_LIVE_FRAMES)
+		return -EINVAL;
+
 	ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md));
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
@@ -781,7 +1051,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		/* There can't be user provided data before the meta data */
 		if (ctx->data_meta || ctx->data_end != size ||
 		    ctx->data > ctx->data_end ||
-		    unlikely(xdp_metalen_invalid(ctx->data)))
+		    unlikely(xdp_metalen_invalid(ctx->data)) ||
+		    (do_live && (kattr->test.data_out || kattr->test.ctx_out)))
 			goto free_ctx;
 		/* Meta data is allocated from the headroom */
 		headroom -= ctx->data;
@@ -807,7 +1078,10 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 
 	if (repeat > 1)
 		bpf_prog_change_xdp(NULL, prog);
-	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
+	if (do_live)
+		ret = bpf_test_run_xdp_live(prog, &xdp, repeat, &duration);
+	else
+		ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
 	/* We convert the xdp_buff back to an xdp_md before checking the return
 	 * code so the reference count of any held netdevice will be decremented
 	 * even if the test run failed.
@@ -905,7 +1179,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	do {
 		retval = bpf_flow_dissect(prog, &ctx, eth->h_proto, ETH_HLEN,
 					  size, flags);
-	} while (bpf_test_timer_continue(&t, repeat, &ret, &duration));
+	} while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration));
 	bpf_test_timer_leave(&t);
 
 	if (ret < 0)
@@ -1000,7 +1274,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat
 	do {
 		ctx.selected_sk = NULL;
 		retval = BPF_PROG_SK_LOOKUP_RUN_ARRAY(progs, ctx, bpf_prog_run);
-	} while (bpf_test_timer_continue(&t, repeat, &ret, &duration));
+	} while (bpf_test_timer_continue(&t, 1, repeat, &ret, &duration));
 	bpf_test_timer_leave(&t);
 
 	if (ret < 0)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b0383d371b9a..5ef20deaf49f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1225,6 +1225,8 @@ enum {
 
 /* If set, run the test on the cpu specified by bpf_attr.test.cpu */
 #define BPF_F_TEST_RUN_ON_CPU	(1U << 0)
+/* If set, XDP frames will be transmitted after processing */
+#define BPF_F_TEST_XDP_LIVE_FRAMES	(1U << 1)
 
 /* type for BPF_ENABLE_STATS */
 enum bpf_stats_type {
-- 
2.34.1


* [PATCH bpf-next v6 2/3] selftests/bpf: Move open_netns() and close_netns() into network_helpers.c
From: Toke Høiland-Jørgensen @ 2022-01-06 19:59 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, Shuah Khan, netdev, bpf

These will also be used by the xdp_do_redirect test being added in the next
commit.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/testing/selftests/bpf/network_helpers.c | 86 ++++++++++++++++++
 tools/testing/selftests/bpf/network_helpers.h |  9 ++
 .../selftests/bpf/prog_tests/tc_redirect.c    | 87 -------------------
 3 files changed, 95 insertions(+), 87 deletions(-)

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index 6db1af8fdee7..2bb1f9b3841d 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -1,18 +1,25 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE
+
 #include <errno.h>
 #include <stdbool.h>
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
+#include <sched.h>
 
 #include <arpa/inet.h>
+#include <sys/mount.h>
+#include <sys/stat.h>
 
 #include <linux/err.h>
 #include <linux/in.h>
 #include <linux/in6.h>
+#include <linux/limits.h>
 
 #include "bpf_util.h"
 #include "network_helpers.h"
+#include "test_progs.h"
 
 #define clean_errno() (errno == 0 ? "None" : strerror(errno))
 #define log_err(MSG, ...) ({						\
@@ -356,3 +363,82 @@ char *ping_command(int family)
 	}
 	return "ping";
 }
+
+struct nstoken {
+	int orig_netns_fd;
+};
+
+static int setns_by_fd(int nsfd)
+{
+	int err;
+
+	err = setns(nsfd, CLONE_NEWNET);
+	close(nsfd);
+
+	if (!ASSERT_OK(err, "setns"))
+		return err;
+
+	/* Switch /sys to the new namespace so that e.g. /sys/class/net
+	 * reflects the devices in the new namespace.
+	 */
+	err = unshare(CLONE_NEWNS);
+	if (!ASSERT_OK(err, "unshare"))
+		return err;
+
+	/* Make our /sys mount private, so the following umount won't
+	 * trigger the global umount in case it's shared.
+	 */
+	err = mount("none", "/sys", NULL, MS_PRIVATE, NULL);
+	if (!ASSERT_OK(err, "remount private /sys"))
+		return err;
+
+	err = umount2("/sys", MNT_DETACH);
+	if (!ASSERT_OK(err, "umount2 /sys"))
+		return err;
+
+	err = mount("sysfs", "/sys", "sysfs", 0, NULL);
+	if (!ASSERT_OK(err, "mount /sys"))
+		return err;
+
+	err = mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL);
+	if (!ASSERT_OK(err, "mount /sys/fs/bpf"))
+		return err;
+
+	return 0;
+}
+
+struct nstoken *open_netns(const char *name)
+{
+	int nsfd;
+	char nspath[PATH_MAX];
+	int err;
+	struct nstoken *token;
+
+	token = malloc(sizeof(struct nstoken));
+	if (!ASSERT_OK_PTR(token, "malloc token"))
+		return NULL;
+
+	token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (!ASSERT_GE(token->orig_netns_fd, 0, "open /proc/self/ns/net"))
+		goto fail;
+
+	snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
+	nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
+	if (!ASSERT_GE(nsfd, 0, "open netns fd"))
+		goto fail;
+
+	err = setns_by_fd(nsfd);
+	if (!ASSERT_OK(err, "setns_by_fd"))
+		goto fail;
+
+	return token;
+fail:
+	free(token);
+	return NULL;
+}
+
+void close_netns(struct nstoken *token)
+{
+	ASSERT_OK(setns_by_fd(token->orig_netns_fd), "setns_by_fd");
+	free(token);
+}
diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
index d198181a5648..a4b3b2f9877b 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -55,4 +55,13 @@ int make_sockaddr(int family, const char *addr_str, __u16 port,
 		  struct sockaddr_storage *addr, socklen_t *len);
 char *ping_command(int family);
 
+struct nstoken;
+/**
+ * open_netns() - Switch to specified network namespace by name.
+ *
+ * Returns token with which to restore the original namespace
+ * using close_netns().
+ */
+struct nstoken *open_netns(const char *name);
+void close_netns(struct nstoken *token);
 #endif
diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
index c2426df58e17..0a13c7eb40f3 100644
--- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
@@ -17,10 +17,8 @@
 #include <linux/if_tun.h>
 #include <linux/limits.h>
 #include <linux/sysctl.h>
-#include <sched.h>
 #include <stdbool.h>
 #include <stdio.h>
-#include <sys/mount.h>
 #include <sys/stat.h>
 #include <unistd.h>
 
@@ -84,91 +82,6 @@ static int write_file(const char *path, const char *newval)
 	return 0;
 }
 
-struct nstoken {
-	int orig_netns_fd;
-};
-
-static int setns_by_fd(int nsfd)
-{
-	int err;
-
-	err = setns(nsfd, CLONE_NEWNET);
-	close(nsfd);
-
-	if (!ASSERT_OK(err, "setns"))
-		return err;
-
-	/* Switch /sys to the new namespace so that e.g. /sys/class/net
-	 * reflects the devices in the new namespace.
-	 */
-	err = unshare(CLONE_NEWNS);
-	if (!ASSERT_OK(err, "unshare"))
-		return err;
-
-	/* Make our /sys mount private, so the following umount won't
-	 * trigger the global umount in case it's shared.
-	 */
-	err = mount("none", "/sys", NULL, MS_PRIVATE, NULL);
-	if (!ASSERT_OK(err, "remount private /sys"))
-		return err;
-
-	err = umount2("/sys", MNT_DETACH);
-	if (!ASSERT_OK(err, "umount2 /sys"))
-		return err;
-
-	err = mount("sysfs", "/sys", "sysfs", 0, NULL);
-	if (!ASSERT_OK(err, "mount /sys"))
-		return err;
-
-	err = mount("bpffs", "/sys/fs/bpf", "bpf", 0, NULL);
-	if (!ASSERT_OK(err, "mount /sys/fs/bpf"))
-		return err;
-
-	return 0;
-}
-
-/**
- * open_netns() - Switch to specified network namespace by name.
- *
- * Returns token with which to restore the original namespace
- * using close_netns().
- */
-static struct nstoken *open_netns(const char *name)
-{
-	int nsfd;
-	char nspath[PATH_MAX];
-	int err;
-	struct nstoken *token;
-
-	token = malloc(sizeof(struct nstoken));
-	if (!ASSERT_OK_PTR(token, "malloc token"))
-		return NULL;
-
-	token->orig_netns_fd = open("/proc/self/ns/net", O_RDONLY);
-	if (!ASSERT_GE(token->orig_netns_fd, 0, "open /proc/self/ns/net"))
-		goto fail;
-
-	snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
-	nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
-	if (!ASSERT_GE(nsfd, 0, "open netns fd"))
-		goto fail;
-
-	err = setns_by_fd(nsfd);
-	if (!ASSERT_OK(err, "setns_by_fd"))
-		goto fail;
-
-	return token;
-fail:
-	free(token);
-	return NULL;
-}
-
-static void close_netns(struct nstoken *token)
-{
-	ASSERT_OK(setns_by_fd(token->orig_netns_fd), "setns_by_fd");
-	free(token);
-}
-
 static int netns_setup_namespaces(const char *verb)
 {
 	const char * const *ns = namespaces;
-- 
2.34.1


* [PATCH bpf-next v6 3/3] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run()
From: Toke Høiland-Jørgensen @ 2022-01-06 19:59 UTC
  To: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh
  Cc: Toke Høiland-Jørgensen, Shuah Khan, netdev, bpf

This adds a selftest for the XDP_REDIRECT facility in bpf_prog_run(), which
redirects packets into a veth and counts them using an XDP program on the
other side of the veth pair and a TC program on the local side of the veth.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 .../bpf/prog_tests/xdp_do_redirect.c          | 151 ++++++++++++++++++
 .../bpf/progs/test_xdp_do_redirect.c          |  78 +++++++++
 2 files changed, 229 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
new file mode 100644
index 000000000000..3789c380f24e
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <net/if.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ipv6.h>
+#include <linux/in6.h>
+#include <linux/udp.h>
+#include <bpf/bpf_endian.h>
+#include "test_xdp_do_redirect.skel.h"
+
+#define SYS(fmt, ...)						\
+	({							\
+		char cmd[1024];					\
+		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
+		if (!ASSERT_OK(system(cmd), cmd))		\
+			goto out;				\
+	})
+
+struct udp_packet {
+	struct ethhdr eth;
+	struct ipv6hdr iph;
+	struct udphdr udp;
+	__u8 payload[64 - sizeof(struct udphdr)
+		     - sizeof(struct ethhdr) - sizeof(struct ipv6hdr)];
+} __packed;
+
+static struct udp_packet pkt_udp = {
+	.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
+	.eth.h_dest = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55},
+	.eth.h_source = {0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb},
+	.iph.version = 6,
+	.iph.nexthdr = IPPROTO_UDP,
+	.iph.payload_len = bpf_htons(sizeof(struct udp_packet)
+				     - offsetof(struct udp_packet, udp)),
+	.iph.hop_limit = 2,
+	.iph.saddr.s6_addr16 = {bpf_htons(0xfc00), 0, 0, 0, 0, 0, 0, bpf_htons(1)},
+	.iph.daddr.s6_addr16 = {bpf_htons(0xfc00), 0, 0, 0, 0, 0, 0, bpf_htons(2)},
+	.udp.source = bpf_htons(1),
+	.udp.dest = bpf_htons(1),
+	.udp.len = bpf_htons(sizeof(struct udp_packet)
+			     - offsetof(struct udp_packet, udp)),
+	.payload = {0x42}, /* receiver XDP program matches on this */
+};
+
+static int attach_tc_prog(struct bpf_tc_hook *hook, int fd)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1, .prog_fd = fd);
+	int ret;
+
+	ret = bpf_tc_hook_create(hook);
+	if (!ASSERT_OK(ret, "create tc hook"))
+		return ret;
+
+	ret = bpf_tc_attach(hook, &opts);
+	if (!ASSERT_OK(ret, "bpf_tc_attach")) {
+		bpf_tc_hook_destroy(hook);
+		return ret;
+	}
+
+	return 0;
+}
+
+#define NUM_PKTS 10
+void test_xdp_do_redirect(void)
+{
+	int err, xdp_prog_fd, tc_prog_fd, ifindex_src, ifindex_dst;
+	char data[sizeof(pkt_udp) + sizeof(__u32)];
+	struct test_xdp_do_redirect *skel = NULL;
+	struct nstoken *nstoken = NULL;
+	struct bpf_link *link;
+
+	struct xdp_md ctx_in = { .data = sizeof(__u32),
+				 .data_end = sizeof(data) };
+	DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+			    .data_in = &data,
+			    .data_size_in = sizeof(data),
+			    .ctx_in = &ctx_in,
+			    .ctx_size_in = sizeof(ctx_in),
+			    .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
+			    .repeat = NUM_PKTS,
+		);
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
+			    .attach_point = BPF_TC_INGRESS);
+
+	memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
+	*((__u32 *)data) = 0x42; /* metadata test value */
+
+	skel = test_xdp_do_redirect__open();
+	if (!ASSERT_OK_PTR(skel, "skel"))
+		return;
+
+	/* The XDP program we run with bpf_prog_run() will cycle through all
+	 * three xmit (PASS/TX/REDIRECT) return codes starting from above, and
+	 * ending up with PASS, so we should end up with two packets on the dst
+	 * iface and NUM_PKTS-2 in the TC hook. We match the packets on the UDP
+	 * payload.
+	 */
+	SYS("ip netns add testns");
+	nstoken = open_netns("testns");
+	if (!ASSERT_OK_PTR(nstoken, "setns"))
+		goto out;
+
+	SYS("ip link add veth_src type veth peer name veth_dst");
+	SYS("ip link set dev veth_src up");
+	SYS("ip link set dev veth_dst up");
+
+	ifindex_src = if_nametoindex("veth_src");
+	ifindex_dst = if_nametoindex("veth_dst");
+	if (!ASSERT_NEQ(ifindex_src, 0, "ifindex_src") ||
+	    !ASSERT_NEQ(ifindex_dst, 0, "ifindex_dst"))
+		goto out;
+
+	memcpy(skel->rodata->expect_dst, &pkt_udp.eth.h_dest, ETH_ALEN);
+	skel->rodata->ifindex_out = ifindex_src; /* redirect back to the same iface */
+	skel->rodata->ifindex_in = ifindex_src;
+	ctx_in.ingress_ifindex = ifindex_src;
+	tc_hook.ifindex = ifindex_src;
+
+	if (!ASSERT_OK(test_xdp_do_redirect__load(skel), "load"))
+		goto out;
+
+	link = bpf_program__attach_xdp(skel->progs.xdp_count_pkts, ifindex_dst);
+	if (!ASSERT_OK_PTR(link, "prog_attach"))
+		goto out;
+	skel->links.xdp_count_pkts = link;
+
+	tc_prog_fd = bpf_program__fd(skel->progs.tc_count_pkts);
+	if (attach_tc_prog(&tc_hook, tc_prog_fd))
+		goto out;
+
+	xdp_prog_fd = bpf_program__fd(skel->progs.xdp_redirect);
+	err = bpf_prog_test_run_opts(xdp_prog_fd, &opts);
+	if (!ASSERT_OK(err, "prog_run"))
+		goto out_tc;
+
+	/* wait for the packets to be flushed */
+	kern_sync_rcu();
+
+	ASSERT_EQ(skel->bss->pkts_seen_xdp, 2, "pkt_count_xdp");
+	ASSERT_EQ(skel->bss->pkts_seen_tc, NUM_PKTS - 2, "pkt_count_tc");
+
+out_tc:
+	bpf_tc_hook_destroy(&tc_hook);
+out:
+	if (nstoken)
+		close_netns(nstoken);
+	system("ip netns del testns");
+	test_xdp_do_redirect__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
new file mode 100644
index 000000000000..cdb0ddb691c9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+#define ETH_ALEN 6
+const volatile int ifindex_out;
+const volatile int ifindex_in;
+const volatile __u8 expect_dst[ETH_ALEN];
+volatile int pkts_seen_xdp = 0;
+volatile int pkts_seen_tc = 0;
+volatile int retcode = XDP_REDIRECT;
+
+SEC("xdp")
+int xdp_redirect(struct xdp_md *xdp)
+{
+	__u32 *metadata = (void *)(long)xdp->data_meta;
+	void *data = (void *)(long)xdp->data;
+	int ret = retcode;
+
+	if (xdp->ingress_ifindex != ifindex_in)
+		return XDP_ABORTED;
+
+	if (metadata + 1 > data)
+		return XDP_ABORTED;
+
+	if (*metadata != 0x42)
+		return XDP_ABORTED;
+
+	if (bpf_xdp_adjust_meta(xdp, 4))
+		return XDP_ABORTED;
+
+	if (retcode > XDP_PASS)
+		retcode--;
+
+	if (ret == XDP_REDIRECT)
+		return bpf_redirect(ifindex_out, 0);
+
+	return ret;
+}
+
+static bool check_pkt(void *data, void *data_end)
+{
+	struct ethhdr *eth = data;
+	struct ipv6hdr *iph = (void *)(eth + 1);
+	struct udphdr *udp = (void *)(iph + 1);
+	__u8 *payload = (void *)(udp + 1);
+
+	if (payload + 1 > data_end)
+		return false;
+
+	return iph->nexthdr == IPPROTO_UDP && *payload == 0x42;
+}
+
+SEC("xdp")
+int xdp_count_pkts(struct xdp_md *xdp)
+{
+	void *data = (void *)(long)xdp->data;
+	void *data_end = (void *)(long)xdp->data_end;
+
+	if (check_pkt(data, data_end))
+		pkts_seen_xdp++;
+
+	return XDP_PASS;
+}
+
+SEC("tc")
+int tc_count_pkts(struct __sk_buff *skb)
+{
+	void *data = (void *)(long)skb->data;
+	void *data_end = (void *)(long)skb->data_end;
+
+	if (check_pkt(data, data_end))
+		pkts_seen_tc++;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1


* Re: [PATCH bpf-next v6 3/3] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run()
From: Alexei Starovoitov @ 2022-01-07  0:50 UTC
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Shuah Khan, Network Development, bpf

On Thu, Jan 6, 2022 at 11:59 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> +
> +#define NUM_PKTS 10

I'm afraid this needs more work.
Just bumping above to 1M I got:
[  254.165911] ================================
[  254.166387] WARNING: inconsistent lock state
[  254.166882] 5.16.0-rc7-02011-g64923127f1b3 #3784 Tainted: G           O
[  254.167659] --------------------------------
[  254.168140] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  254.168793] swapper/7/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
[  254.169373] ffff888113d24220 (&r->producer_lock){+.?.}-{2:2}, at:
veth_xmit+0x361/0x830
[  254.170317] {SOFTIRQ-ON-W} state was registered at:
[  254.170921]   lock_acquire+0x18a/0x450
[  254.171371]   _raw_spin_lock+0x2f/0x40
[  254.171815]   veth_xdp_xmit+0x1d7/0x8c0
[  254.172241]   veth_ndo_xdp_xmit+0x1d/0x50
[  254.172689]   bq_xmit_all+0x562/0xc30
[  254.173159]   __dev_flush+0xb1/0x220
[  254.173586]   xdp_do_flush+0xa/0x20
[  254.173983]   xdp_test_run_batch.constprop.25+0x90c/0xf00
[  254.174564]   bpf_test_run_xdp_live+0x369/0x480
[  254.175038]   bpf_prog_test_run_xdp+0x63f/0xe50
[  254.175512]   __sys_bpf+0x688/0x4410
[  254.175923]   __x64_sys_bpf+0x75/0xb0
[  254.176327]   do_syscall_64+0x34/0x80
[  254.176733]   entry_SYSCALL_64_after_hwframe+0x44/0xae
[  254.177297] irq event stamp: 130862
[  254.177681] hardirqs last  enabled at (130862):
[<ffffffff812d0812>] call_rcu+0x2a2/0x640
[  254.178561] hardirqs last disabled at (130861):
[<ffffffff812d08bd>] call_rcu+0x34d/0x640
[  254.179404] softirqs last  enabled at (130814):
[<ffffffff83c00534>] __do_softirq+0x534/0x835
[  254.180332] softirqs last disabled at (130839):
[<ffffffff811389f7>] irq_exit_rcu+0xe7/0x120
[  254.181255]
[  254.181255] other info that might help us debug this:
[  254.181969]  Possible unsafe locking scenario:
[  254.183172]   lock(&r->producer_lock);
[  254.183590]   <Interrupt>
[  254.183893]     lock(&r->producer_lock);
[  254.184321]
[  254.184321]  *** DEADLOCK ***
[  254.184321]
[  254.185047] 5 locks held by swapper/7/0:
[  254.185501]  #0: ffff8881f6d89db8 ((&ndev->rs_timer)){+.-.}-{0:0},
at: call_timer_fn+0xc8/0x440
[  254.186496]  #1: ffffffff854415e0 (rcu_read_lock){....}-{1:2}, at:
ndisc_send_skb+0x761/0x12e0
[  254.187444]  #2: ffffffff85441580 (rcu_read_lock_bh){....}-{1:2},
at: ip6_finish_output2+0x2da/0x1e00
[  254.188447]  #3: ffffffff85441580 (rcu_read_lock_bh){....}-{1:2},
at: __dev_queue_xmit+0x1de/0x2910
[  254.189502]  #4: ffffffff854415e0 (rcu_read_lock){....}-{1:2}, at:
veth_xmit+0x41/0x830
[  254.190455]
[  254.190455] stack backtrace:
[  254.190963] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           O
  5.16.0-rc7-02011-g64923127f1b3 #3784
[  254.192109] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[  254.193427] Call Trace:
[  254.193711]  <IRQ>
[  254.193945]  dump_stack_lvl+0x44/0x57
[  254.194418]  mark_lock.part.54+0x157b/0x2210
[  254.194940]  ? mark_lock.part.54+0xfd/0x2210
[  254.195451]  ? print_usage_bug+0x80/0x80
[  254.195896]  ? rcu_read_lock_sched_held+0x91/0xc0
[  254.196413]  ? rcu_read_lock_bh_held+0xa0/0xa0
[  254.196903]  ? rcu_read_lock_bh_held+0xa0/0xa0
[  254.197389]  ? find_held_lock+0x33/0x1c0
[  254.197826]  ? lock_release+0x3a1/0x650
[  254.198251]  ? __stack_depot_save+0x274/0x490
[  254.198742]  ? lock_acquire+0x19a/0x450
[  254.199175]  ? lock_downgrade+0x690/0x690
[  254.199626]  ? do_raw_spin_lock+0x11d/0x270
[  254.200091]  ? rwlock_bug.part.2+0x90/0x90
[  254.200550]  __lock_acquire+0x151f/0x6310
[  254.201000]  ? mark_lock.part.54+0xfd/0x2210
[  254.201470]  ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
[  254.202083]  ? lock_is_held_type+0xda/0x130
[  254.202592]  ? rcu_read_lock_sched_held+0x91/0xc0
[  254.203134]  ? rcu_read_lock_bh_held+0xa0/0xa0
[  254.203630]  lock_acquire+0x18a/0x450
[  254.204041]  ? veth_xmit+0x361/0x830
[  254.204455]  ? lock_release+0x650/0x650
[  254.204932]  ? eth_gro_receive+0xc60/0xc60
[  254.205421]  ? rcu_read_lock_held+0x91/0xa0
[  254.205912]  _raw_spin_lock+0x2f/0x40
[  254.206314]  ? veth_xmit+0x361/0x830
[  254.206707]  veth_xmit+0x361/0x830

I suspect it points out that local_bh_disable is needed
around xdp_do_flush.

That's why I asked you to test it with something
more than 3 in NUM_PKTS.
What values did you test it with? I hope not just 10.

Please make sure XDP_PASS/TX/REDIRECT are all stress tested.

* Re: [PATCH bpf-next v6 3/3] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run()
From: Toke Høiland-Jørgensen @ 2022-01-07 15:54 UTC
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Shuah Khan, Network Development, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Thu, Jan 6, 2022 at 11:59 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> +
>> +#define NUM_PKTS 10
>
> I'm afraid this needs more work.
> Just bumping above to 1M I got:
> [... lockdep splat trimmed; see quoted message above ...]
>
> I suspect it points out that local_bh_disable is needed
> around xdp_do_flush.
>
> That's why I asked you to test it with something
> more than 3 in NUM_PKTS.
> What values did you test it with? I hope not just 10.
>
> Please make sure XDP_PASS/TX/REDIRECT are all stress tested.

Okay, finally managed to reproduce this; it did not show up at all in my
own tests.

Did you run the old version of the selftest by any chance? At least I
can only reproduce it with the forwarding sysctl enabled; it happens
because the XDP_PASS path races with the XDP_REDIRECT path and ends up
trying to grab the same lock, which only happens when the XDP_PASS path
sends the packets back out the same interface. The fix is to extend the
local_bh_disable() to cover the full loop in xdp_test_run_batch().
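
Something like this (untested sketch; with this, the
local_bh_disable()/enable() pair in xdp_recv_frames() becomes redundant
and can be dropped):

	batch_sz = min_t(u32, repeat, TEST_XDP_BATCH);

	local_bh_disable();
	xdp_set_return_frame_no_direct();

	for (i = 0; i < batch_sz; i++) {
		/* ... run the program and handle the verdict as before ... */
	}

out:
	if (redirect)
		xdp_do_flush();
	if (nframes) {
		ret = xdp_recv_frames(frames, nframes, xdp->dev);
		if (ret)
			err = ret;
	}

	xdp_clear_return_frame_no_direct();
	local_bh_enable();
	return err;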

I'll send an update with that fixed. But I'm not sure what to do about
the selftest? I can keep the forwarding enabled + 1 million iterations -
that seems to trigger the bug fairly reliably for me, but it takes a bit
longer to run. Is that acceptable?

-Toke


* Re: [PATCH bpf-next v6 3/3] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run()
From: Toke Høiland-Jørgensen @ 2022-01-07 16:05 UTC
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Shuah Khan, Network Development, bpf

Toke Høiland-Jørgensen <toke@redhat.com> writes:

> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
>> On Thu, Jan 6, 2022 at 11:59 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> +
>>> +#define NUM_PKTS 10
>>
>> I'm afraid this needs more work.
>> Just bumping above to 1M I got:
>> [... lockdep splat trimmed; see quoted message above ...]
>>
>> I suspect it points out that local_bh_disable is needed
>> around xdp_do_flush.
>>
>> That's why I asked you to test it with something
>> more than 3 in NUM_PKTS.
>> What values did you test it with? I hope not just 10.
>>
>> Please make sure XDP_PASS/TX/REDIRECT are all stress tested.
>
> Okay, finally managed to reproduce this; it did not show up at all in my
> own tests.
>
> Did you run the old version of the selftest by any chance? At least I
> can only reproduce it with the forwarding sysctl enabled; it happens
> because the XDP_PASS path races with the XDP_REDIRECT path and end up
> trying to grab the same lock, which only happens when the XDP_PASS path
> sends the packets back out the same interface. The fix is to extend the
> local_bh_disable() to cover the full loop in xdp_test_run_batch().
>
> I'll send an update with that fixed. But I'm not sure what to do about
> the selftest? I can keep the forwarding enabled + 1 million iterations -
> that seems to trigger the bug fairly reliably for me, but it takes a bit
> longer to run. Is that acceptable?

The absolute difference is just over three seconds on my machine, BTW:

1M pkts:

[root@(none) bpf]# time ./test_progs -a xdp_do_redirect
#221 xdp_do_redirect:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

real	0m5,042s
user	0m0,109s
sys	0m3,968s

10 pkts:

[root@(none) bpf]# time ./test_progs -a xdp_do_redirect
#221 xdp_do_redirect:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

real	0m1,823s
user	0m0,117s
sys	0m0,685s


-Toke

