bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/8] Add support for transmitting packets using XDP in bpf_prog_run()
@ 2021-12-11 18:41 Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 1/8] xdp: Allow registering memory model without rxq reference Toke Høiland-Jørgensen
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-11 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh
  Cc: Toke Høiland-Jørgensen, netdev, bpf

This series adds support for transmitting packets using XDP in
bpf_prog_run(), by enabling the xdp_do_redirect() callback so XDP programs
can perform "real" redirects to devices or maps, using an opt-in flag when
executing the program.

The primary use case for this is testing the redirect map types and the
ndo_xdp_xmit driver operation without generating external traffic. But it
turns out to also be useful for creating a programmable traffic generator.
The last patch adds a sample traffic generator to bpf/samples, which
can transmit up to 11.5 Mpps/core on my test machine.

To transmit the frames, the new mode instantiates a page_pool structure in
bpf_prog_run() and initialises the pages with the data passed in by
userspace. These pages can then be redirected using the normal redirection
mechanism, and the existing page_pool code takes care of returning and
recycling them. The setup is optimised for high performance with a high
number of repetitions to support stress testing and the traffic generator
use case; see patch 6 for details.

The series is structured as follows: Patches 1-2 adds a few features to
page_pool that are needed for the usage in bpf_prog_run(). Similarly,
patches 3-5 performs a couple of preparatory refactorings of the XDP
redirect and memory management code. Patch 6 adds the support to
bpf_prog_run() itself, patch 7 adds a selftest, and patch 8 adds the
traffic generator example to samples/bpf.

v3:
- Reorder patches to make sure they all build individually (Patchwork)
- Remove a couple of unused variables (Patchwork)
- Remove unlikely() annotation in slow path and add back John's ACK that I
  accidentally dropped for v2 (John)

v2:
- Split up up __xdp_do_redirect to avoid passing two pointers to it (John)
- Always reset context pointers before each test run (John)
- Use get_mac_addr() from xdp_sample_user.h instead of rolling our own (Kumar)
- Fix wrong offset for metadata pointer

Toke Høiland-Jørgensen (8):
  xdp: Allow registering memory model without rxq reference
  page_pool: Add callback to init pages when they are allocated
  page_pool: Store the XDP mem id
  xdp: Move conversion to xdp_frame out of map functions
  xdp: add xdp_do_redirect_frame() for pre-computed xdp_frames
  bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run()
  selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run()
  samples/bpf: Add xdp_trafficgen sample

 include/linux/bpf.h                           |  20 +-
 include/linux/filter.h                        |   4 +
 include/net/page_pool.h                       |  11 +-
 include/net/xdp.h                             |   3 +
 include/uapi/linux/bpf.h                      |   2 +
 kernel/bpf/Kconfig                            |   1 +
 kernel/bpf/cpumap.c                           |   8 +-
 kernel/bpf/devmap.c                           |  32 +-
 net/bpf/test_run.c                            | 217 ++++++++-
 net/core/filter.c                             |  73 ++-
 net/core/page_pool.c                          |   6 +-
 net/core/xdp.c                                |  94 ++--
 samples/bpf/.gitignore                        |   1 +
 samples/bpf/Makefile                          |   4 +
 samples/bpf/xdp_redirect.bpf.c                |  34 ++
 samples/bpf/xdp_trafficgen_user.c             | 421 ++++++++++++++++++
 tools/include/uapi/linux/bpf.h                |   2 +
 .../bpf/prog_tests/xdp_do_redirect.c          |  74 +++
 .../bpf/progs/test_xdp_do_redirect.c          |  34 ++
 19 files changed, 947 insertions(+), 94 deletions(-)
 create mode 100644 samples/bpf/xdp_trafficgen_user.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c

-- 
2.34.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v3 1/8] xdp: Allow registering memory model without rxq reference
  2021-12-11 18:41 [PATCH bpf-next v3 0/8] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
@ 2021-12-11 18:41 ` Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 2/8] page_pool: Add callback to init pages when they are allocated Toke Høiland-Jørgensen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-11 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh
  Cc: Toke Høiland-Jørgensen, netdev, bpf

The functions that register an XDP memory model take a struct xdp_rxq as
parameter, but the RXQ is not actually used for anything other than pulling
out the struct xdp_mem_info that it embeds. So refactor the register
functions and export variants that just take a pointer to the xdp_mem_info.

This is in preparation for enabling XDP_REDIRECT in bpf_prog_run(), using a
page_pool instance that is not connected to any network device.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/net/xdp.h |  3 ++
 net/core/xdp.c    | 92 +++++++++++++++++++++++++++++++----------------
 2 files changed, 65 insertions(+), 30 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 447f9b1578f3..8f0812e4996d 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -260,6 +260,9 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
 int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 			       enum xdp_mem_type type, void *allocator);
 void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq);
+int xdp_reg_mem_model(struct xdp_mem_info *mem,
+		      enum xdp_mem_type type, void *allocator);
+void xdp_unreg_mem_model(struct xdp_mem_info *mem);
 
 /* Drivers not supporting XDP metadata can use this helper, which
  * rejects any room expansion for metadata as a result.
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 5ddc29f29bad..ac476c84a986 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -110,20 +110,15 @@ static void mem_allocator_disconnect(void *allocator)
 	mutex_unlock(&mem_id_lock);
 }
 
-void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq)
+void xdp_unreg_mem_model(struct xdp_mem_info *mem)
 {
 	struct xdp_mem_allocator *xa;
-	int type = xdp_rxq->mem.type;
-	int id = xdp_rxq->mem.id;
+	int type = mem->type;
+	int id = mem->id;
 
 	/* Reset mem info to defaults */
-	xdp_rxq->mem.id = 0;
-	xdp_rxq->mem.type = 0;
-
-	if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
-		WARN(1, "Missing register, driver bug");
-		return;
-	}
+	mem->id = 0;
+	mem->type = 0;
 
 	if (id == 0)
 		return;
@@ -135,6 +130,17 @@ void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq)
 		rcu_read_unlock();
 	}
 }
+EXPORT_SYMBOL_GPL(xdp_unreg_mem_model);
+
+void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq)
+{
+	if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
+		WARN(1, "Missing register, driver bug");
+		return;
+	}
+
+	xdp_unreg_mem_model(&xdp_rxq->mem);
+}
 EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg_mem_model);
 
 void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq)
@@ -259,28 +265,24 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
 	return true;
 }
 
-int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
-			       enum xdp_mem_type type, void *allocator)
+static struct xdp_mem_allocator *__xdp_reg_mem_model(struct xdp_mem_info *mem,
+						     enum xdp_mem_type type,
+						     void *allocator)
 {
 	struct xdp_mem_allocator *xdp_alloc;
 	gfp_t gfp = GFP_KERNEL;
 	int id, errno, ret;
 	void *ptr;
 
-	if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
-		WARN(1, "Missing register, driver bug");
-		return -EFAULT;
-	}
-
 	if (!__is_supported_mem_type(type))
-		return -EOPNOTSUPP;
+		return ERR_PTR(-EOPNOTSUPP);
 
-	xdp_rxq->mem.type = type;
+	mem->type = type;
 
 	if (!allocator) {
 		if (type == MEM_TYPE_PAGE_POOL)
-			return -EINVAL; /* Setup time check page_pool req */
-		return 0;
+			return ERR_PTR(-EINVAL); /* Setup time check page_pool req */
+		return NULL;
 	}
 
 	/* Delay init of rhashtable to save memory if feature isn't used */
@@ -290,13 +292,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 		mutex_unlock(&mem_id_lock);
 		if (ret < 0) {
 			WARN_ON(1);
-			return ret;
+			return ERR_PTR(ret);
 		}
 	}
 
 	xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
 	if (!xdp_alloc)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	mutex_lock(&mem_id_lock);
 	id = __mem_id_cyclic_get(gfp);
@@ -304,15 +306,15 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 		errno = id;
 		goto err;
 	}
-	xdp_rxq->mem.id = id;
-	xdp_alloc->mem  = xdp_rxq->mem;
+	mem->id = id;
+	xdp_alloc->mem = *mem;
 	xdp_alloc->allocator = allocator;
 
 	/* Insert allocator into ID lookup table */
 	ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
 	if (IS_ERR(ptr)) {
-		ida_simple_remove(&mem_id_pool, xdp_rxq->mem.id);
-		xdp_rxq->mem.id = 0;
+		ida_simple_remove(&mem_id_pool, mem->id);
+		mem->id = 0;
 		errno = PTR_ERR(ptr);
 		goto err;
 	}
@@ -322,13 +324,43 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
 
 	mutex_unlock(&mem_id_lock);
 
-	trace_mem_connect(xdp_alloc, xdp_rxq);
-	return 0;
+	return xdp_alloc;
 err:
 	mutex_unlock(&mem_id_lock);
 	kfree(xdp_alloc);
-	return errno;
+	return ERR_PTR(errno);
+}
+
+int xdp_reg_mem_model(struct xdp_mem_info *mem,
+		      enum xdp_mem_type type, void *allocator)
+{
+	struct xdp_mem_allocator *xdp_alloc;
+
+	xdp_alloc = __xdp_reg_mem_model(mem, type, allocator);
+	if (IS_ERR(xdp_alloc))
+		return PTR_ERR(xdp_alloc);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xdp_reg_mem_model);
+
+int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
+			       enum xdp_mem_type type, void *allocator)
+{
+	struct xdp_mem_allocator *xdp_alloc;
+
+	if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
+		WARN(1, "Missing register, driver bug");
+		return -EFAULT;
+	}
+
+	xdp_alloc = __xdp_reg_mem_model(&xdp_rxq->mem, type, allocator);
+	if (IS_ERR(xdp_alloc))
+		return PTR_ERR(xdp_alloc);
+
+	trace_mem_connect(xdp_alloc, xdp_rxq);
+	return 0;
 }
+
 EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
 
 /* XDP RX runs under NAPI protection, and in different delivery error
-- 
2.34.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v3 2/8] page_pool: Add callback to init pages when they are allocated
  2021-12-11 18:41 [PATCH bpf-next v3 0/8] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 1/8] xdp: Allow registering memory model without rxq reference Toke Høiland-Jørgensen
@ 2021-12-11 18:41 ` Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 3/8] page_pool: Store the XDP mem id Toke Høiland-Jørgensen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-11 18:41 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh
  Cc: Toke Høiland-Jørgensen, netdev, bpf

Add a new callback function to page_pool that, if set, will be called every
time a new page is allocated. This will be used from bpf_test_run() to
initialise the page data with the data provided by userspace when running
XDP programs with redirect turned on.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/net/page_pool.h | 2 ++
 net/core/page_pool.c    | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 3855f069627f..a71201854c41 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -80,6 +80,8 @@ struct page_pool_params {
 	enum dma_data_direction dma_dir; /* DMA mapping direction */
 	unsigned int	max_len; /* max DMA sync memory size */
 	unsigned int	offset;  /* DMA addr offset */
+	void (*init_callback)(struct page *page, void *arg);
+	void *init_arg;
 };
 
 struct page_pool {
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9b60e4301a44..c3b134a86ec9 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -219,6 +219,8 @@ static void page_pool_set_pp_info(struct page_pool *pool,
 {
 	page->pp = pool;
 	page->pp_magic |= PP_SIGNATURE;
+	if (pool->p.init_callback)
+		pool->p.init_callback(page, pool->p.init_arg);
 }
 
 static void page_pool_clear_pp_info(struct page *page)
-- 
2.34.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v3 3/8] page_pool: Store the XDP mem id
  2021-12-11 18:41 [PATCH bpf-next v3 0/8] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 1/8] xdp: Allow registering memory model without rxq reference Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 2/8] page_pool: Add callback to init pages when they are allocated Toke Høiland-Jørgensen
@ 2021-12-11 18:41 ` Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 4/8] xdp: Move conversion to xdp_frame out of map functions Toke Høiland-Jørgensen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-11 18:41 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh
  Cc: Toke Høiland-Jørgensen, netdev, bpf

Store the XDP mem ID inside the page_pool struct so it can be retrieved
later for use in bpf_prog_run().

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/net/page_pool.h | 9 +++++++--
 net/core/page_pool.c    | 4 +++-
 net/core/xdp.c          | 2 +-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index a71201854c41..6bc0409c4ffd 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -96,6 +96,7 @@ struct page_pool {
 	unsigned int frag_offset;
 	struct page *frag_page;
 	long frag_users;
+	u32 xdp_mem_id;
 
 	/*
 	 * Data structure for allocation side
@@ -170,9 +171,12 @@ bool page_pool_return_skb_page(struct page *page);
 
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 
+struct xdp_mem_info;
+
 #ifdef CONFIG_PAGE_POOL
 void page_pool_destroy(struct page_pool *pool);
-void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *));
+void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
+			   struct xdp_mem_info *mem);
 void page_pool_release_page(struct page_pool *pool, struct page *page);
 void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 			     int count);
@@ -182,7 +186,8 @@ static inline void page_pool_destroy(struct page_pool *pool)
 }
 
 static inline void page_pool_use_xdp_mem(struct page_pool *pool,
-					 void (*disconnect)(void *))
+					 void (*disconnect)(void *),
+					 struct xdp_mem_info *mem)
 {
 }
 static inline void page_pool_release_page(struct page_pool *pool,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index c3b134a86ec9..8442a7965602 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -695,10 +695,12 @@ static void page_pool_release_retry(struct work_struct *wq)
 	schedule_delayed_work(&pool->release_dw, DEFER_TIME);
 }
 
-void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *))
+void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
+			   struct xdp_mem_info *mem)
 {
 	refcount_inc(&pool->user_cnt);
 	pool->disconnect = disconnect;
+	pool->xdp_mem_id = mem->id;
 }
 
 void page_pool_destroy(struct page_pool *pool)
diff --git a/net/core/xdp.c b/net/core/xdp.c
index ac476c84a986..2901bb7004cc 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -320,7 +320,7 @@ static struct xdp_mem_allocator *__xdp_reg_mem_model(struct xdp_mem_info *mem,
 	}
 
 	if (type == MEM_TYPE_PAGE_POOL)
-		page_pool_use_xdp_mem(allocator, mem_allocator_disconnect);
+		page_pool_use_xdp_mem(allocator, mem_allocator_disconnect, mem);
 
 	mutex_unlock(&mem_id_lock);
 
-- 
2.34.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v3 4/8] xdp: Move conversion to xdp_frame out of map functions
  2021-12-11 18:41 [PATCH bpf-next v3 0/8] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
                   ` (2 preceding siblings ...)
  2021-12-11 18:41 ` [PATCH bpf-next v3 3/8] page_pool: Store the XDP mem id Toke Høiland-Jørgensen
@ 2021-12-11 18:41 ` Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 5/8] xdp: add xdp_do_redirect_frame() for pre-computed xdp_frames Toke Høiland-Jørgensen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-11 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, netdev, bpf

All map redirect functions except XSK maps convert xdp_buff to xdp_frame
before enqueueing it. So move this conversion of out the map functions
and into xdp_do_redirect(). This removes a bit of duplicated code, but more
importantly it makes it possible to support caller-allocated xdp_frame
structures, which will be added in a subsequent commit.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/bpf.h | 20 ++++++++++----------
 kernel/bpf/cpumap.c |  8 +-------
 kernel/bpf/devmap.c | 32 +++++++++++---------------------
 net/core/filter.c   | 24 +++++++++++++++++-------
 4 files changed, 39 insertions(+), 45 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8bbf08fbab66..691bb397500e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1621,17 +1621,17 @@ void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
 struct btf *bpf_get_btf_vmlinux(void);
 
 /* Map specifics */
-struct xdp_buff;
+struct xdp_frame;
 struct sk_buff;
 struct bpf_dtab_netdev;
 struct bpf_cpu_map_entry;
 
 void __dev_flush(void);
-int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 		    struct net_device *dev_rx);
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_frame *xdpf,
 		    struct net_device *dev_rx);
-int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
+int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
 			  struct bpf_map *map, bool exclude_ingress);
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
 			     struct bpf_prog *xdp_prog);
@@ -1640,7 +1640,7 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 			   bool exclude_ingress);
 
 void __cpu_map_flush(void);
-int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
+int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf,
 		    struct net_device *dev_rx);
 int cpu_map_generic_redirect(struct bpf_cpu_map_entry *rcpu,
 			     struct sk_buff *skb);
@@ -1818,26 +1818,26 @@ static inline void __dev_flush(void)
 {
 }
 
-struct xdp_buff;
+struct xdp_frame;
 struct bpf_dtab_netdev;
 struct bpf_cpu_map_entry;
 
 static inline
-int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 		    struct net_device *dev_rx)
 {
 	return 0;
 }
 
 static inline
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_frame *xdpf,
 		    struct net_device *dev_rx)
 {
 	return 0;
 }
 
 static inline
-int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
+int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
 			  struct bpf_map *map, bool exclude_ingress)
 {
 	return 0;
@@ -1865,7 +1865,7 @@ static inline void __cpu_map_flush(void)
 }
 
 static inline int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu,
-				  struct xdp_buff *xdp,
+				  struct xdp_frame *xdpf,
 				  struct net_device *dev_rx)
 {
 	return 0;
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 585b2b77ccc4..12798b2c68d9 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -746,15 +746,9 @@ static void bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf)
 		list_add(&bq->flush_node, flush_list);
 }
 
-int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
+int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf,
 		    struct net_device *dev_rx)
 {
-	struct xdp_frame *xdpf;
-
-	xdpf = xdp_convert_buff_to_frame(xdp);
-	if (unlikely(!xdpf))
-		return -EOVERFLOW;
-
 	/* Info needed when constructing SKB on remote CPU */
 	xdpf->dev_rx = dev_rx;
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f02d04540c0c..f29f439fac76 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -467,24 +467,19 @@ static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 	bq->q[bq->count++] = xdpf;
 }
 
-static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+static inline int __xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 				struct net_device *dev_rx,
 				struct bpf_prog *xdp_prog)
 {
-	struct xdp_frame *xdpf;
 	int err;
 
 	if (!dev->netdev_ops->ndo_xdp_xmit)
 		return -EOPNOTSUPP;
 
-	err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data);
+	err = xdp_ok_fwd_dev(dev, xdpf->len);
 	if (unlikely(err))
 		return err;
 
-	xdpf = xdp_convert_buff_to_frame(xdp);
-	if (unlikely(!xdpf))
-		return -EOVERFLOW;
-
 	bq_enqueue(dev, xdpf, dev_rx, xdp_prog);
 	return 0;
 }
@@ -520,27 +515,27 @@ static u32 dev_map_bpf_prog_run_skb(struct sk_buff *skb, struct bpf_dtab_netdev
 	return act;
 }
 
-int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 		    struct net_device *dev_rx)
 {
-	return __xdp_enqueue(dev, xdp, dev_rx, NULL);
+	return __xdp_enqueue(dev, xdpf, dev_rx, NULL);
 }
 
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_frame *xdpf,
 		    struct net_device *dev_rx)
 {
 	struct net_device *dev = dst->dev;
 
-	return __xdp_enqueue(dev, xdp, dev_rx, dst->xdp_prog);
+	return __xdp_enqueue(dev, xdpf, dev_rx, dst->xdp_prog);
 }
 
-static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp)
+static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf)
 {
 	if (!obj ||
 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
 		return false;
 
-	if (xdp_ok_fwd_dev(obj->dev, xdp->data_end - xdp->data))
+	if (xdp_ok_fwd_dev(obj->dev, xdpf->len))
 		return false;
 
 	return true;
@@ -586,14 +581,13 @@ static int get_upper_ifindexes(struct net_device *dev, int *indexes)
 	return n;
 }
 
-int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
+int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
 			  struct bpf_map *map, bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
 	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
-	struct xdp_frame *xdpf;
 	int num_excluded = 0;
 	unsigned int i;
 	int err;
@@ -603,15 +597,11 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 		excluded_devices[num_excluded++] = dev_rx->ifindex;
 	}
 
-	xdpf = xdp_convert_buff_to_frame(xdp);
-	if (unlikely(!xdpf))
-		return -EOVERFLOW;
-
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = rcu_dereference_check(dtab->netdev_map[i],
 						    rcu_read_lock_bh_held());
-			if (!is_valid_dst(dst, xdp))
+			if (!is_valid_dst(dst, xdpf))
 				continue;
 
 			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
@@ -634,7 +624,7 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_rcu(dst, head, index_hlist,
 						 lockdep_is_held(&dtab->index_lock)) {
-				if (!is_valid_dst(dst, xdp))
+				if (!is_valid_dst(dst, xdpf))
 					continue;
 
 				if (is_ifindex_excluded(excluded_devices, num_excluded,
diff --git a/net/core/filter.c b/net/core/filter.c
index fe27c91e3758..bfa4ffbced35 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3964,12 +3964,24 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 	enum bpf_map_type map_type = ri->map_type;
 	void *fwd = ri->tgt_value;
 	u32 map_id = ri->map_id;
+	struct xdp_frame *xdpf;
 	struct bpf_map *map;
 	int err;
 
 	ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
 	ri->map_type = BPF_MAP_TYPE_UNSPEC;
 
+	if (map_type == BPF_MAP_TYPE_XSKMAP) {
+		err = __xsk_map_redirect(fwd, xdp);
+		goto out;
+	}
+
+	xdpf = xdp_convert_buff_to_frame(xdp);
+	if (unlikely(!xdpf)) {
+		err = -EOVERFLOW;
+		goto err;
+	}
+
 	switch (map_type) {
 	case BPF_MAP_TYPE_DEVMAP:
 		fallthrough;
@@ -3977,17 +3989,14 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		map = READ_ONCE(ri->map);
 		if (unlikely(map)) {
 			WRITE_ONCE(ri->map, NULL);
-			err = dev_map_enqueue_multi(xdp, dev, map,
+			err = dev_map_enqueue_multi(xdpf, dev, map,
 						    ri->flags & BPF_F_EXCLUDE_INGRESS);
 		} else {
-			err = dev_map_enqueue(fwd, xdp, dev);
+			err = dev_map_enqueue(fwd, xdpf, dev);
 		}
 		break;
 	case BPF_MAP_TYPE_CPUMAP:
-		err = cpu_map_enqueue(fwd, xdp, dev);
-		break;
-	case BPF_MAP_TYPE_XSKMAP:
-		err = __xsk_map_redirect(fwd, xdp);
+		err = cpu_map_enqueue(fwd, xdpf, dev);
 		break;
 	case BPF_MAP_TYPE_UNSPEC:
 		if (map_id == INT_MAX) {
@@ -3996,7 +4005,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 				err = -EINVAL;
 				break;
 			}
-			err = dev_xdp_enqueue(fwd, xdp, dev);
+			err = dev_xdp_enqueue(fwd, xdpf, dev);
 			break;
 		}
 		fallthrough;
@@ -4004,6 +4013,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		err = -EBADRQC;
 	}
 
+out:
 	if (unlikely(err))
 		goto err;
 
-- 
2.34.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v3 5/8] xdp: add xdp_do_redirect_frame() for pre-computed xdp_frames
  2021-12-11 18:41 [PATCH bpf-next v3 0/8] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
                   ` (3 preceding siblings ...)
  2021-12-11 18:41 ` [PATCH bpf-next v3 4/8] xdp: Move conversion to xdp_frame out of map functions Toke Høiland-Jørgensen
@ 2021-12-11 18:41 ` Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run() Toke Høiland-Jørgensen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-11 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, netdev, bpf

Add an xdp_do_redirect_frame() variant which supports pre-computed
xdp_frame structures. This will be used in bpf_prog_run() to avoid having
to write to the xdp_frame structure when the XDP program doesn't modify the
frame boundaries.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/filter.h |  4 +++
 net/core/filter.c      | 65 +++++++++++++++++++++++++++++++++++-------
 2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index b6a216eb217a..845452c83e0f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1022,6 +1022,10 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
 int xdp_do_redirect(struct net_device *dev,
 		    struct xdp_buff *xdp,
 		    struct bpf_prog *prog);
+int xdp_do_redirect_frame(struct net_device *dev,
+			  struct xdp_buff *xdp,
+			  struct xdp_frame *xdpf,
+			  struct bpf_prog *prog);
 void xdp_do_flush(void);
 
 /* The xdp_do_flush_map() helper has been renamed to drop the _map suffix, as
diff --git a/net/core/filter.c b/net/core/filter.c
index bfa4ffbced35..629188642b4e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3957,26 +3957,44 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
 }
 EXPORT_SYMBOL_GPL(xdp_master_redirect);
 
-int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
-		    struct bpf_prog *xdp_prog)
+static inline int __xdp_do_redirect_xsk(struct bpf_redirect_info *ri,
+					struct net_device *dev,
+					struct xdp_buff *xdp,
+					struct bpf_prog *xdp_prog)
 {
-	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 	enum bpf_map_type map_type = ri->map_type;
 	void *fwd = ri->tgt_value;
 	u32 map_id = ri->map_id;
-	struct xdp_frame *xdpf;
-	struct bpf_map *map;
 	int err;
 
 	ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
 	ri->map_type = BPF_MAP_TYPE_UNSPEC;
 
-	if (map_type == BPF_MAP_TYPE_XSKMAP) {
-		err = __xsk_map_redirect(fwd, xdp);
-		goto out;
-	}
+	err = __xsk_map_redirect(fwd, xdp);
+	if (unlikely(err))
+		goto err;
+
+	_trace_xdp_redirect_map(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index);
+	return 0;
+err:
+	_trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err);
+	return err;
+}
+
+static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri,
+						   struct net_device *dev,
+						   struct xdp_frame *xdpf,
+						   struct bpf_prog *xdp_prog)
+{
+	enum bpf_map_type map_type = ri->map_type;
+	void *fwd = ri->tgt_value;
+	u32 map_id = ri->map_id;
+	struct bpf_map *map;
+	int err;
+
+	ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
+	ri->map_type = BPF_MAP_TYPE_UNSPEC;
 
-	xdpf = xdp_convert_buff_to_frame(xdp);
 	if (unlikely(!xdpf)) {
 		err = -EOVERFLOW;
 		goto err;
@@ -4013,7 +4031,6 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		err = -EBADRQC;
 	}
 
-out:
 	if (unlikely(err))
 		goto err;
 
@@ -4023,8 +4040,34 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 	_trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err);
 	return err;
 }
+
+int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
+		    struct bpf_prog *xdp_prog)
+{
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	enum bpf_map_type map_type = ri->map_type;
+
+	if (map_type == BPF_MAP_TYPE_XSKMAP)
+		return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog);
+
+	return __xdp_do_redirect_frame(ri, dev, xdp_convert_buff_to_frame(xdp),
+				       xdp_prog);
+}
 EXPORT_SYMBOL_GPL(xdp_do_redirect);
 
+int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp,
+			  struct xdp_frame *xdpf, struct bpf_prog *xdp_prog)
+{
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+	enum bpf_map_type map_type = ri->map_type;
+
+	if (map_type == BPF_MAP_TYPE_XSKMAP)
+		return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog);
+
+	return __xdp_do_redirect_frame(ri, dev, xdpf, xdp_prog);
+}
+EXPORT_SYMBOL_GPL(xdp_do_redirect_frame);
+
 static int xdp_do_generic_redirect_map(struct net_device *dev,
 				       struct sk_buff *skb,
 				       struct xdp_buff *xdp,
-- 
2.34.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run()
  2021-12-11 18:41 [PATCH bpf-next v3 0/8] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
                   ` (4 preceding siblings ...)
  2021-12-11 18:41 ` [PATCH bpf-next v3 5/8] xdp: add xdp_do_redirect_frame() for pre-computed xdp_frames Toke Høiland-Jørgensen
@ 2021-12-11 18:41 ` Toke Høiland-Jørgensen
  2021-12-12  2:43   ` Alexei Starovoitov
  2021-12-11 18:41 ` [PATCH bpf-next v3 7/8] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run() Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 8/8] samples/bpf: Add xdp_trafficgen sample Toke Høiland-Jørgensen
  7 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-11 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, netdev, bpf

This adds support for doing real redirects when an XDP program returns
XDP_REDIRECT in bpf_prog_run(). To achieve this, we create a page pool
instance while setting up the test run, and feed pages from that into the
XDP program. The setup cost of this is amortised over the number of
repetitions specified by userspace.

To support performance testing use case, we further optimise the setup step
so that all pages in the pool are pre-initialised with the packet data, and
pre-computed context and xdp_frame objects stored at the start of each
page. This makes it possible to entirely avoid touching the page content on
each XDP program invocation, and enables sending up to 11.5 Mpps/core on my
test box.

Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about which assumptions
it makes about the packet content, but that is only an issue for the most
naively written programs.

Previous uses of bpf_prog_run() for XDP returned the modified packet data
and return code to userspace, which is a different semantic then this new
redirect mode. For this reason, the caller has to set the new
BPF_F_TEST_XDP_DO_REDIRECT flag when calling bpf_prog_run() to opt in to
the different semantics. Enabling this flag is only allowed if not setting
ctx_out and data_out in the test specification, since it means frames will
be redirected somewhere else, so they can't be returned.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/uapi/linux/bpf.h       |   2 +
 kernel/bpf/Kconfig             |   1 +
 net/bpf/test_run.c             | 217 +++++++++++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h |   2 +
 4 files changed, 210 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c26871263f1f..224a6e3261f5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1225,6 +1225,8 @@ enum {
 
 /* If set, run the test on the cpu specified by bpf_attr.test.cpu */
 #define BPF_F_TEST_RUN_ON_CPU	(1U << 0)
+/* If set, support performing redirection of XDP frames */
+#define BPF_F_TEST_XDP_DO_REDIRECT	(1U << 1)
 
 /* type for BPF_ENABLE_STATS */
 enum bpf_stats_type {
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index d24d518ddd63..c8c920020d11 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -30,6 +30,7 @@ config BPF_SYSCALL
 	select TASKS_TRACE_RCU
 	select BINARY_PRINTF
 	select NET_SOCK_MSG if NET
+	select PAGE_POOL if NET
 	default n
 	help
 	  Enable the bpf() system call that allows to manipulate BPF programs
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 46dd95755967..02bd0616e64d 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -14,6 +14,7 @@
 #include <net/sock.h>
 #include <net/tcp.h>
 #include <net/net_namespace.h>
+#include <net/page_pool.h>
 #include <linux/error-injection.h>
 #include <linux/smp.h>
 #include <linux/sock_diag.h>
@@ -23,19 +24,34 @@
 #include <trace/events/bpf_test_run.h>
 
 struct bpf_test_timer {
-	enum { NO_PREEMPT, NO_MIGRATE } mode;
+	enum { NO_PREEMPT, NO_MIGRATE, XDP } mode;
 	u32 i;
 	u64 time_start, time_spent;
+	struct {
+		struct xdp_buff *orig_ctx;
+		struct xdp_rxq_info rxq;
+		struct page_pool *pp;
+		u16 frame_cnt;
+	} xdp;
 };
 
 static void bpf_test_timer_enter(struct bpf_test_timer *t)
 	__acquires(rcu)
 {
 	rcu_read_lock();
-	if (t->mode == NO_PREEMPT)
+	switch (t->mode) {
+	case NO_PREEMPT:
 		preempt_disable();
-	else
+		break;
+	case XDP:
+		migrate_disable();
+		xdp_set_return_frame_no_direct();
+		t->xdp.frame_cnt = 0;
+		break;
+	case NO_MIGRATE:
 		migrate_disable();
+		break;
+	}
 
 	t->time_start = ktime_get_ns();
 }
@@ -45,10 +61,18 @@ static void bpf_test_timer_leave(struct bpf_test_timer *t)
 {
 	t->time_start = 0;
 
-	if (t->mode == NO_PREEMPT)
+	switch (t->mode) {
+	case NO_PREEMPT:
 		preempt_enable();
-	else
+		break;
+	case XDP:
+		xdp_do_flush();
+		xdp_clear_return_frame_no_direct();
+		fallthrough;
+	case NO_MIGRATE:
 		migrate_enable();
+		break;
+	}
 	rcu_read_unlock();
 }
 
@@ -87,13 +111,161 @@ static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *e
 	return false;
 }
 
+/* We put this struct at the head of each page with a context and frame
+ * initialised when the page is allocated, so we don't have to do this on each
+ * repetition of the test run.
+ */
+struct xdp_page_head {
+	struct xdp_buff orig_ctx;
+	struct xdp_buff ctx;
+	struct xdp_frame frm;
+	u8 data[];
+};
+
+#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head)	\
+			     - sizeof(struct skb_shared_info))
+
+static void bpf_test_run_xdp_init_page(struct page *page, void *arg)
+{
+	struct xdp_page_head *head = phys_to_virt(page_to_phys(page));
+	struct xdp_buff *new_ctx, *orig_ctx;
+	u32 headroom = XDP_PACKET_HEADROOM;
+	struct bpf_test_timer *t = arg;
+	size_t frm_len, meta_len;
+	struct xdp_frame *frm;
+	void *data;
+
+	orig_ctx = t->xdp.orig_ctx;
+	frm_len = orig_ctx->data_end - orig_ctx->data_meta;
+	meta_len = orig_ctx->data - orig_ctx->data_meta;
+	headroom -= meta_len;
+
+	new_ctx = &head->ctx;
+	frm = &head->frm;
+	data = &head->data;
+	memcpy(data + headroom, orig_ctx->data_meta, frm_len);
+
+	xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &t->xdp.rxq);
+	xdp_prepare_buff(new_ctx, data, headroom, frm_len, true);
+	new_ctx->data_meta = new_ctx->data + meta_len;
+
+	xdp_update_frame_from_buff(new_ctx, frm);
+	frm->mem = new_ctx->rxq->mem;
+
+	memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx));
+}
+
+static int bpf_test_run_xdp_setup(struct bpf_test_timer *t, struct xdp_buff *orig_ctx)
+{
+	struct xdp_mem_info mem = {};
+	struct page_pool *pp;
+	int err;
+	struct page_pool_params pp_params = {
+		.order = 0,
+		.flags = 0,
+		.pool_size = NAPI_POLL_WEIGHT * 2,
+		.nid = NUMA_NO_NODE,
+		.max_len = TEST_XDP_FRAME_SIZE,
+		.init_callback = bpf_test_run_xdp_init_page,
+		.init_arg = t,
+	};
+
+	pp = page_pool_create(&pp_params);
+	if (IS_ERR(pp))
+		return PTR_ERR(pp);
+
+	/* will copy 'mem->id' into pp->xdp_mem_id */
+	err = xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pp);
+	if (err) {
+		page_pool_destroy(pp);
+		return err;
+	}
+	t->xdp.pp = pp;
+
+	/* We create a 'fake' RXQ referencing the original dev, but with an
+	 * xdp_mem_info pointing to our page_pool
+	 */
+	xdp_rxq_info_reg(&t->xdp.rxq, orig_ctx->rxq->dev, 0, 0);
+	t->xdp.rxq.mem.type = MEM_TYPE_PAGE_POOL;
+	t->xdp.rxq.mem.id = pp->xdp_mem_id;
+	t->xdp.orig_ctx = orig_ctx;
+
+	return 0;
+}
+
+static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t)
+{
+	struct xdp_mem_info mem = {
+		.id = t->xdp.pp->xdp_mem_id,
+		.type = MEM_TYPE_PAGE_POOL,
+	};
+	xdp_unreg_mem_model(&mem);
+}
+
+static bool ctx_was_changed(struct xdp_page_head *head)
+{
+	return (head->orig_ctx.data != head->ctx.data ||
+		head->orig_ctx.data_meta != head->ctx.data_meta ||
+		head->orig_ctx.data_end != head->ctx.data_end);
+}
+
+static void reset_ctx(struct xdp_page_head *head)
+{
+	if (likely(!ctx_was_changed(head)))
+		return;
+
+	head->ctx.data = head->orig_ctx.data;
+	head->ctx.data_meta = head->orig_ctx.data_meta;
+	head->ctx.data_end = head->orig_ctx.data_end;
+	xdp_update_frame_from_buff(&head->ctx, &head->frm);
+}
+
+static int bpf_test_run_xdp_redirect(struct bpf_test_timer *t,
+				     struct bpf_prog *prog, struct xdp_buff *orig_ctx)
+{
+	struct xdp_page_head *head;
+	struct xdp_buff *ctx;
+	struct page *page;
+	int ret, err = 0;
+
+	page = page_pool_dev_alloc_pages(t->xdp.pp);
+	if (!page)
+		return -ENOMEM;
+
+	head = phys_to_virt(page_to_phys(page));
+	reset_ctx(head);
+	ctx = &head->ctx;
+
+	ret = bpf_prog_run_xdp(prog, ctx);
+	if (ret == XDP_REDIRECT) {
+		struct xdp_frame *frm = &head->frm;
+
+		/* if program changed pkt bounds we need to update the xdp_frame */
+		if (unlikely(ctx_was_changed(head)))
+			xdp_update_frame_from_buff(ctx, frm);
+
+		err = xdp_do_redirect_frame(ctx->rxq->dev, ctx, frm, prog);
+		if (err)
+			ret = err;
+	}
+	if (ret != XDP_REDIRECT)
+		xdp_return_buff(ctx);
+
+	if (++t->xdp.frame_cnt >= NAPI_POLL_WEIGHT) {
+		xdp_do_flush();
+		t->xdp.frame_cnt = 0;
+	}
+
+	return ret;
+}
+
 static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
-			u32 *retval, u32 *time, bool xdp)
+			u32 *retval, u32 *time, bool xdp, bool xdp_redirect)
 {
 	struct bpf_prog_array_item item = {.prog = prog};
 	struct bpf_run_ctx *old_ctx;
 	struct bpf_cg_run_ctx run_ctx;
-	struct bpf_test_timer t = { NO_MIGRATE };
+	struct bpf_test_timer t = { .mode = (xdp && xdp_redirect) ? XDP : NO_MIGRATE };
 	enum bpf_cgroup_storage_type stype;
 	int ret;
 
@@ -110,14 +282,26 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 	if (!repeat)
 		repeat = 1;
 
+	if (t.mode == XDP) {
+		ret = bpf_test_run_xdp_setup(&t, ctx);
+		if (ret)
+			return ret;
+	}
+
 	bpf_test_timer_enter(&t);
 	old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	do {
 		run_ctx.prog_item = &item;
-		if (xdp)
+		if (xdp && xdp_redirect) {
+			ret = bpf_test_run_xdp_redirect(&t, prog, ctx);
+			if (unlikely(ret < 0))
+				break;
+			*retval = ret;
+		} else if (xdp) {
 			*retval = bpf_prog_run_xdp(prog, ctx);
-		else
+		} else {
 			*retval = bpf_prog_run(prog, ctx);
+		}
 	} while (bpf_test_timer_continue(&t, repeat, &ret, time));
 	bpf_reset_run_ctx(old_ctx);
 	bpf_test_timer_leave(&t);
@@ -125,6 +309,9 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 	for_each_cgroup_storage_type(stype)
 		bpf_cgroup_storage_free(item.cgroup_storage[stype]);
 
+	if (t.mode == XDP)
+		bpf_test_run_xdp_teardown(&t);
+
 	return ret;
 }
 
@@ -663,7 +850,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	ret = convert___skb_to_skb(skb, ctx);
 	if (ret)
 		goto out;
-	ret = bpf_test_run(prog, skb, repeat, &retval, &duration, false);
+	ret = bpf_test_run(prog, skb, repeat, &retval, &duration, false, false);
 	if (ret)
 		goto out;
 	if (!is_l2) {
@@ -757,6 +944,7 @@ static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md)
 int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
+	bool do_redirect = (kattr->test.flags & BPF_F_TEST_XDP_DO_REDIRECT);
 	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	u32 headroom = XDP_PACKET_HEADROOM;
 	u32 size = kattr->test.data_size_in;
@@ -773,6 +961,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	    prog->expected_attach_type == BPF_XDP_CPUMAP)
 		return -EINVAL;
 
+	if (kattr->test.flags & ~BPF_F_TEST_XDP_DO_REDIRECT)
+		return -EINVAL;
+
 	ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md));
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
@@ -781,7 +972,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		/* There can't be user provided data before the meta data */
 		if (ctx->data_meta || ctx->data_end != size ||
 		    ctx->data > ctx->data_end ||
-		    unlikely(xdp_metalen_invalid(ctx->data)))
+		    unlikely(xdp_metalen_invalid(ctx->data)) ||
+		    (do_redirect && (kattr->test.data_out || kattr->test.ctx_out)))
 			goto free_ctx;
 		/* Meta data is allocated from the headroom */
 		headroom -= ctx->data;
@@ -807,7 +999,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 
 	if (repeat > 1)
 		bpf_prog_change_xdp(NULL, prog);
-	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
+	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration,
+			   true, do_redirect);
 	/* We convert the xdp_buff back to an xdp_md before checking the return
 	 * code so the reference count of any held netdevice will be decremented
 	 * even if the test run failed.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c26871263f1f..224a6e3261f5 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1225,6 +1225,8 @@ enum {
 
 /* If set, run the test on the cpu specified by bpf_attr.test.cpu */
 #define BPF_F_TEST_RUN_ON_CPU	(1U << 0)
+/* If set, support performing redirection of XDP frames */
+#define BPF_F_TEST_XDP_DO_REDIRECT	(1U << 1)
 
 /* type for BPF_ENABLE_STATS */
 enum bpf_stats_type {
-- 
2.34.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v3 7/8] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run()
  2021-12-11 18:41 [PATCH bpf-next v3 0/8] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
                   ` (5 preceding siblings ...)
  2021-12-11 18:41 ` [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run() Toke Høiland-Jørgensen
@ 2021-12-11 18:41 ` Toke Høiland-Jørgensen
  2021-12-11 18:41 ` [PATCH bpf-next v3 8/8] samples/bpf: Add xdp_trafficgen sample Toke Høiland-Jørgensen
  7 siblings, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-11 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh
  Cc: Toke Høiland-Jørgensen, Shuah Khan, netdev, bpf

This adds a selftest for the XDP_REDIRECT facility in bpf_prog_run, that
redirects packets into a veth and counts them using an XDP program on the
other side of the veth pair.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 .../bpf/prog_tests/xdp_do_redirect.c          | 74 +++++++++++++++++++
 .../bpf/progs/test_xdp_do_redirect.c          | 34 +++++++++
 2 files changed, 108 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
new file mode 100644
index 000000000000..c2effcf076a6
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <net/if.h>
+#include "test_xdp_do_redirect.skel.h"
+
+#define SYS(fmt, ...)						\
+	({							\
+		char cmd[1024];					\
+		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
+		if (!ASSERT_OK(system(cmd), cmd))		\
+			goto fail;				\
+	})
+
+#define NUM_PKTS 10
+void test_xdp_do_redirect(void)
+{
+	struct test_xdp_do_redirect *skel = NULL;
+	struct ipv6_packet data = pkt_v6;
+	struct xdp_md ctx_in = { .data_end = sizeof(data) };
+	__u8 dst_mac[ETH_ALEN] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55};
+	__u8 src_mac[ETH_ALEN] = {0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb};
+	DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+			    .data_in = &data,
+			    .data_size_in = sizeof(data),
+			    .ctx_in = &ctx_in,
+			    .ctx_size_in = sizeof(ctx_in),
+			    .flags = BPF_F_TEST_XDP_DO_REDIRECT,
+			    .repeat = NUM_PKTS,
+		);
+	int err, prog_fd, ifindex_src, ifindex_dst;
+	struct bpf_link *link;
+
+	memcpy(data.eth.h_dest, dst_mac, ETH_ALEN);
+	memcpy(data.eth.h_source, src_mac, ETH_ALEN);
+
+	skel = test_xdp_do_redirect__open();
+	if (!ASSERT_OK_PTR(skel, "skel"))
+		return;
+
+	SYS("ip link add veth_src type veth peer name veth_dst");
+	SYS("ip link set dev veth_src up");
+	SYS("ip link set dev veth_dst up");
+
+	ifindex_src = if_nametoindex("veth_src");
+	ifindex_dst = if_nametoindex("veth_dst");
+	if (!ASSERT_NEQ(ifindex_src, 0, "ifindex_src") ||
+	    !ASSERT_NEQ(ifindex_dst, 0, "ifindex_dst"))
+		goto fail;
+
+	memcpy(skel->rodata->expect_dst, dst_mac, ETH_ALEN);
+	skel->rodata->ifindex_out = ifindex_src;
+
+	if (!ASSERT_OK(test_xdp_do_redirect__load(skel), "load"))
+		goto fail;
+
+	link = bpf_program__attach_xdp(skel->progs.xdp_count_pkts, ifindex_dst);
+	if (!ASSERT_OK_PTR(link, "prog_attach"))
+		goto fail;
+	skel->links.xdp_count_pkts = link;
+
+	prog_fd = bpf_program__fd(skel->progs.xdp_redirect_notouch);
+	err = bpf_prog_test_run_opts(prog_fd, &opts);
+	if (!ASSERT_OK(err, "prog_run"))
+		goto fail;
+
+	/* wait for the packets to be flushed */
+	kern_sync_rcu();
+
+	ASSERT_EQ(skel->bss->pkts_seen, NUM_PKTS, "pkt_count");
+fail:
+	system("ip link del dev veth_src");
+	test_xdp_do_redirect__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
new file mode 100644
index 000000000000..254ebf523f37
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+#define ETH_ALEN 6
+const volatile int ifindex_out;
+const volatile __u8 expect_dst[ETH_ALEN];
+volatile int pkts_seen = 0;
+
+SEC("xdp")
+int xdp_redirect_notouch(struct xdp_md *xdp)
+{
+	return bpf_redirect(ifindex_out, 0);
+}
+
+SEC("xdp")
+int xdp_count_pkts(struct xdp_md *xdp)
+{
+	void *data = (void *)(long)xdp->data;
+	void *data_end = (void *)(long)xdp->data_end;
+	struct ethhdr *eth = data;
+	int i;
+
+	if (eth + 1 > data_end)
+		return XDP_ABORTED;
+
+	for (i = 0; i < ETH_ALEN; i++)
+		if (expect_dst[i] != eth->h_dest[i])
+			return XDP_ABORTED;
+	pkts_seen++;
+	return XDP_DROP;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next v3 8/8] samples/bpf: Add xdp_trafficgen sample
  2021-12-11 18:41 [PATCH bpf-next v3 0/8] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
                   ` (6 preceding siblings ...)
  2021-12-11 18:41 ` [PATCH bpf-next v3 7/8] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run() Toke Høiland-Jørgensen
@ 2021-12-11 18:41 ` Toke Høiland-Jørgensen
  2021-12-12  2:47   ` Alexei Starovoitov
  7 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-11 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, netdev, bpf

This adds an XDP-based traffic generator sample which uses the DO_REDIRECT
flag of bpf_prog_run(). It works by building the initial packet in
userspace and passing it to the kernel where an XDP program redirects the
packet to the target interface. The traffic generator supports two modes of
operation: one that just sends copies of the same packet as fast as it can
without touching the packet data at all, and one that rewrites the
destination port number of each packet, making the generated traffic span a
range of port numbers.

The dynamic mode is included to demonstrate how the bpf_prog_run() facility
enables building a completely programmable packet generator using XDP.
Using the dynamic mode has about a 10% overhead compared to the static
mode, because the latter completely avoids touching the page data.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 samples/bpf/.gitignore            |   1 +
 samples/bpf/Makefile              |   4 +
 samples/bpf/xdp_redirect.bpf.c    |  34 +++
 samples/bpf/xdp_trafficgen_user.c | 421 ++++++++++++++++++++++++++++++
 4 files changed, 460 insertions(+)
 create mode 100644 samples/bpf/xdp_trafficgen_user.c

diff --git a/samples/bpf/.gitignore b/samples/bpf/.gitignore
index 0e7bfdbff80a..935672cbdd80 100644
--- a/samples/bpf/.gitignore
+++ b/samples/bpf/.gitignore
@@ -49,6 +49,7 @@ xdp_redirect_map_multi
 xdp_router_ipv4
 xdp_rxq_info
 xdp_sample_pkts
+xdp_trafficgen
 xdp_tx_iptunnel
 xdpsock
 xdpsock_ctrl_proc
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 38638845db9d..d827e0680945 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -58,6 +58,7 @@ tprogs-y += xdp_redirect_cpu
 tprogs-y += xdp_redirect_map_multi
 tprogs-y += xdp_redirect_map
 tprogs-y += xdp_redirect
+tprogs-y += xdp_trafficgen
 tprogs-y += xdp_monitor
 
 # Libbpf dependencies
@@ -123,6 +124,7 @@ xdp_redirect_map_multi-objs := xdp_redirect_map_multi_user.o $(XDP_SAMPLE)
 xdp_redirect_cpu-objs := xdp_redirect_cpu_user.o $(XDP_SAMPLE)
 xdp_redirect_map-objs := xdp_redirect_map_user.o $(XDP_SAMPLE)
 xdp_redirect-objs := xdp_redirect_user.o $(XDP_SAMPLE)
+xdp_trafficgen-objs := xdp_trafficgen_user.o $(XDP_SAMPLE)
 xdp_monitor-objs := xdp_monitor_user.o $(XDP_SAMPLE)
 
 # Tell kbuild to always build the programs
@@ -226,6 +228,7 @@ TPROGLDLIBS_map_perf_test	+= -lrt
 TPROGLDLIBS_test_overhead	+= -lrt
 TPROGLDLIBS_xdpsock		+= -pthread -lcap
 TPROGLDLIBS_xsk_fwd		+= -pthread
+TPROGLDLIBS_xdp_trafficgen	+= -pthread
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 # make M=samples/bpf LLC=~/git/llvm-project/llvm/build/bin/llc CLANG=~/git/llvm-project/llvm/build/bin/clang
@@ -341,6 +344,7 @@ $(obj)/xdp_redirect_cpu_user.o: $(obj)/xdp_redirect_cpu.skel.h
 $(obj)/xdp_redirect_map_multi_user.o: $(obj)/xdp_redirect_map_multi.skel.h
 $(obj)/xdp_redirect_map_user.o: $(obj)/xdp_redirect_map.skel.h
 $(obj)/xdp_redirect_user.o: $(obj)/xdp_redirect.skel.h
+$(obj)/xdp_trafficgen_user.o: $(obj)/xdp_redirect.skel.h
 $(obj)/xdp_monitor_user.o: $(obj)/xdp_monitor.skel.h
 
 $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
diff --git a/samples/bpf/xdp_redirect.bpf.c b/samples/bpf/xdp_redirect.bpf.c
index 7c02bacfe96b..a09c6f576b79 100644
--- a/samples/bpf/xdp_redirect.bpf.c
+++ b/samples/bpf/xdp_redirect.bpf.c
@@ -39,6 +39,40 @@ int xdp_redirect_prog(struct xdp_md *ctx)
 	return bpf_redirect(ifindex_out, 0);
 }
 
+SEC("xdp")
+int xdp_redirect_notouch(struct xdp_md *ctx)
+{
+	return bpf_redirect(ifindex_out, 0);
+}
+
+const volatile __u16 port_start;
+const volatile __u16 port_range;
+volatile __u16 next_port = 0;
+
+SEC("xdp")
+int xdp_redirect_update_port(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data = (void *)(long)ctx->data;
+	__u16 cur_port, cksum_diff;
+	struct udphdr *hdr;
+
+	hdr = data + (sizeof(struct ethhdr) + sizeof(struct ipv6hdr));
+	if (hdr + 1 > data_end)
+		return XDP_ABORTED;
+
+	cur_port = bpf_ntohs(hdr->dest);
+	cksum_diff = next_port - cur_port;
+	if (cksum_diff) {
+		hdr->check = bpf_htons(~(~bpf_ntohs(hdr->check) + cksum_diff));
+		hdr->dest = bpf_htons(next_port);
+	}
+	if (next_port++ >= port_start + port_range - 1)
+		next_port = port_start;
+
+	return bpf_redirect(ifindex_out, 0);
+}
+
 /* Redirect require an XDP bpf_prog loaded on the TX device */
 SEC("xdp")
 int xdp_redirect_dummy_prog(struct xdp_md *ctx)
diff --git a/samples/bpf/xdp_trafficgen_user.c b/samples/bpf/xdp_trafficgen_user.c
new file mode 100644
index 000000000000..03f3a7b3260d
--- /dev/null
+++ b/samples/bpf/xdp_trafficgen_user.c
@@ -0,0 +1,421 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2021 Toke Høiland-Jørgensen <toke@redhat.com>
+ */
+static const char *__doc__ =
+"XDP trafficgen tool, using bpf_redirect helper\n"
+"Usage: xdp_trafficgen [options] <IFINDEX|IFNAME>_OUT\n";
+
+#define _GNU_SOURCE
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ipv6.h>
+#include <linux/in6.h>
+#include <linux/udp.h>
+#include <assert.h>
+#include <errno.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <net/if.h>
+#include <unistd.h>
+#include <libgen.h>
+#include <limits.h>
+#include <getopt.h>
+#include <pthread.h>
+#include <arpa/inet.h>
+#include <netinet/ether.h>
+#include <sys/resource.h>
+#include <sys/ioctl.h>
+#include <bpf/bpf.h>
+#include <bpf/bpf_endian.h>
+#include <bpf/libbpf.h>
+#include "bpf_util.h"
+#include "xdp_sample_user.h"
+#include "xdp_redirect.skel.h"
+
+static int mask = SAMPLE_REDIRECT_ERR_CNT |
+		  SAMPLE_EXCEPTION_CNT | SAMPLE_DEVMAP_XMIT_CNT_MULTI;
+
+DEFINE_SAMPLE_INIT(xdp_redirect);
+
+static const struct option long_options[] = {
+	{"dst-mac",	required_argument,	NULL, 'm' },
+	{"src-mac",	required_argument,	NULL, 'M' },
+	{"dst-ip",	required_argument,	NULL, 'a' },
+	{"src-ip",	required_argument,	NULL, 'A' },
+	{"dst-port",	required_argument,	NULL, 'p' },
+	{"src-port",	required_argument,	NULL, 'P' },
+	{"dynamic-ports", required_argument,	NULL, 'd' },
+	{"help",	no_argument,		NULL, 'h' },
+	{"stats",	no_argument,		NULL, 's' },
+	{"interval",	required_argument,	NULL, 'i' },
+	{"n-pkts",	required_argument,	NULL, 'n' },
+	{"threads",	required_argument,	NULL, 't' },
+	{"verbose",	no_argument,		NULL, 'v' },
+	{}
+};
+
+static int sample_res;
+static bool sample_exited;
+
+static void *run_samples(void *arg)
+{
+	unsigned long *interval = arg;
+
+	sample_res = sample_run(*interval, NULL, NULL);
+	sample_exited = true;
+	return NULL;
+}
+
+struct ipv6_packet {
+	struct ethhdr eth;
+	struct ipv6hdr iph;
+	struct udphdr udp;
+	__u8 payload[64 - sizeof(struct udphdr)
+		     - sizeof(struct ethhdr) - sizeof(struct ipv6hdr)];
+} __packed;
+static struct ipv6_packet pkt_v6 = {
+	.eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
+	.iph.version = 6,
+	.iph.nexthdr = IPPROTO_UDP,
+	.iph.payload_len = bpf_htons(sizeof(struct ipv6_packet)
+				     - offsetof(struct ipv6_packet, udp)),
+	.iph.hop_limit = 1,
+	.iph.saddr.s6_addr16 = {bpf_htons(0xfe80), 0, 0, 0, 0, 0, 0, bpf_htons(1)},
+	.iph.daddr.s6_addr16 = {bpf_htons(0xfe80), 0, 0, 0, 0, 0, 0, bpf_htons(2)},
+	.udp.source = bpf_htons(1),
+	.udp.dest = bpf_htons(1),
+	.udp.len = bpf_htons(sizeof(struct ipv6_packet)
+			     - offsetof(struct ipv6_packet, udp)),
+};
+
+struct thread_config {
+	void *pkt;
+	size_t pkt_size;
+	__u32 cpu_core_id;
+	__u32 num_pkts;
+	int prog_fd;
+};
+
+struct config {
+	__be64 src_mac;
+	__be64 dst_mac;
+	struct in6_addr src_ip;
+	struct in6_addr dst_ip;
+	__be16 src_port;
+	__be16 dst_port;
+	int ifindex;
+	char ifname[IFNAMSIZ];
+};
+
+static void *run_traffic(void *arg)
+{
+	const struct thread_config *cfg = arg;
+	struct xdp_md ctx_in = {
+		.data_end = cfg->pkt_size,
+	};
+	DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+			    .data_in = cfg->pkt,
+			    .data_size_in = cfg->pkt_size,
+			    .ctx_in = &ctx_in,
+			    .ctx_size_in = sizeof(ctx_in),
+			    .repeat = cfg->num_pkts ?: 1 << 24,
+			    .flags = BPF_F_TEST_XDP_DO_REDIRECT,
+		);
+	__u64 iterations = 0;
+	cpu_set_t cpu_cores;
+	int err;
+
+	CPU_ZERO(&cpu_cores);
+	CPU_SET(cfg->cpu_core_id, &cpu_cores);
+	pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpu_cores);
+	do {
+		err = bpf_prog_test_run_opts(cfg->prog_fd, &opts);
+		if (err) {
+			printf("bpf_prog_test_run ret %d errno %d\n", err, errno);
+			break;
+		}
+		iterations += opts.repeat;
+	} while (!sample_exited && (!cfg->num_pkts || cfg->num_pkts < iterations));
+	return NULL;
+}
+
+static __be16 calc_udp_cksum(const struct ipv6_packet *pkt)
+{
+	__u32 chksum = pkt->iph.nexthdr + bpf_ntohs(pkt->iph.payload_len);
+	int i;
+
+	for (i = 0; i < 8; i++) {
+		chksum += bpf_ntohs(pkt->iph.saddr.s6_addr16[i]);
+		chksum += bpf_ntohs(pkt->iph.daddr.s6_addr16[i]);
+	}
+	chksum += bpf_ntohs(pkt->udp.source);
+	chksum += bpf_ntohs(pkt->udp.dest);
+	chksum += bpf_ntohs(pkt->udp.len);
+
+	while (chksum >> 16)
+		chksum = (chksum & 0xFFFF) + (chksum >> 16);
+	return bpf_htons(~chksum);
+}
+
+static int prepare_pkt(struct config *cfg)
+{
+	__be64 src_mac = cfg->src_mac;
+	struct in6_addr nulladdr = {};
+	int i, err;
+
+	if (!src_mac) {
+		err = get_mac_addr(cfg->ifindex, &src_mac);
+		if (err)
+			return err;
+	}
+	for (i = 0; i < 6 ; i++) {
+		pkt_v6.eth.h_source[i] = *((__u8 *)&src_mac + i);
+		if (cfg->dst_mac)
+			pkt_v6.eth.h_dest[i] = *((__u8 *)&cfg->dst_mac + i);
+	}
+	if (memcmp(&cfg->src_ip, &nulladdr, sizeof(nulladdr)))
+		pkt_v6.iph.saddr = cfg->src_ip;
+	if (memcmp(&cfg->dst_ip, &nulladdr, sizeof(nulladdr)))
+		pkt_v6.iph.daddr = cfg->dst_ip;
+	if (cfg->src_port)
+		pkt_v6.udp.source = cfg->src_port;
+	if (cfg->dst_port)
+		pkt_v6.udp.dest = cfg->dst_port;
+	pkt_v6.udp.check = calc_udp_cksum(&pkt_v6);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	unsigned long interval = 2, threads = 1, dynports = 0;
+	__u64 num_pkts = 0;
+	pthread_t sample_thread, *runner_threads = NULL;
+	struct thread_config *t = NULL, tcfg = {
+		.pkt = &pkt_v6,
+		.pkt_size = sizeof(pkt_v6),
+	};
+	int ret = EXIT_FAIL_OPTION;
+	struct xdp_redirect *skel;
+	struct config cfg = {};
+	bool error = true;
+	int opt, i, err;
+
+	while ((opt = getopt_long(argc, argv, "a:A:d:hi:m:M:n:p:P:t:vs",
+				  long_options, NULL)) != -1) {
+		switch (opt) {
+		case 'a':
+			if (!inet_pton(AF_INET6, optarg, &cfg.dst_ip)) {
+				fprintf(stderr, "Invalid IPv6 address: %s\n", optarg);
+				return -1;
+			}
+			break;
+		case 'A':
+			if (!inet_pton(AF_INET6, optarg, &cfg.src_ip)) {
+				fprintf(stderr, "Invalid IPv6 address: %s\n", optarg);
+				return -1;
+			}
+			break;
+		case 'd':
+			dynports = strtoul(optarg, NULL, 0);
+			if (dynports < 2 || dynports >= 65535) {
+				fprintf(stderr, "Dynamic port range must be >1 and < 65535\n");
+				return -1;
+			}
+			break;
+		case 'i':
+			interval = strtoul(optarg, NULL, 0);
+			if (interval < 1 || interval == ULONG_MAX) {
+				fprintf(stderr, "Need non-zero interval\n");
+				return -1;
+			}
+			break;
+		case 't':
+			threads = strtoul(optarg, NULL, 0);
+			if (threads < 1 || threads == ULONG_MAX) {
+				fprintf(stderr, "Need at least 1 thread\n");
+				return -1;
+			}
+			break;
+		case 'm':
+		case 'M':
+			struct ether_addr *a;
+
+			a = ether_aton(optarg);
+			if (!a) {
+				fprintf(stderr, "Invalid MAC: %s\n", optarg);
+				return -1;
+			}
+			if (opt == 'm')
+				memcpy(&cfg.dst_mac, a, sizeof(*a));
+			else
+				memcpy(&cfg.src_mac, a, sizeof(*a));
+			break;
+		case 'n':
+			num_pkts = strtoull(optarg, NULL, 0);
+			if (num_pkts >= 1ULL << 32) {
+				fprintf(stderr, "Can send up to 2^32-1 pkts or infinite (0)\n");
+				return -1;
+			}
+			tcfg.num_pkts = num_pkts;
+			break;
+		case 'p':
+		case 'P':
+			unsigned long p;
+
+			p = strtoul(optarg, NULL, 0);
+			if (!p || p > 0xFFFF) {
+				fprintf(stderr, "Invalid port: %s\n", optarg);
+				return -1;
+			}
+			if (opt == 'p')
+				cfg.dst_port = bpf_htons(p);
+			else
+				cfg.src_port = bpf_htons(p);
+			break;
+		case 'v':
+			sample_switch_mode();
+			break;
+		case 's':
+			mask |= SAMPLE_REDIRECT_CNT;
+			break;
+		case 'h':
+			error = false;
+		default:
+			sample_usage(argv, long_options, __doc__, mask, error);
+			return ret;
+		}
+	}
+
+	if (argc <= optind) {
+		sample_usage(argv, long_options, __doc__, mask, true);
+		return ret;
+	}
+
+	cfg.ifindex = if_nametoindex(argv[optind]);
+	if (!cfg.ifindex)
+		cfg.ifindex = strtoul(argv[optind], NULL, 0);
+
+	if (!cfg.ifindex) {
+		fprintf(stderr, "Bad interface index or name\n");
+		sample_usage(argv, long_options, __doc__, mask, true);
+		goto end;
+	}
+
+	if (!if_indextoname(cfg.ifindex, cfg.ifname)) {
+		fprintf(stderr, "Failed to if_indextoname for %d: %s\n", cfg.ifindex,
+			strerror(errno));
+		goto end;
+	}
+
+	err = prepare_pkt(&cfg);
+	if (err)
+		goto end;
+
+	if (dynports) {
+		if (!cfg.dst_port) {
+			fprintf(stderr, "Must specify dst port when using dynamic port range\n");
+			goto end;
+		}
+
+		if (dynports + bpf_ntohs(cfg.dst_port) - 1 > 65535) {
+			fprintf(stderr, "Dynamic port range must end <= 65535\n");
+			goto end;
+		}
+	}
+
+	skel = xdp_redirect__open();
+	if (!skel) {
+		fprintf(stderr, "Failed to xdp_redirect__open: %s\n", strerror(errno));
+		ret = EXIT_FAIL_BPF;
+		goto end;
+	}
+
+	ret = sample_init_pre_load(skel);
+	if (ret < 0) {
+		fprintf(stderr, "Failed to sample_init_pre_load: %s\n", strerror(-ret));
+		ret = EXIT_FAIL_BPF;
+		goto end_destroy;
+	}
+
+	skel->rodata->to_match[0] = cfg.ifindex;
+	skel->rodata->ifindex_out = cfg.ifindex;
+	skel->rodata->port_start = bpf_ntohs(cfg.dst_port);
+	skel->rodata->port_range = dynports;
+	skel->bss->next_port = bpf_ntohs(cfg.dst_port);
+
+	ret = xdp_redirect__load(skel);
+	if (ret < 0) {
+		fprintf(stderr, "Failed to xdp_redirect__load: %s\n", strerror(errno));
+		ret = EXIT_FAIL_BPF;
+		goto end_destroy;
+	}
+
+	if (dynports)
+		tcfg.prog_fd = bpf_program__fd(skel->progs.xdp_redirect_update_port);
+	else
+		tcfg.prog_fd = bpf_program__fd(skel->progs.xdp_redirect_notouch);
+
+	ret = sample_init(skel, mask);
+	if (ret < 0) {
+		fprintf(stderr, "Failed to initialize sample: %s\n", strerror(-ret));
+		ret = EXIT_FAIL;
+		goto end_destroy;
+	}
+
+	ret = EXIT_FAIL;
+
+	runner_threads = calloc(sizeof(pthread_t), threads);
+	if (!runner_threads) {
+		fprintf(stderr, "Couldn't allocate memory\n");
+		goto end_destroy;
+	}
+	t = calloc(sizeof(struct thread_config), threads);
+	if (!t) {
+		fprintf(stderr, "Couldn't allocate memory\n");
+		goto end_destroy;
+	}
+
+	printf("Transmitting on %s (ifindex %d; driver %s)\n",
+	       cfg.ifname, cfg.ifindex, get_driver_name(cfg.ifindex));
+
+	sample_exited = false;
+	ret = pthread_create(&sample_thread, NULL, run_samples, &interval);
+	if (ret < 0) {
+		fprintf(stderr, "Failed to create sample thread: %s\n", strerror(-ret));
+		goto end_destroy;
+	}
+	sleep(1);
+	for (i = 0; i < threads; i++) {
+		memcpy(&t[i], &tcfg, sizeof(tcfg));
+		tcfg.cpu_core_id++;
+
+		ret = pthread_create(&runner_threads[i], NULL, run_traffic, &t[i]);
+		if (ret < 0) {
+			fprintf(stderr, "Failed to create traffic thread: %s\n", strerror(-ret));
+			ret = EXIT_FAIL;
+			goto end_cancel;
+		}
+	}
+	pthread_join(sample_thread, NULL);
+	for (i = 0; i < 0; i++)
+		pthread_join(runner_threads[i], NULL);
+	ret = sample_res;
+	goto end_destroy;
+
+end_cancel:
+	pthread_cancel(sample_thread);
+	for (i = 0; i < 0; i++)
+		pthread_cancel(runner_threads[i]);
+end_destroy:
+	xdp_redirect__destroy(skel);
+	free(runner_threads);
+	free(t);
+end:
+	sample_exit(ret);
+}
-- 
2.34.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run()
  2021-12-11 18:41 ` [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run() Toke Høiland-Jørgensen
@ 2021-12-12  2:43   ` Alexei Starovoitov
  2021-12-13 16:26     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2021-12-12  2:43 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> +
> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t)
> +{
> +       struct xdp_mem_info mem = {
> +               .id = t->xdp.pp->xdp_mem_id,
> +               .type = MEM_TYPE_PAGE_POOL,
> +       };

pls add a new line.

> +       xdp_unreg_mem_model(&mem);
> +}
> +
> +static bool ctx_was_changed(struct xdp_page_head *head)
> +{
> +       return (head->orig_ctx.data != head->ctx.data ||
> +               head->orig_ctx.data_meta != head->ctx.data_meta ||
> +               head->orig_ctx.data_end != head->ctx.data_end);

redundant ()

>         bpf_test_timer_enter(&t);
>         old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>         do {
>                 run_ctx.prog_item = &item;
> -               if (xdp)
> +               if (xdp && xdp_redirect) {
> +                       ret = bpf_test_run_xdp_redirect(&t, prog, ctx);
> +                       if (unlikely(ret < 0))
> +                               break;
> +                       *retval = ret;
> +               } else if (xdp) {
>                         *retval = bpf_prog_run_xdp(prog, ctx);

Can we do this unconditionally without introducing a new uapi flag?
I mean "return bpf_redirect()" was a nop under test_run.
What kind of tests might break if it stops being a nop?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 8/8] samples/bpf: Add xdp_trafficgen sample
  2021-12-11 18:41 ` [PATCH bpf-next v3 8/8] samples/bpf: Add xdp_trafficgen sample Toke Høiland-Jørgensen
@ 2021-12-12  2:47   ` Alexei Starovoitov
  2021-12-13 16:28     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2021-12-12  2:47 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> This adds an XDP-based traffic generator sample which uses the DO_REDIRECT
> flag of bpf_prog_run(). It works by building the initial packet in
> userspace and passing it to the kernel where an XDP program redirects the
> packet to the target interface. The traffic generator supports two modes of
> operation: one that just sends copies of the same packet as fast as it can
> without touching the packet data at all, and one that rewrites the
> destination port number of each packet, making the generated traffic span a
> range of port numbers.
>
> The dynamic mode is included to demonstrate how the bpf_prog_run() facility
> enables building a completely programmable packet generator using XDP.
> Using the dynamic mode has about a 10% overhead compared to the static
> mode, because the latter completely avoids touching the page data.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  samples/bpf/.gitignore            |   1 +
>  samples/bpf/Makefile              |   4 +
>  samples/bpf/xdp_redirect.bpf.c    |  34 +++
>  samples/bpf/xdp_trafficgen_user.c | 421 ++++++++++++++++++++++++++++++
>  4 files changed, 460 insertions(+)
>  create mode 100644 samples/bpf/xdp_trafficgen_user.c

I think it deserves to be in tools/bpf/
samples/bpf/ bit rots too often now.
imo everything in there either needs to be converted to selftests/bpf
or deleted.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run()
  2021-12-12  2:43   ` Alexei Starovoitov
@ 2021-12-13 16:26     ` Toke Høiland-Jørgensen
  2021-12-14  0:02       ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-13 16:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> +
>> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t)
>> +{
>> +       struct xdp_mem_info mem = {
>> +               .id = t->xdp.pp->xdp_mem_id,
>> +               .type = MEM_TYPE_PAGE_POOL,
>> +       };
>
> pls add a new line.
>
>> +       xdp_unreg_mem_model(&mem);
>> +}
>> +
>> +static bool ctx_was_changed(struct xdp_page_head *head)
>> +{
>> +       return (head->orig_ctx.data != head->ctx.data ||
>> +               head->orig_ctx.data_meta != head->ctx.data_meta ||
>> +               head->orig_ctx.data_end != head->ctx.data_end);
>
> redundant ()
>
>>         bpf_test_timer_enter(&t);
>>         old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>>         do {
>>                 run_ctx.prog_item = &item;
>> -               if (xdp)
>> +               if (xdp && xdp_redirect) {
>> +                       ret = bpf_test_run_xdp_redirect(&t, prog, ctx);
>> +                       if (unlikely(ret < 0))
>> +                               break;
>> +                       *retval = ret;
>> +               } else if (xdp) {
>>                         *retval = bpf_prog_run_xdp(prog, ctx);
>
> Can we do this unconditionally without introducing a new uapi flag?
> I mean "return bpf_redirect()" was a nop under test_run.
> What kind of tests might break if it stops being a nop?

Well, I view the existing mode of bpf_prog_test_run() with XDP as a way
to write XDP unit tests: it allows you to submit a packet, run your XDP
program on it, and check that it returned the right value and did the
right modifications. This means if you XDP program does 'return
bpf_redirect()', userspace will still get the XDP_REDIRECT value and so
it can check correctness of your XDP program.

With this flag the behaviour changes quite drastically, in that it will
actually put packets on the wire instead of getting back the program
return. So I think it makes more sense to make it a separate opt-in
mode; the old behaviour can still be useful for checking XDP program
behaviour.

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 8/8] samples/bpf: Add xdp_trafficgen sample
  2021-12-12  2:47   ` Alexei Starovoitov
@ 2021-12-13 16:28     ` Toke Høiland-Jørgensen
  2021-12-14  0:05       ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-13 16:28 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> This adds an XDP-based traffic generator sample which uses the DO_REDIRECT
>> flag of bpf_prog_run(). It works by building the initial packet in
>> userspace and passing it to the kernel where an XDP program redirects the
>> packet to the target interface. The traffic generator supports two modes of
>> operation: one that just sends copies of the same packet as fast as it can
>> without touching the packet data at all, and one that rewrites the
>> destination port number of each packet, making the generated traffic span a
>> range of port numbers.
>>
>> The dynamic mode is included to demonstrate how the bpf_prog_run() facility
>> enables building a completely programmable packet generator using XDP.
>> Using the dynamic mode has about a 10% overhead compared to the static
>> mode, because the latter completely avoids touching the page data.
>>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>  samples/bpf/.gitignore            |   1 +
>>  samples/bpf/Makefile              |   4 +
>>  samples/bpf/xdp_redirect.bpf.c    |  34 +++
>>  samples/bpf/xdp_trafficgen_user.c | 421 ++++++++++++++++++++++++++++++
>>  4 files changed, 460 insertions(+)
>>  create mode 100644 samples/bpf/xdp_trafficgen_user.c
>
> I think it deserves to be in tools/bpf/
> samples/bpf/ bit rots too often now.
> imo everything in there either needs to be converted to selftests/bpf
> or deleted.

I think there's value in having a separate set of utilities that are
more user-facing than the selftests. But I do agree that it's annoying
they bit rot. So how about we fix that instead? Andrii suggested just
integrating the build of samples/bpf into selftests[0], so I'll look
into that after the holidays. But in the meantime I don't think there's
any harm in adding this utility here?

-Toke

[0] https://lore.kernel.org/r/CAEf4BzYv3ONhy-JnQUtknzgBSK0gpm9GBJYtbAiJQe50_eX7Uw@mail.gmail.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run()
  2021-12-13 16:26     ` Toke Høiland-Jørgensen
@ 2021-12-14  0:02       ` Alexei Starovoitov
  2021-12-14  0:36         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2021-12-14  0:02 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

On Mon, Dec 13, 2021 at 8:26 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> +
> >> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t)
> >> +{
> >> +       struct xdp_mem_info mem = {
> >> +               .id = t->xdp.pp->xdp_mem_id,
> >> +               .type = MEM_TYPE_PAGE_POOL,
> >> +       };
> >
> > pls add a new line.
> >
> >> +       xdp_unreg_mem_model(&mem);
> >> +}
> >> +
> >> +static bool ctx_was_changed(struct xdp_page_head *head)
> >> +{
> >> +       return (head->orig_ctx.data != head->ctx.data ||
> >> +               head->orig_ctx.data_meta != head->ctx.data_meta ||
> >> +               head->orig_ctx.data_end != head->ctx.data_end);
> >
> > redundant ()
> >
> >>         bpf_test_timer_enter(&t);
> >>         old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> >>         do {
> >>                 run_ctx.prog_item = &item;
> >> -               if (xdp)
> >> +               if (xdp && xdp_redirect) {
> >> +                       ret = bpf_test_run_xdp_redirect(&t, prog, ctx);
> >> +                       if (unlikely(ret < 0))
> >> +                               break;
> >> +                       *retval = ret;
> >> +               } else if (xdp) {
> >>                         *retval = bpf_prog_run_xdp(prog, ctx);
> >
> > Can we do this unconditionally without introducing a new uapi flag?
> > I mean "return bpf_redirect()" was a nop under test_run.
> > What kind of tests might break if it stops being a nop?
>
> Well, I view the existing mode of bpf_prog_test_run() with XDP as a way
> to write XDP unit tests: it allows you to submit a packet, run your XDP
> program on it, and check that it returned the right value and did the
> right modifications. This means if you XDP program does 'return
> bpf_redirect()', userspace will still get the XDP_REDIRECT value and so
> it can check correctness of your XDP program.
>
> With this flag the behaviour changes quite drastically, in that it will
> actually put packets on the wire instead of getting back the program
> return. So I think it makes more sense to make it a separate opt-in
> mode; the old behaviour can still be useful for checking XDP program
> behaviour.

Ok that all makes sense.
How about using prog_run to feed the data into proper netdev?
XDP prog may or may not attach to it (this detail is tbd) and
prog_run would use prog_fd and ifindex to trigger RX (yes, receive)
in that netdev. XDP prog will execute and will be able to perform
all actions (not only XDP_REDIRECT).
XDP_PASS would pass the packet to the stack, etc.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 8/8] samples/bpf: Add xdp_trafficgen sample
  2021-12-13 16:28     ` Toke Høiland-Jørgensen
@ 2021-12-14  0:05       ` Alexei Starovoitov
  2021-12-14  0:37         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2021-12-14  0:05 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

On Mon, Dec 13, 2021 at 8:28 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> This adds an XDP-based traffic generator sample which uses the DO_REDIRECT
> >> flag of bpf_prog_run(). It works by building the initial packet in
> >> userspace and passing it to the kernel where an XDP program redirects the
> >> packet to the target interface. The traffic generator supports two modes of
> >> operation: one that just sends copies of the same packet as fast as it can
> >> without touching the packet data at all, and one that rewrites the
> >> destination port number of each packet, making the generated traffic span a
> >> range of port numbers.
> >>
> >> The dynamic mode is included to demonstrate how the bpf_prog_run() facility
> >> enables building a completely programmable packet generator using XDP.
> >> Using the dynamic mode has about a 10% overhead compared to the static
> >> mode, because the latter completely avoids touching the page data.
> >>
> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> ---
> >>  samples/bpf/.gitignore            |   1 +
> >>  samples/bpf/Makefile              |   4 +
> >>  samples/bpf/xdp_redirect.bpf.c    |  34 +++
> >>  samples/bpf/xdp_trafficgen_user.c | 421 ++++++++++++++++++++++++++++++
> >>  4 files changed, 460 insertions(+)
> >>  create mode 100644 samples/bpf/xdp_trafficgen_user.c
> >
> > I think it deserves to be in tools/bpf/
> > samples/bpf/ bit rots too often now.
> > imo everything in there either needs to be converted to selftests/bpf
> > or deleted.
>
> I think there's value in having a separate set of utilities that are
> more user-facing than the selftests. But I do agree that it's annoying
> they bit rot. So how about we fix that instead? Andrii suggested just
> integrating the build of samples/bpf into selftests[0], so I'll look
> into that after the holidays. But in the meantime I don't think there's
> any harm in adding this utility here?

I think samples/bpf building would help to stabilize bitroting,
but the question of the right home for this trafficgen tool remains.
I think it's best to keep it outside of the kernel tree.
It's not any more special than all other libbpf and bcc tools.
I think xdp-tools repo or bcc could be a home for it.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run()
  2021-12-14  0:02       ` Alexei Starovoitov
@ 2021-12-14  0:36         ` Toke Høiland-Jørgensen
  2021-12-14  3:45           ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-14  0:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Dec 13, 2021 at 8:26 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> +
>> >> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t)
>> >> +{
>> >> +       struct xdp_mem_info mem = {
>> >> +               .id = t->xdp.pp->xdp_mem_id,
>> >> +               .type = MEM_TYPE_PAGE_POOL,
>> >> +       };
>> >
>> > pls add a new line.
>> >
>> >> +       xdp_unreg_mem_model(&mem);
>> >> +}
>> >> +
>> >> +static bool ctx_was_changed(struct xdp_page_head *head)
>> >> +{
>> >> +       return (head->orig_ctx.data != head->ctx.data ||
>> >> +               head->orig_ctx.data_meta != head->ctx.data_meta ||
>> >> +               head->orig_ctx.data_end != head->ctx.data_end);
>> >
>> > redundant ()
>> >
>> >>         bpf_test_timer_enter(&t);
>> >>         old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>> >>         do {
>> >>                 run_ctx.prog_item = &item;
>> >> -               if (xdp)
>> >> +               if (xdp && xdp_redirect) {
>> >> +                       ret = bpf_test_run_xdp_redirect(&t, prog, ctx);
>> >> +                       if (unlikely(ret < 0))
>> >> +                               break;
>> >> +                       *retval = ret;
>> >> +               } else if (xdp) {
>> >>                         *retval = bpf_prog_run_xdp(prog, ctx);
>> >
>> > Can we do this unconditionally without introducing a new uapi flag?
>> > I mean "return bpf_redirect()" was a nop under test_run.
>> > What kind of tests might break if it stops being a nop?
>>
>> Well, I view the existing mode of bpf_prog_test_run() with XDP as a way
>> to write XDP unit tests: it allows you to submit a packet, run your XDP
>> program on it, and check that it returned the right value and did the
>> right modifications. This means if you XDP program does 'return
>> bpf_redirect()', userspace will still get the XDP_REDIRECT value and so
>> it can check correctness of your XDP program.
>>
>> With this flag the behaviour changes quite drastically, in that it will
>> actually put packets on the wire instead of getting back the program
>> return. So I think it makes more sense to make it a separate opt-in
>> mode; the old behaviour can still be useful for checking XDP program
>> behaviour.
>
> Ok that all makes sense.

Great!

> How about using prog_run to feed the data into proper netdev?
> XDP prog may or may not attach to it (this detail is tbd) and
> prog_run would use prog_fd and ifindex to trigger RX (yes, receive)
> in that netdev. XDP prog will execute and will be able to perform
> all actions (not only XDP_REDIRECT).
> XDP_PASS would pass the packet to the stack, etc.

Hmm, that's certainly an interesting idea! I don't think we can actually
run the XDP hook on the netdev itself (since that is deep in the
driver), but we can emulate it: we just need to do what this version of
the patch is doing, but add handling of the other return codes.

XDP_PASS could be supported by basically copying what cpumap is doing
(turn the frames into skbs and call netif_receive_skb_list()), but
XDP_TX would have to be implemented via ndo_xdp_xmit(), so it becomes
equivalent to a REDIRECT back to the same interface. That's probably OK,
though, right?

I'll try this out for the next version, thanks for the idea!

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 8/8] samples/bpf: Add xdp_trafficgen sample
  2021-12-14  0:05       ` Alexei Starovoitov
@ 2021-12-14  0:37         ` Toke Høiland-Jørgensen
  2021-12-14  3:28           ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-14  0:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Dec 13, 2021 at 8:28 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> This adds an XDP-based traffic generator sample which uses the DO_REDIRECT
>> >> flag of bpf_prog_run(). It works by building the initial packet in
>> >> userspace and passing it to the kernel where an XDP program redirects the
>> >> packet to the target interface. The traffic generator supports two modes of
>> >> operation: one that just sends copies of the same packet as fast as it can
>> >> without touching the packet data at all, and one that rewrites the
>> >> destination port number of each packet, making the generated traffic span a
>> >> range of port numbers.
>> >>
>> >> The dynamic mode is included to demonstrate how the bpf_prog_run() facility
>> >> enables building a completely programmable packet generator using XDP.
>> >> Using the dynamic mode has about a 10% overhead compared to the static
>> >> mode, because the latter completely avoids touching the page data.
>> >>
>> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> ---
>> >>  samples/bpf/.gitignore            |   1 +
>> >>  samples/bpf/Makefile              |   4 +
>> >>  samples/bpf/xdp_redirect.bpf.c    |  34 +++
>> >>  samples/bpf/xdp_trafficgen_user.c | 421 ++++++++++++++++++++++++++++++
>> >>  4 files changed, 460 insertions(+)
>> >>  create mode 100644 samples/bpf/xdp_trafficgen_user.c
>> >
>> > I think it deserves to be in tools/bpf/
>> > samples/bpf/ bit rots too often now.
>> > imo everything in there either needs to be converted to selftests/bpf
>> > or deleted.
>>
>> I think there's value in having a separate set of utilities that are
>> more user-facing than the selftests. But I do agree that it's annoying
>> they bit rot. So how about we fix that instead? Andrii suggested just
>> integrating the build of samples/bpf into selftests[0], so I'll look
>> into that after the holidays. But in the meantime I don't think there's
>> any harm in adding this utility here?
>
> I think samples/bpf building would help to stabilize bitroting,
> but the question of the right home for this trafficgen tool remains.
> I think it's best to keep it outside of the kernel tree.
> It's not any more special than all other libbpf and bcc tools.
> I think xdp-tools repo or bcc could be a home for it.

Alright, I'll drop it from the next version and put it into xdp-tools.
I've been contemplating doing the same for some of the other tools
(xdp_redirect* and xdp_monitor, for instance). Any opinion on that?

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 8/8] samples/bpf: Add xdp_trafficgen sample
  2021-12-14  0:37         ` Toke Høiland-Jørgensen
@ 2021-12-14  3:28           ` Alexei Starovoitov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2021-12-14  3:28 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

On Mon, Dec 13, 2021 at 4:37 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Mon, Dec 13, 2021 at 8:28 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>
> >> > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> This adds an XDP-based traffic generator sample which uses the DO_REDIRECT
> >> >> flag of bpf_prog_run(). It works by building the initial packet in
> >> >> userspace and passing it to the kernel where an XDP program redirects the
> >> >> packet to the target interface. The traffic generator supports two modes of
> >> >> operation: one that just sends copies of the same packet as fast as it can
> >> >> without touching the packet data at all, and one that rewrites the
> >> >> destination port number of each packet, making the generated traffic span a
> >> >> range of port numbers.
> >> >>
> >> >> The dynamic mode is included to demonstrate how the bpf_prog_run() facility
> >> >> enables building a completely programmable packet generator using XDP.
> >> >> Using the dynamic mode has about a 10% overhead compared to the static
> >> >> mode, because the latter completely avoids touching the page data.
> >> >>
> >> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> ---
> >> >>  samples/bpf/.gitignore            |   1 +
> >> >>  samples/bpf/Makefile              |   4 +
> >> >>  samples/bpf/xdp_redirect.bpf.c    |  34 +++
> >> >>  samples/bpf/xdp_trafficgen_user.c | 421 ++++++++++++++++++++++++++++++
> >> >>  4 files changed, 460 insertions(+)
> >> >>  create mode 100644 samples/bpf/xdp_trafficgen_user.c
> >> >
> >> > I think it deserves to be in tools/bpf/
> >> > samples/bpf/ bit rots too often now.
> >> > imo everything in there either needs to be converted to selftests/bpf
> >> > or deleted.
> >>
> >> I think there's value in having a separate set of utilities that are
> >> more user-facing than the selftests. But I do agree that it's annoying
> >> they bit rot. So how about we fix that instead? Andrii suggested just
> >> integrating the build of samples/bpf into selftests[0], so I'll look
> >> into that after the holidays. But in the meantime I don't think there's
> >> any harm in adding this utility here?
> >
> > I think samples/bpf building would help to stabilize bitroting,
> > but the question of the right home for this trafficgen tool remains.
> > I think it's best to keep it outside of the kernel tree.
> > It's not any more special than all other libbpf and bcc tools.
> > I think xdp-tools repo or bcc could be a home for it.
>
> Alright, I'll drop it from the next version and put it into xdp-tools.
> I've been contemplating doing the same for some of the other tools
> (xdp_redirect* and xdp_monitor, for instance). Any opinion on that?

Please move them too if you don't mind.
samples/ just doesn't have the right production quality vibe.
Everything in there is a sample code. In other words a toy application.
I hope xdp_trafficgen aims to be a solid maintained tool with
a man page that distros will ship eventually.
So starting with a good home is important.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run()
  2021-12-14  0:36         ` Toke Høiland-Jørgensen
@ 2021-12-14  3:45           ` Alexei Starovoitov
  2021-12-14 11:46             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2021-12-14  3:45 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

On Mon, Dec 13, 2021 at 4:36 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Mon, Dec 13, 2021 at 8:26 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>
> >> > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >> +
> >> >> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t)
> >> >> +{
> >> >> +       struct xdp_mem_info mem = {
> >> >> +               .id = t->xdp.pp->xdp_mem_id,
> >> >> +               .type = MEM_TYPE_PAGE_POOL,
> >> >> +       };
> >> >
> >> > pls add a new line.
> >> >
> >> >> +       xdp_unreg_mem_model(&mem);
> >> >> +}
> >> >> +
> >> >> +static bool ctx_was_changed(struct xdp_page_head *head)
> >> >> +{
> >> >> +       return (head->orig_ctx.data != head->ctx.data ||
> >> >> +               head->orig_ctx.data_meta != head->ctx.data_meta ||
> >> >> +               head->orig_ctx.data_end != head->ctx.data_end);
> >> >
> >> > redundant ()
> >> >
> >> >>         bpf_test_timer_enter(&t);
> >> >>         old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> >> >>         do {
> >> >>                 run_ctx.prog_item = &item;
> >> >> -               if (xdp)
> >> >> +               if (xdp && xdp_redirect) {
> >> >> +                       ret = bpf_test_run_xdp_redirect(&t, prog, ctx);
> >> >> +                       if (unlikely(ret < 0))
> >> >> +                               break;
> >> >> +                       *retval = ret;
> >> >> +               } else if (xdp) {
> >> >>                         *retval = bpf_prog_run_xdp(prog, ctx);
> >> >
> >> > Can we do this unconditionally without introducing a new uapi flag?
> >> > I mean "return bpf_redirect()" was a nop under test_run.
> >> > What kind of tests might break if it stops being a nop?
> >>
> >> Well, I view the existing mode of bpf_prog_test_run() with XDP as a way
> >> to write XDP unit tests: it allows you to submit a packet, run your XDP
> >> program on it, and check that it returned the right value and did the
> >> right modifications. This means if you XDP program does 'return
> >> bpf_redirect()', userspace will still get the XDP_REDIRECT value and so
> >> it can check correctness of your XDP program.
> >>
> >> With this flag the behaviour changes quite drastically, in that it will
> >> actually put packets on the wire instead of getting back the program
> >> return. So I think it makes more sense to make it a separate opt-in
> >> mode; the old behaviour can still be useful for checking XDP program
> >> behaviour.
> >
> > Ok that all makes sense.
>
> Great!
>
> > How about using prog_run to feed the data into proper netdev?
> > XDP prog may or may not attach to it (this detail is tbd) and
> > prog_run would use prog_fd and ifindex to trigger RX (yes, receive)
> > in that netdev. XDP prog will execute and will be able to perform
> > all actions (not only XDP_REDIRECT).
> > XDP_PASS would pass the packet to the stack, etc.
>
> Hmm, that's certainly an interesting idea! I don't think we can actually
> run the XDP hook on the netdev itself (since that is deep in the
> driver), but we can emulate it: we just need to do what this version of
> the patch is doing, but add handling of the other return codes.
>
> XDP_PASS could be supported by basically copying what cpumap is doing
> (turn the frames into skbs and call netif_receive_skb_list()), but
> XDP_TX would have to be implemented via ndo_xdp_xmit(), so it becomes
> equivalent to a REDIRECT back to the same interface. That's probably OK,
> though, right?

Yep. Something like this.
imo the individual BPF_F_TEST_XDP_DO_REDIRECT knob doesn't look right.
It's tweaking the prog run from no side effects execution model
to partial side effects.
If we want to run xdp prog with side effects it probably should
behave like normal execution on the netdev when it receives the packet.
We might not even need to create a new netdev for that.
I can imagine a bpf_prog_run operating on eth0 with a packet prepared
by the user space.
Like injecting a packet right into the driver and xdp part of it.
If prog says XDP_PASS the packet will go up the stack like normal.
So this mechanism could be used to inject packets into the stack.
Obviously buffer management is an issue in the traditional NIC
when a packet doesn't come from the wire.
Also doing this in every driver would be a pain.
So we need some common infra to inject the user packet into a netdev
like it was received by this netdev. It could be a change for tuntap
or for veth or not related to netdev at all.
After XDP_PASS it doesn't need to be fast. skb will get allocated
and the stack might see it as it arrived from ifindex=N regardless
of the HW of that netdev.
XDP_TX would xmit right out of that ifindex=netdev.
and XDP_REDIRECT would redirect to a different netdev.
At the end there will be less special cases and page_pool tweaks.
Thought the patches 1-5 look fine, it still feels a bit custom
just for this particular BPF_F_TEST_XDP_DO_REDIRECT use case.
With more generic bpf_run_prog(xdp_prog_fd, ifindex_of_netdev)
it might reduce custom handling.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run()
  2021-12-14  3:45           ` Alexei Starovoitov
@ 2021-12-14 11:46             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-12-14 11:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, Network Development, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Dec 13, 2021 at 4:36 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Mon, Dec 13, 2021 at 8:26 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> >>
>> >> > On Sat, Dec 11, 2021 at 10:43 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >> +
>> >> >> +static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t)
>> >> >> +{
>> >> >> +       struct xdp_mem_info mem = {
>> >> >> +               .id = t->xdp.pp->xdp_mem_id,
>> >> >> +               .type = MEM_TYPE_PAGE_POOL,
>> >> >> +       };
>> >> >
>> >> > pls add a new line.
>> >> >
>> >> >> +       xdp_unreg_mem_model(&mem);
>> >> >> +}
>> >> >> +
>> >> >> +static bool ctx_was_changed(struct xdp_page_head *head)
>> >> >> +{
>> >> >> +       return (head->orig_ctx.data != head->ctx.data ||
>> >> >> +               head->orig_ctx.data_meta != head->ctx.data_meta ||
>> >> >> +               head->orig_ctx.data_end != head->ctx.data_end);
>> >> >
>> >> > redundant ()
>> >> >
>> >> >>         bpf_test_timer_enter(&t);
>> >> >>         old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>> >> >>         do {
>> >> >>                 run_ctx.prog_item = &item;
>> >> >> -               if (xdp)
>> >> >> +               if (xdp && xdp_redirect) {
>> >> >> +                       ret = bpf_test_run_xdp_redirect(&t, prog, ctx);
>> >> >> +                       if (unlikely(ret < 0))
>> >> >> +                               break;
>> >> >> +                       *retval = ret;
>> >> >> +               } else if (xdp) {
>> >> >>                         *retval = bpf_prog_run_xdp(prog, ctx);
>> >> >
>> >> > Can we do this unconditionally without introducing a new uapi flag?
>> >> > I mean "return bpf_redirect()" was a nop under test_run.
>> >> > What kind of tests might break if it stops being a nop?
>> >>
>> >> Well, I view the existing mode of bpf_prog_test_run() with XDP as a way
>> >> to write XDP unit tests: it allows you to submit a packet, run your XDP
>> >> program on it, and check that it returned the right value and did the
>> >> right modifications. This means if you XDP program does 'return
>> >> bpf_redirect()', userspace will still get the XDP_REDIRECT value and so
>> >> it can check correctness of your XDP program.
>> >>
>> >> With this flag the behaviour changes quite drastically, in that it will
>> >> actually put packets on the wire instead of getting back the program
>> >> return. So I think it makes more sense to make it a separate opt-in
>> >> mode; the old behaviour can still be useful for checking XDP program
>> >> behaviour.
>> >
>> > Ok that all makes sense.
>>
>> Great!
>>
>> > How about using prog_run to feed the data into proper netdev?
>> > XDP prog may or may not attach to it (this detail is tbd) and
>> > prog_run would use prog_fd and ifindex to trigger RX (yes, receive)
>> > in that netdev. XDP prog will execute and will be able to perform
>> > all actions (not only XDP_REDIRECT).
>> > XDP_PASS would pass the packet to the stack, etc.
>>
>> Hmm, that's certainly an interesting idea! I don't think we can actually
>> run the XDP hook on the netdev itself (since that is deep in the
>> driver), but we can emulate it: we just need to do what this version of
>> the patch is doing, but add handling of the other return codes.
>>
>> XDP_PASS could be supported by basically copying what cpumap is doing
>> (turn the frames into skbs and call netif_receive_skb_list()), but
>> XDP_TX would have to be implemented via ndo_xdp_xmit(), so it becomes
>> equivalent to a REDIRECT back to the same interface. That's probably OK,
>> though, right?
>
> Yep. Something like this.
> imo the individual BPF_F_TEST_XDP_DO_REDIRECT knob doesn't look right.
> It's tweaking the prog run from no side effects execution model
> to partial side effects.
> If we want to run xdp prog with side effects it probably should
> behave like normal execution on the netdev when it receives the packet.
> We might not even need to create a new netdev for that.
> I can imagine a bpf_prog_run operating on eth0 with a packet prepared
> by the user space.
> Like injecting a packet right into the driver and xdp part of it.
> If prog says XDP_PASS the packet will go up the stack like normal.
> So this mechanism could be used to inject packets into the stack.
> Obviously buffer management is an issue in the traditional NIC
> when a packet doesn't come from the wire.
> Also doing this in every driver would be a pain.
> So we need some common infra to inject the user packet into a netdev
> like it was received by this netdev. It could be a change for tuntap
> or for veth or not related to netdev at all.

What you're describing is basically what the cpumap code does; except it
doesn't handle XDP_TX, and it doesn't do buffer management. But I
already implemented the latter, and the former is straight-forward to do
as a special-case XDP_REDIRECT. So my plan is to try this out and see
what that looks like :)

> After XDP_PASS it doesn't need to be fast. skb will get allocated
> and the stack might see it as it arrived from ifindex=N regardless
> of the HW of that netdev.
> XDP_TX would xmit right out of that ifindex=netdev.
> and XDP_REDIRECT would redirect to a different netdev.
> At the end there will be less special cases and page_pool tweaks.
> Thought the patches 1-5 look fine, it still feels a bit custom
> just for this particular BPF_F_TEST_XDP_DO_REDIRECT use case.
> With more generic bpf_run_prog(xdp_prog_fd, ifindex_of_netdev)
> it might reduce custom handling.

Yup, totally makes sense!

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-12-14 11:48 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-11 18:41 [PATCH bpf-next v3 0/8] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
2021-12-11 18:41 ` [PATCH bpf-next v3 1/8] xdp: Allow registering memory model without rxq reference Toke Høiland-Jørgensen
2021-12-11 18:41 ` [PATCH bpf-next v3 2/8] page_pool: Add callback to init pages when they are allocated Toke Høiland-Jørgensen
2021-12-11 18:41 ` [PATCH bpf-next v3 3/8] page_pool: Store the XDP mem id Toke Høiland-Jørgensen
2021-12-11 18:41 ` [PATCH bpf-next v3 4/8] xdp: Move conversion to xdp_frame out of map functions Toke Høiland-Jørgensen
2021-12-11 18:41 ` [PATCH bpf-next v3 5/8] xdp: add xdp_do_redirect_frame() for pre-computed xdp_frames Toke Høiland-Jørgensen
2021-12-11 18:41 ` [PATCH bpf-next v3 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run() Toke Høiland-Jørgensen
2021-12-12  2:43   ` Alexei Starovoitov
2021-12-13 16:26     ` Toke Høiland-Jørgensen
2021-12-14  0:02       ` Alexei Starovoitov
2021-12-14  0:36         ` Toke Høiland-Jørgensen
2021-12-14  3:45           ` Alexei Starovoitov
2021-12-14 11:46             ` Toke Høiland-Jørgensen
2021-12-11 18:41 ` [PATCH bpf-next v3 7/8] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run() Toke Høiland-Jørgensen
2021-12-11 18:41 ` [PATCH bpf-next v3 8/8] samples/bpf: Add xdp_trafficgen sample Toke Høiland-Jørgensen
2021-12-12  2:47   ` Alexei Starovoitov
2021-12-13 16:28     ` Toke Høiland-Jørgensen
2021-12-14  0:05       ` Alexei Starovoitov
2021-12-14  0:37         ` Toke Høiland-Jørgensen
2021-12-14  3:28           ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).