All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc
@ 2023-12-04 20:56 Daniel Xu
  2023-12-04 20:56 ` [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c Daniel Xu
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: bpf, linux-kernel, linux-kselftest, netdev, llvm,
	steffen.klassert, antony.antony, alexei.starovoitov,
	yonghong.song, eddyz87
  Cc: devel

This patchset adds two kfunc helpers, bpf_xdp_get_xfrm_state() and
bpf_xdp_xfrm_state_release() that wrap xfrm_state_lookup() and
xfrm_state_put(). The intent is to support software RSS (via XDP) for
the ongoing/upcoming ipsec pcpu work [0]. Recent experiments performed
on (hopefully) reproducible AWS testbeds indicate that single tunnel
pcpu ipsec can reach line rate on 100G ENA nics.

Note this patchset only tests/shows generic xfrm_state access. The
"secret sauce" (if you can really even call it that) involves accessing
a soon-to-be-upstreamed pcpu_num field in xfrm_state. Early example is
available here [1].

[0]: https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/03/
[1]: https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce2863ecb27/xdp-bench/xdp_redirect_cpumap.bpf.c#L385-L406

Changes from v3:
* Place all xfrm bpf integrations in xfrm_bpf.c
* Avoid using nval as a temporary
* Rebase to bpf-next
* Remove extraneous __failure_unpriv annotation for verifier tests

Changes from v2:
* Fix/simplify BPF_CORE_WRITE_BITFIELD() algorithm
* Added verifier tests for bitfield writes
* Fix state leakage across test_tunnel subtests

Changes from v1:
* Move xfrm tunnel tests to test_progs
* Fix writing to opts->error when opts is invalid
* Use __bpf_kfunc_start_defs()
* Remove unused vxlanhdr definition
* Add and use BPF_CORE_WRITE_BITFIELD() macro
* Make series bisect clean

Changes from RFCv2:
* Rebased to ipsec-next
* Fix netns leak

Changes from RFCv1:
* Add Antony's commit tags
* Add KF_ACQUIRE and KF_RELEASE semantics


Daniel Xu (10):
  xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c
  bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
  bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc
  libbpf: Add BPF_CORE_WRITE_BITFIELD() macro
  bpf: selftests: test_loader: Support __btf_path() annotation
  libbpf: selftests: Add verifier tests for CO-RE bitfield writes
  bpf: selftests: test_tunnel: Setup fresh topology for each subtest
  bpf: selftests: test_tunnel: Use vmlinux.h declarations
  bpf: selftests: Move xfrm tunnel test to test_progs
  bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()

 include/net/xfrm.h                            |   9 +
 net/xfrm/Makefile                             |   7 +-
 net/xfrm/xfrm_bpf.c                           | 232 ++++++++++++++++++
 net/xfrm/xfrm_interface_bpf.c                 | 110 ---------
 net/xfrm/xfrm_policy.c                        |   2 +
 tools/lib/bpf/bpf_core_read.h                 |  32 +++
 .../selftests/bpf/prog_tests/test_tunnel.c    | 162 +++++++++++-
 .../selftests/bpf/prog_tests/verifier.c       |   2 +
 tools/testing/selftests/bpf/progs/bpf_misc.h  |   1 +
 .../selftests/bpf/progs/bpf_tracing_net.h     |   1 +
 .../selftests/bpf/progs/test_tunnel_kern.c    | 138 ++++++-----
 .../bpf/progs/verifier_bitfield_write.c       | 100 ++++++++
 tools/testing/selftests/bpf/test_loader.c     |   7 +
 tools/testing/selftests/bpf/test_tunnel.sh    |  92 -------
 14 files changed, 624 insertions(+), 271 deletions(-)
 create mode 100644 net/xfrm/xfrm_bpf.c
 delete mode 100644 net/xfrm/xfrm_interface_bpf.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_bitfield_write.c

-- 
2.42.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c
  2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
@ 2023-12-04 20:56 ` Daniel Xu
  2023-12-05  1:58   ` Alexei Starovoitov
  2023-12-07 11:52   ` Steffen Klassert
  2023-12-04 20:56 ` [PATCH bpf-next v4 02/10] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: davem, Herbert Xu, steffen.klassert, pabeni, kuba, edumazet,
	antony.antony, alexei.starovoitov, yonghong.song, eddyz87
  Cc: netdev, linux-kernel, bpf, devel

This commit moves the contents of xfrm_interface_bpf.c into a new file,
xfrm_bpf.c This is in preparation for adding more xfrm kfuncs. We'd like
to keep all the bpf integrations in a single file.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 net/xfrm/Makefile                             |  7 +------
 net/xfrm/{xfrm_interface_bpf.c => xfrm_bpf.c} | 12 ++++++++----
 2 files changed, 9 insertions(+), 10 deletions(-)
 rename net/xfrm/{xfrm_interface_bpf.c => xfrm_bpf.c} (88%)

diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
index cd47f88921f5..29fff452280d 100644
--- a/net/xfrm/Makefile
+++ b/net/xfrm/Makefile
@@ -5,12 +5,6 @@
 
 xfrm_interface-$(CONFIG_XFRM_INTERFACE) += xfrm_interface_core.o
 
-ifeq ($(CONFIG_XFRM_INTERFACE),m)
-xfrm_interface-$(CONFIG_DEBUG_INFO_BTF_MODULES) += xfrm_interface_bpf.o
-else ifeq ($(CONFIG_XFRM_INTERFACE),y)
-xfrm_interface-$(CONFIG_DEBUG_INFO_BTF) += xfrm_interface_bpf.o
-endif
-
 obj-$(CONFIG_XFRM) := xfrm_policy.o xfrm_state.o xfrm_hash.o \
 		      xfrm_input.o xfrm_output.o \
 		      xfrm_sysctl.o xfrm_replay.o xfrm_device.o
@@ -21,3 +15,4 @@ obj-$(CONFIG_XFRM_USER_COMPAT) += xfrm_compat.o
 obj-$(CONFIG_XFRM_IPCOMP) += xfrm_ipcomp.o
 obj-$(CONFIG_XFRM_INTERFACE) += xfrm_interface.o
 obj-$(CONFIG_XFRM_ESPINTCP) += espintcp.o
+obj-$(CONFIG_DEBUG_INFO_BTF) += xfrm_bpf.o
diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_bpf.c
similarity index 88%
rename from net/xfrm/xfrm_interface_bpf.c
rename to net/xfrm/xfrm_bpf.c
index 7d5e920141e9..3d3018b87f96 100644
--- a/net/xfrm/xfrm_interface_bpf.c
+++ b/net/xfrm/xfrm_bpf.c
@@ -1,9 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0-only
-/* Unstable XFRM Helpers for TC-BPF hook
+/* Unstable XFRM BPF helpers.
  *
- * These are called from SCHED_CLS BPF programs. Note that it is
- * allowed to break compatibility for these functions since the interface they
- * are exposed through to BPF programs is explicitly unstable.
+ * Note that it is allowed to break compatibility for these functions since the
+ * interface they are exposed through to BPF programs is explicitly unstable.
  */
 
 #include <linux/bpf.h>
@@ -12,6 +11,9 @@
 #include <net/dst_metadata.h>
 #include <net/xfrm.h>
 
+#if IS_BUILTIN(CONFIG_XFRM_INTERFACE) || \
+    (IS_MODULE(CONFIG_XFRM_INTERFACE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
+
 /* bpf_xfrm_info - XFRM metadata information
  *
  * Members:
@@ -108,3 +110,5 @@ int __init register_xfrm_interface_bpf(void)
 	return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS,
 					 &xfrm_interface_kfunc_set);
 }
+
+#endif /* xfrm interface */
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v4 02/10] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
  2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
  2023-12-04 20:56 ` [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c Daniel Xu
@ 2023-12-04 20:56 ` Daniel Xu
  2023-12-07 11:53   ` Steffen Klassert
  2023-12-07 21:21   ` Eyal Birger
  2023-12-04 20:56 ` [PATCH bpf-next v4 03/10] bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc Daniel Xu
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: ast, daniel, davem, Herbert Xu, steffen.klassert, pabeni, hawk,
	john.fastabend, kuba, edumazet, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87
  Cc: netdev, linux-kernel, bpf, devel

This commit adds an unstable kfunc helper to access internal xfrm_state
associated with an SA. This is intended to be used for the upcoming
IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
words: for custom software RSS.

That being said, the function that this kfunc wraps is fairly generic
and used for a lot of xfrm tasks. I'm sure people will find uses
elsewhere over time.

Co-developed-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 include/net/xfrm.h     |   9 ++++
 net/xfrm/xfrm_bpf.c    | 102 +++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_policy.c |   2 +
 3 files changed, 113 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index c9bb0f892f55..1d107241b901 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -2190,4 +2190,13 @@ static inline int register_xfrm_interface_bpf(void)
 
 #endif
 
+#if IS_ENABLED(CONFIG_DEBUG_INFO_BTF)
+int register_xfrm_state_bpf(void);
+#else
+static inline int register_xfrm_state_bpf(void)
+{
+	return 0;
+}
+#endif
+
 #endif	/* _NET_XFRM_H */
diff --git a/net/xfrm/xfrm_bpf.c b/net/xfrm/xfrm_bpf.c
index 3d3018b87f96..3d6cac7345ca 100644
--- a/net/xfrm/xfrm_bpf.c
+++ b/net/xfrm/xfrm_bpf.c
@@ -6,9 +6,11 @@
  */
 
 #include <linux/bpf.h>
+#include <linux/btf.h>
 #include <linux/btf_ids.h>
 
 #include <net/dst_metadata.h>
+#include <net/xdp.h>
 #include <net/xfrm.h>
 
 #if IS_BUILTIN(CONFIG_XFRM_INTERFACE) || \
@@ -112,3 +114,103 @@ int __init register_xfrm_interface_bpf(void)
 }
 
 #endif /* xfrm interface */
+
+/* bpf_xfrm_state_opts - Options for XFRM state lookup helpers
+ *
+ * Members:
+ * @error      - Out parameter, set for any errors encountered
+ *		 Values:
+ *		   -EINVAL - netns_id is less than -1
+ *		   -EINVAL - opts__sz isn't BPF_XFRM_STATE_OPTS_SZ
+ *		   -ENONET - No network namespace found for netns_id
+ * @netns_id	- Specify the network namespace for lookup
+ *		 Values:
+ *		   BPF_F_CURRENT_NETNS (-1)
+ *		     Use namespace associated with ctx
+ *		   [0, S32_MAX]
+ *		     Network Namespace ID
+ * @mark	- XFRM mark to match on
+ * @daddr	- Destination address to match on
+ * @spi		- Security parameter index to match on
+ * @proto	- L3 protocol to match on
+ * @family	- L3 protocol family to match on
+ */
+struct bpf_xfrm_state_opts {
+	s32 error;
+	s32 netns_id;
+	u32 mark;
+	xfrm_address_t daddr;
+	__be32 spi;
+	u8 proto;
+	u16 family;
+};
+
+enum {
+	BPF_XFRM_STATE_OPTS_SZ = sizeof(struct bpf_xfrm_state_opts),
+};
+
+__bpf_kfunc_start_defs();
+
+/* bpf_xdp_get_xfrm_state - Get XFRM state
+ *
+ * Parameters:
+ * @ctx	- Pointer to ctx (xdp_md) in XDP program
+ *		    Cannot be NULL
+ * @opts	- Options for lookup (documented above)
+ *		    Cannot be NULL
+ * @opts__sz	- Length of the bpf_xfrm_state_opts structure
+ *		    Must be BPF_XFRM_STATE_OPTS_SZ
+ */
+__bpf_kfunc struct xfrm_state *
+bpf_xdp_get_xfrm_state(struct xdp_md *ctx, struct bpf_xfrm_state_opts *opts, u32 opts__sz)
+{
+	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
+	struct net *net = dev_net(xdp->rxq->dev);
+	struct xfrm_state *x;
+
+	if (!opts || opts__sz < sizeof(opts->error))
+		return NULL;
+
+	if (opts__sz != BPF_XFRM_STATE_OPTS_SZ) {
+		opts->error = -EINVAL;
+		return NULL;
+	}
+
+	if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) {
+		opts->error = -EINVAL;
+		return NULL;
+	}
+
+	if (opts->netns_id >= 0) {
+		net = get_net_ns_by_id(net, opts->netns_id);
+		if (unlikely(!net)) {
+			opts->error = -ENONET;
+			return NULL;
+		}
+	}
+
+	x = xfrm_state_lookup(net, opts->mark, &opts->daddr, opts->spi,
+			      opts->proto, opts->family);
+
+	if (opts->netns_id >= 0)
+		put_net(net);
+
+	return x;
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_SET8_START(xfrm_state_kfunc_set)
+BTF_ID_FLAGS(func, bpf_xdp_get_xfrm_state, KF_RET_NULL | KF_ACQUIRE)
+BTF_SET8_END(xfrm_state_kfunc_set)
+
+static const struct btf_kfunc_id_set xfrm_state_xdp_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &xfrm_state_kfunc_set,
+};
+
+int __init register_xfrm_state_bpf(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
+					 &xfrm_state_xdp_kfunc_set);
+}
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index c13dc3ef7910..1b7e75159727 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -4218,6 +4218,8 @@ void __init xfrm_init(void)
 #ifdef CONFIG_XFRM_ESPINTCP
 	espintcp_init();
 #endif
+
+	register_xfrm_state_bpf();
 }
 
 #ifdef CONFIG_AUDITSYSCALL
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v4 03/10] bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc
  2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
  2023-12-04 20:56 ` [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c Daniel Xu
  2023-12-04 20:56 ` [PATCH bpf-next v4 02/10] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
@ 2023-12-04 20:56 ` Daniel Xu
  2023-12-07 11:54   ` Steffen Klassert
  2023-12-04 20:56 ` [PATCH bpf-next v4 04/10] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro Daniel Xu
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: ast, daniel, davem, Herbert Xu, steffen.klassert, pabeni, hawk,
	john.fastabend, kuba, edumazet, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87
  Cc: netdev, linux-kernel, bpf, devel

This kfunc releases a previously acquired xfrm_state from
bpf_xdp_get_xfrm_state().

Co-developed-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 net/xfrm/xfrm_bpf.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/net/xfrm/xfrm_bpf.c b/net/xfrm/xfrm_bpf.c
index 3d6cac7345ca..5ca780526607 100644
--- a/net/xfrm/xfrm_bpf.c
+++ b/net/xfrm/xfrm_bpf.c
@@ -198,10 +198,26 @@ bpf_xdp_get_xfrm_state(struct xdp_md *ctx, struct bpf_xfrm_state_opts *opts, u32
 	return x;
 }
 
+/* bpf_xdp_xfrm_state_release - Release acquired xfrm_state object
+ *
+ * This must be invoked for referenced PTR_TO_BTF_ID, and the verifier rejects
+ * the program if any references remain in the program in all of the explored
+ * states.
+ *
+ * Parameters:
+ * @x		- Pointer to referenced xfrm_state object, obtained using
+ *		  bpf_xdp_get_xfrm_state.
+ */
+__bpf_kfunc void bpf_xdp_xfrm_state_release(struct xfrm_state *x)
+{
+	xfrm_state_put(x);
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_SET8_START(xfrm_state_kfunc_set)
 BTF_ID_FLAGS(func, bpf_xdp_get_xfrm_state, KF_RET_NULL | KF_ACQUIRE)
+BTF_ID_FLAGS(func, bpf_xdp_xfrm_state_release, KF_RELEASE)
 BTF_SET8_END(xfrm_state_kfunc_set)
 
 static const struct btf_kfunc_id_set xfrm_state_xdp_kfunc_set = {
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v4 04/10] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro
  2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
                   ` (2 preceding siblings ...)
  2023-12-04 20:56 ` [PATCH bpf-next v4 03/10] bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc Daniel Xu
@ 2023-12-04 20:56 ` Daniel Xu
  2023-12-05  4:03   ` Andrii Nakryiko
  2023-12-04 20:56 ` [PATCH bpf-next v4 05/10] bpf: selftests: test_loader: Support __btf_path() annotation Daniel Xu
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: daniel, ast, nathan, andrii, ndesaulniers, steffen.klassert,
	antony.antony, alexei.starovoitov, yonghong.song, eddyz87
  Cc: martin.lau, song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	trix, bpf, linux-kernel, llvm, devel, netdev, Jonathan Lemon

=== Motivation ===

Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
writing wrapper to make the verifier happy.

Two alternatives to this approach are:

1. Use the upcoming `preserve_static_offset` [0] attribute to disable
   CO-RE on specific structs.
2. Use broader byte-sized writes to write to bitfields.

(1) is a bit hard to use. It requires specific and not-very-obvious
annotations to bpftool generated vmlinux.h. It's also not generally
available in released LLVM versions yet.

(2) makes the code quite hard to read and write. And especially if
BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
to have an inverse helper for writing.

=== Implementation details ===

Since the logic is a bit non-obvious, I thought it would be helpful
to explain exactly what's going on.

To start, it helps by explaining what LSHIFT_U64 (lshift) and RSHIFT_U64
(rshift) is designed to mean. Consider the core of the
BPF_CORE_READ_BITFIELD() algorithm:

        val <<= __CORE_RELO(s, field, LSHIFT_U64);
        val = val >> __CORE_RELO(s, field, RSHIFT_U64);

Basically what happens is we lshift to clear the non-relevant (blank)
higher order bits. Then we rshift to bring the relevant bits (bitfield)
down to LSB position (while also clearing blank lower order bits). To
illustrate:

        Start:    ........XXX......
        Lshift:   XXX......00000000
        Rshift:   00000000000000XXX

where `.` means blank bit, `0` means 0 bit, and `X` means bitfield bit.

After the two operations, the bitfield is ready to be interpreted as a
regular integer.

Next, we want to build an alternative (but more helpful) mental model
on lshift and rshift. That is, to consider:

* rshift as the total number of blank bits in the u64
* lshift as number of blank bits left of the bitfield in the u64

Take a moment to consider why that is true by consulting the above
diagram.

With this insight, we can now define the following relationship:

              bitfield
                 _
                | |
        0.....00XXX0...00
        |      |   |    |
        |______|   |    |
         lshift    |    |
                   |____|
              (rshift - lshift)

That is, we know the number of higher order blank bits is just lshift.
And the number of lower order blank bits is (rshift - lshift).

Finally, we can examine the core of the write side algorithm:

        mask = (~0ULL << rshift) >> lshift;              // 1
        val = (val & ~mask) | ((nval << rpad) & mask);   // 2

1. Compute a mask where the set bits are the bitfield bits. The first
   left shift zeros out exactly the number of blank bits, leaving a
   bitfield sized set of 1s. The subsequent right shift inserts the
   correct amount of higher order blank bits.

2. On the left of the `|`, mask out the bitfield bits. This creates
   0s where the new bitfield bits will go. On the right of the `|`,
   bring nval into the correct bit position and mask out any bits
   that fall outside of the bitfield. Finally, by bor'ing the two
   halves, we get the final set of bits to write back.

[0]: https://reviews.llvm.org/D133361
Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Jonathan Lemon <jlemon@aviatrix.com>
Signed-off-by: Jonathan Lemon <jlemon@aviatrix.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 tools/lib/bpf/bpf_core_read.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
index 1ac57bb7ac55..7325a12692a3 100644
--- a/tools/lib/bpf/bpf_core_read.h
+++ b/tools/lib/bpf/bpf_core_read.h
@@ -111,6 +111,38 @@ enum bpf_enum_value_kind {
 	val;								      \
 })
 
+/*
+ * Write to a bitfield, identified by s->field.
+ * This is the inverse of BPF_CORE_WRITE_BITFIELD().
+ */
+#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({			\
+	void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET);	\
+	unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE);	\
+	unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64);	\
+	unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64);	\
+	unsigned long long mask, val, nval = new_val;			\
+	unsigned int rpad = rshift - lshift;				\
+									\
+	asm volatile("" : "+r"(p));					\
+									\
+	switch (byte_size) {						\
+	case 1: val = *(unsigned char *)p; break;			\
+	case 2: val = *(unsigned short *)p; break;			\
+	case 4: val = *(unsigned int *)p; break;			\
+	case 8: val = *(unsigned long long *)p; break;			\
+	}								\
+									\
+	mask = (~0ULL << rshift) >> lshift;				\
+	val = (val & ~mask) | ((nval << rpad) & mask);			\
+									\
+	switch (byte_size) {						\
+	case 1: *(unsigned char *)p      = val; break;			\
+	case 2: *(unsigned short *)p     = val; break;			\
+	case 4: *(unsigned int *)p       = val; break;			\
+	case 8: *(unsigned long long *)p = val; break;			\
+	}								\
+})
+
 #define ___bpf_field_ref1(field)	(field)
 #define ___bpf_field_ref2(type, field)	(((typeof(type) *)0)->field)
 #define ___bpf_field_ref(args...)					    \
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v4 05/10] bpf: selftests: test_loader: Support __btf_path() annotation
  2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
                   ` (3 preceding siblings ...)
  2023-12-04 20:56 ` [PATCH bpf-next v4 04/10] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro Daniel Xu
@ 2023-12-04 20:56 ` Daniel Xu
  2023-12-04 20:56 ` [PATCH bpf-next v4 06/10] libbpf: selftests: Add verifier tests for CO-RE bitfield writes Daniel Xu
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: daniel, shuah, andrii, ast, steffen.klassert, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87
  Cc: mykolal, martin.lau, song, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, bpf, linux-kselftest, linux-kernel, devel, netdev

This commit adds support for per-prog btf_custom_path. This is necessary
for testing CO-RE relocations on non-vmlinux types using test_loader
infrastructure.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 tools/testing/selftests/bpf/progs/bpf_misc.h | 1 +
 tools/testing/selftests/bpf/test_loader.c    | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/bpf_misc.h b/tools/testing/selftests/bpf/progs/bpf_misc.h
index 799fff4995d8..2fd59970c43a 100644
--- a/tools/testing/selftests/bpf/progs/bpf_misc.h
+++ b/tools/testing/selftests/bpf/progs/bpf_misc.h
@@ -71,6 +71,7 @@
 #define __retval_unpriv(val)	__attribute__((btf_decl_tag("comment:test_retval_unpriv="#val)))
 #define __auxiliary		__attribute__((btf_decl_tag("comment:test_auxiliary")))
 #define __auxiliary_unpriv	__attribute__((btf_decl_tag("comment:test_auxiliary_unpriv")))
+#define __btf_path(path)	__attribute__((btf_decl_tag("comment:test_btf_path=" path)))
 
 /* Convenience macro for use with 'asm volatile' blocks */
 #define __naked __attribute__((naked))
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index a350ecdfba4a..74ceb7877ae2 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -27,6 +27,7 @@
 #define TEST_TAG_RETVAL_PFX_UNPRIV "comment:test_retval_unpriv="
 #define TEST_TAG_AUXILIARY "comment:test_auxiliary"
 #define TEST_TAG_AUXILIARY_UNPRIV "comment:test_auxiliary_unpriv"
+#define TEST_BTF_PATH "comment:test_btf_path="
 
 /* Warning: duplicated in bpf_misc.h */
 #define POINTER_VALUE	0xcafe4all
@@ -58,6 +59,7 @@ struct test_spec {
 	const char *prog_name;
 	struct test_subspec priv;
 	struct test_subspec unpriv;
+	const char *btf_custom_path;
 	int log_level;
 	int prog_flags;
 	int mode_mask;
@@ -288,6 +290,8 @@ static int parse_test_spec(struct test_loader *tester,
 					goto cleanup;
 				update_flags(&spec->prog_flags, flags, clear);
 			}
+		} else if (str_has_pfx(s, TEST_BTF_PATH)) {
+			spec->btf_custom_path = s + sizeof(TEST_BTF_PATH) - 1;
 		}
 	}
 
@@ -578,6 +582,9 @@ void run_subtest(struct test_loader *tester,
 		}
 	}
 
+	/* Implicitly reset to NULL if next test case doesn't specify */
+	open_opts->btf_custom_path = spec->btf_custom_path;
+
 	tobj = bpf_object__open_mem(obj_bytes, obj_byte_cnt, open_opts);
 	if (!ASSERT_OK_PTR(tobj, "obj_open_mem")) /* shouldn't happen */
 		goto subtest_cleanup;
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v4 06/10] libbpf: selftests: Add verifier tests for CO-RE bitfield writes
  2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
                   ` (4 preceding siblings ...)
  2023-12-04 20:56 ` [PATCH bpf-next v4 05/10] bpf: selftests: test_loader: Support __btf_path() annotation Daniel Xu
@ 2023-12-04 20:56 ` Daniel Xu
  2023-12-05  4:05   ` Andrii Nakryiko
  2023-12-04 20:56 ` [PATCH bpf-next v4 07/10] bpf: selftests: test_tunnel: Setup fresh topology for each subtest Daniel Xu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: daniel, shuah, andrii, ast, steffen.klassert, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87
  Cc: mykolal, martin.lau, song, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, linux-kernel, bpf, linux-kselftest, devel, netdev

Add some tests that exercise BPF_CORE_WRITE_BITFIELD() macro. Since some
non-trivial bit fiddling is going on, make sure various edge cases (such
as adjacent bitfields and bitfields at the edge of structs) are
exercised.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 .../selftests/bpf/prog_tests/verifier.c       |   2 +
 .../bpf/progs/verifier_bitfield_write.c       | 100 ++++++++++++++++++
 2 files changed, 102 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_bitfield_write.c

diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c
index 8d746642cbd7..ac49ec25211d 100644
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c
+++ b/tools/testing/selftests/bpf/prog_tests/verifier.c
@@ -6,6 +6,7 @@
 #include "verifier_and.skel.h"
 #include "verifier_array_access.skel.h"
 #include "verifier_basic_stack.skel.h"
+#include "verifier_bitfield_write.skel.h"
 #include "verifier_bounds.skel.h"
 #include "verifier_bounds_deduction.skel.h"
 #include "verifier_bounds_deduction_non_const.skel.h"
@@ -116,6 +117,7 @@ static void run_tests_aux(const char *skel_name,
 
 void test_verifier_and(void)                  { RUN(verifier_and); }
 void test_verifier_basic_stack(void)          { RUN(verifier_basic_stack); }
+void test_verifier_bitfield_write(void)       { RUN(verifier_bitfield_write); }
 void test_verifier_bounds(void)               { RUN(verifier_bounds); }
 void test_verifier_bounds_deduction(void)     { RUN(verifier_bounds_deduction); }
 void test_verifier_bounds_deduction_non_const(void)     { RUN(verifier_bounds_deduction_non_const); }
diff --git a/tools/testing/selftests/bpf/progs/verifier_bitfield_write.c b/tools/testing/selftests/bpf/progs/verifier_bitfield_write.c
new file mode 100644
index 000000000000..623f130a3198
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/verifier_bitfield_write.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <stdint.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+#include "bpf_misc.h"
+
+struct core_reloc_bitfields {
+	/* unsigned bitfields */
+	uint8_t		ub1: 1;
+	uint8_t		ub2: 2;
+	uint32_t	ub7: 7;
+	/* signed bitfields */
+	int8_t		sb4: 4;
+	int32_t		sb20: 20;
+	/* non-bitfields */
+	uint32_t	u32;
+	int32_t		s32;
+} __attribute__((preserve_access_index));
+
+SEC("tc")
+__description("single CO-RE bitfield roundtrip")
+__btf_path("btf__core_reloc_bitfields.bpf.o")
+__success
+__retval(3)
+int single_field_roundtrip(struct __sk_buff *ctx)
+{
+	struct core_reloc_bitfields bitfields;
+
+	__builtin_memset(&bitfields, 0, sizeof(bitfields));
+	BPF_CORE_WRITE_BITFIELD(&bitfields, ub2, 3);
+	return BPF_CORE_READ_BITFIELD(&bitfields, ub2);
+}
+
+SEC("tc")
+__description("multiple CO-RE bitfield roundtrip")
+__btf_path("btf__core_reloc_bitfields.bpf.o")
+__success
+__retval(0x3FD)
+int multiple_field_roundtrip(struct __sk_buff *ctx)
+{
+	struct core_reloc_bitfields bitfields;
+	uint8_t ub2;
+	int8_t sb4;
+
+	__builtin_memset(&bitfields, 0, sizeof(bitfields));
+	BPF_CORE_WRITE_BITFIELD(&bitfields, ub2, 1);
+	BPF_CORE_WRITE_BITFIELD(&bitfields, sb4, -1);
+
+	ub2 = BPF_CORE_READ_BITFIELD(&bitfields, ub2);
+	sb4 = BPF_CORE_READ_BITFIELD(&bitfields, sb4);
+
+	return (((uint8_t)sb4) << 2) | ub2;
+}
+
+SEC("tc")
+__description("adjacent CO-RE bitfield roundtrip")
+__btf_path("btf__core_reloc_bitfields.bpf.o")
+__success
+__retval(7)
+int adjacent_field_roundtrip(struct __sk_buff *ctx)
+{
+	struct core_reloc_bitfields bitfields;
+	uint8_t ub1, ub2;
+
+	__builtin_memset(&bitfields, 0, sizeof(bitfields));
+	BPF_CORE_WRITE_BITFIELD(&bitfields, ub1, 1);
+	BPF_CORE_WRITE_BITFIELD(&bitfields, ub2, 3);
+
+	ub1 = BPF_CORE_READ_BITFIELD(&bitfields, ub1);
+	ub2 = BPF_CORE_READ_BITFIELD(&bitfields, ub2);
+
+	return (ub2 << 1) | ub1;
+}
+
+SEC("tc")
+__description("multibyte CO-RE bitfield roundtrip")
+__btf_path("btf__core_reloc_bitfields.bpf.o")
+__success
+__retval(0x21)
+int multibyte_field_roundtrip(struct __sk_buff *ctx)
+{
+	struct core_reloc_bitfields bitfields;
+	uint32_t ub7;
+	uint8_t ub1;
+
+	__builtin_memset(&bitfields, 0, sizeof(bitfields));
+	BPF_CORE_WRITE_BITFIELD(&bitfields, ub1, 1);
+	BPF_CORE_WRITE_BITFIELD(&bitfields, ub7, 16);
+
+	ub1 = BPF_CORE_READ_BITFIELD(&bitfields, ub1);
+	ub7 = BPF_CORE_READ_BITFIELD(&bitfields, ub7);
+
+	return (ub7 << 1) | ub1;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v4 07/10] bpf: selftests: test_tunnel: Setup fresh topology for each subtest
  2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
                   ` (5 preceding siblings ...)
  2023-12-04 20:56 ` [PATCH bpf-next v4 06/10] libbpf: selftests: Add verifier tests for CO-RE bitfield writes Daniel Xu
@ 2023-12-04 20:56 ` Daniel Xu
  2023-12-04 20:56 ` [PATCH bpf-next v4 08/10] bpf: selftests: test_tunnel: Use vmlinux.h declarations Daniel Xu
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: daniel, shuah, andrii, ast, steffen.klassert, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87
  Cc: martin.lau, song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	mykolal, bpf, linux-kselftest, linux-kernel, devel, netdev

This helps with determinism b/c individual setup/teardown prevents
leaking state between different subtests.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 tools/testing/selftests/bpf/prog_tests/test_tunnel.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
index d149ab98798d..b57d48219d0b 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
@@ -535,23 +535,20 @@ static void test_ipip_tunnel(enum ipip_encap encap)
 #define RUN_TEST(name, ...)						\
 	({								\
 		if (test__start_subtest(#name)) {			\
+			config_device();				\
 			test_ ## name(__VA_ARGS__);			\
+			cleanup();					\
 		}							\
 	})
 
 static void *test_tunnel_run_tests(void *arg)
 {
-	cleanup();
-	config_device();
-
 	RUN_TEST(vxlan_tunnel);
 	RUN_TEST(ip6vxlan_tunnel);
 	RUN_TEST(ipip_tunnel, NONE);
 	RUN_TEST(ipip_tunnel, FOU);
 	RUN_TEST(ipip_tunnel, GUE);
 
-	cleanup();
-
 	return NULL;
 }
 
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v4 08/10] bpf: selftests: test_tunnel: Use vmlinux.h declarations
  2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
                   ` (6 preceding siblings ...)
  2023-12-04 20:56 ` [PATCH bpf-next v4 07/10] bpf: selftests: test_tunnel: Setup fresh topology for each subtest Daniel Xu
@ 2023-12-04 20:56 ` Daniel Xu
  2023-12-04 20:56 ` [PATCH bpf-next v4 09/10] bpf: selftests: Move xfrm tunnel test to test_progs Daniel Xu
  2023-12-04 20:56 ` [PATCH bpf-next v4 10/10] bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state() Daniel Xu
  9 siblings, 0 replies; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: daniel, shuah, andrii, ast, steffen.klassert, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87
  Cc: martin.lau, song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	mykolal, bpf, linux-kselftest, linux-kernel, devel, netdev

vmlinux.h declarations are more ergnomic, especially when working with
kfuncs. The uapi headers are often incomplete for kfunc definitions.

This commit also switches bitfield accesses to use CO-RE helpers.
Switching to vmlinux.h definitions makes the verifier very
unhappy with raw bitfield accesses. The error is:

    ; md.u.md2.dir = direction;
    33: (69) r1 = *(u16 *)(r2 +11)
    misaligned stack access off (0x0; 0x0)+-64+11 size 2

Fix by using CO-RE-aware bitfield reads and writes.

Co-developed-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 .../selftests/bpf/progs/bpf_tracing_net.h     |  1 +
 .../selftests/bpf/progs/test_tunnel_kern.c    | 76 +++++--------------
 2 files changed, 22 insertions(+), 55 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 0b793a102791..1bdc680b0e0e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -26,6 +26,7 @@
 #define IPV6_AUTOFLOWLABEL	70
 
 #define TC_ACT_UNSPEC		(-1)
+#define TC_ACT_OK		0
 #define TC_ACT_SHOT		2
 
 #define SOL_TCP			6
diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index f66af753bbbb..b320fb7bb080 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -6,62 +6,26 @@
  * modify it under the terms of version 2 of the GNU General Public
  * License as published by the Free Software Foundation.
  */
-#include <stddef.h>
-#include <string.h>
-#include <arpa/inet.h>
-#include <linux/bpf.h>
-#include <linux/if_ether.h>
-#include <linux/if_packet.h>
-#include <linux/if_tunnel.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/icmp.h>
-#include <linux/types.h>
-#include <linux/socket.h>
-#include <linux/pkt_cls.h>
-#include <linux/erspan.h>
-#include <linux/udp.h>
+#include "vmlinux.h"
+#include <bpf/bpf_core_read.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
+#include "bpf_kfuncs.h"
+#include "bpf_tracing_net.h"
 
 #define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)
 
-#define VXLAN_UDP_PORT 4789
+#define VXLAN_UDP_PORT		4789
+#define ETH_P_IP		0x0800
+#define PACKET_HOST		0
+#define TUNNEL_CSUM		bpf_htons(0x01)
+#define TUNNEL_KEY		bpf_htons(0x04)
 
 /* Only IPv4 address assigned to veth1.
  * 172.16.1.200
  */
 #define ASSIGNED_ADDR_VETH1 0xac1001c8
 
-struct geneve_opt {
-	__be16	opt_class;
-	__u8	type;
-	__u8	length:5;
-	__u8	r3:1;
-	__u8	r2:1;
-	__u8	r1:1;
-	__u8	opt_data[8]; /* hard-coded to 8 byte */
-};
-
-struct vxlanhdr {
-	__be32 vx_flags;
-	__be32 vx_vni;
-} __attribute__((packed));
-
-struct vxlan_metadata {
-	__u32     gbp;
-};
-
-struct bpf_fou_encap {
-	__be16 sport;
-	__be16 dport;
-};
-
-enum bpf_fou_encap_type {
-	FOU_BPF_ENCAP_FOU,
-	FOU_BPF_ENCAP_GUE,
-};
-
 int bpf_skb_set_fou_encap(struct __sk_buff *skb_ctx,
 			  struct bpf_fou_encap *encap, int type) __ksym;
 int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx,
@@ -205,9 +169,9 @@ int erspan_set_tunnel(struct __sk_buff *skb)
 	__u8 hwid = 7;
 
 	md.version = 2;
-	md.u.md2.dir = direction;
-	md.u.md2.hwid = hwid & 0xf;
-	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, dir, direction);
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid, (hwid & 0xf));
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid_upper, (hwid >> 4) & 0x3);
 #endif
 
 	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
@@ -246,8 +210,9 @@ int erspan_get_tunnel(struct __sk_buff *skb)
 	bpf_printk("\tindex %x\n", index);
 #else
 	bpf_printk("\tdirection %d hwid %x timestamp %u\n",
-		   md.u.md2.dir,
-		   (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
+		   BPF_CORE_READ_BITFIELD(&md.u.md2, dir),
+		   (BPF_CORE_READ_BITFIELD(&md.u.md2, hwid_upper) << 4) +
+		   BPF_CORE_READ_BITFIELD(&md.u.md2, hwid),
 		   bpf_ntohl(md.u.md2.timestamp));
 #endif
 
@@ -284,9 +249,9 @@ int ip4ip6erspan_set_tunnel(struct __sk_buff *skb)
 	__u8 hwid = 17;
 
 	md.version = 2;
-	md.u.md2.dir = direction;
-	md.u.md2.hwid = hwid & 0xf;
-	md.u.md2.hwid_upper = (hwid >> 4) & 0x3;
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, dir, direction);
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid, (hwid & 0xf));
+	BPF_CORE_WRITE_BITFIELD(&md.u.md2, hwid_upper, (hwid >> 4) & 0x3);
 #endif
 
 	ret = bpf_skb_set_tunnel_opt(skb, &md, sizeof(md));
@@ -326,8 +291,9 @@ int ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
 	bpf_printk("\tindex %x\n", index);
 #else
 	bpf_printk("\tdirection %d hwid %x timestamp %u\n",
-		   md.u.md2.dir,
-		   (md.u.md2.hwid_upper << 4) + md.u.md2.hwid,
+		   BPF_CORE_READ_BITFIELD(&md.u.md2, dir),
+		   (BPF_CORE_READ_BITFIELD(&md.u.md2, hwid_upper) << 4) +
+		   BPF_CORE_READ_BITFIELD(&md.u.md2, hwid),
 		   bpf_ntohl(md.u.md2.timestamp));
 #endif
 
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v4 09/10] bpf: selftests: Move xfrm tunnel test to test_progs
  2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
                   ` (7 preceding siblings ...)
  2023-12-04 20:56 ` [PATCH bpf-next v4 08/10] bpf: selftests: test_tunnel: Use vmlinux.h declarations Daniel Xu
@ 2023-12-04 20:56 ` Daniel Xu
  2023-12-04 20:56 ` [PATCH bpf-next v4 10/10] bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state() Daniel Xu
  9 siblings, 0 replies; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: daniel, shuah, andrii, ast, steffen.klassert, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87
  Cc: mykolal, martin.lau, song, john.fastabend, kpsingh, sdf, haoluo,
	jolsa, bpf, linux-kselftest, linux-kernel, devel, netdev

test_progs is better than a shell script b/c C is a bit easier to
maintain than shell. Also it's easier to use new infra like memory
mapped global variables from C via bpf skeleton.

Co-developed-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 .../selftests/bpf/prog_tests/test_tunnel.c    | 143 ++++++++++++++++++
 .../selftests/bpf/progs/test_tunnel_kern.c    |  11 +-
 tools/testing/selftests/bpf/test_tunnel.sh    |  92 -----------
 3 files changed, 151 insertions(+), 95 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
index b57d48219d0b..2d7f8fa82ebd 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
@@ -50,6 +50,7 @@
  */
 
 #include <arpa/inet.h>
+#include <linux/if_link.h>
 #include <linux/if_tun.h>
 #include <linux/limits.h>
 #include <linux/sysctl.h>
@@ -92,6 +93,11 @@
 #define IPIP_TUNL_DEV0 "ipip00"
 #define IPIP_TUNL_DEV1 "ipip11"
 
+#define XFRM_AUTH "0x1111111111111111111111111111111111111111"
+#define XFRM_ENC "0x22222222222222222222222222222222"
+#define XFRM_SPI_IN_TO_OUT 0x1
+#define XFRM_SPI_OUT_TO_IN 0x2
+
 #define PING_ARGS "-i 0.01 -c 3 -w 10 -q"
 
 static int config_device(void)
@@ -264,6 +270,92 @@ static void delete_ipip_tunnel(void)
 	SYS_NOFAIL("ip fou del port 5555 2> /dev/null");
 }
 
+static int add_xfrm_tunnel(void)
+{
+	/* at_ns0 namespace
+	 * at_ns0 -> root
+	 */
+	SYS(fail,
+	    "ip netns exec at_ns0 "
+		"ip xfrm state add src %s dst %s proto esp "
+			"spi %d reqid 1 mode tunnel "
+			"auth-trunc 'hmac(sha1)' %s 96 enc 'cbc(aes)' %s",
+	    IP4_ADDR_VETH0, IP4_ADDR1_VETH1, XFRM_SPI_IN_TO_OUT, XFRM_AUTH, XFRM_ENC);
+	SYS(fail,
+	    "ip netns exec at_ns0 "
+		"ip xfrm policy add src %s/32 dst %s/32 dir out "
+			"tmpl src %s dst %s proto esp reqid 1 "
+			"mode tunnel",
+	    IP4_ADDR_TUNL_DEV0, IP4_ADDR_TUNL_DEV1, IP4_ADDR_VETH0, IP4_ADDR1_VETH1);
+
+	/* root -> at_ns0 */
+	SYS(fail,
+	    "ip netns exec at_ns0 "
+		"ip xfrm state add src %s dst %s proto esp "
+			"spi %d reqid 2 mode tunnel "
+			"auth-trunc 'hmac(sha1)' %s 96 enc 'cbc(aes)' %s",
+	    IP4_ADDR1_VETH1, IP4_ADDR_VETH0, XFRM_SPI_OUT_TO_IN, XFRM_AUTH, XFRM_ENC);
+	SYS(fail,
+	    "ip netns exec at_ns0 "
+		"ip xfrm policy add src %s/32 dst %s/32 dir in "
+			"tmpl src %s dst %s proto esp reqid 2 "
+			"mode tunnel",
+	    IP4_ADDR_TUNL_DEV1, IP4_ADDR_TUNL_DEV0, IP4_ADDR1_VETH1, IP4_ADDR_VETH0);
+
+	/* address & route */
+	SYS(fail, "ip netns exec at_ns0 ip addr add dev veth0 %s/32",
+	    IP4_ADDR_TUNL_DEV0);
+	SYS(fail, "ip netns exec at_ns0 ip route add %s dev veth0 via %s src %s",
+	    IP4_ADDR_TUNL_DEV1, IP4_ADDR1_VETH1, IP4_ADDR_TUNL_DEV0);
+
+	/* root namespace
+	 * at_ns0 -> root
+	 */
+	SYS(fail,
+	    "ip xfrm state add src %s dst %s proto esp "
+		    "spi %d reqid 1 mode tunnel "
+		    "auth-trunc 'hmac(sha1)' %s 96  enc 'cbc(aes)' %s",
+	    IP4_ADDR_VETH0, IP4_ADDR1_VETH1, XFRM_SPI_IN_TO_OUT, XFRM_AUTH, XFRM_ENC);
+	SYS(fail,
+	    "ip xfrm policy add src %s/32 dst %s/32 dir in "
+		    "tmpl src %s dst %s proto esp reqid 1 "
+		    "mode tunnel",
+	    IP4_ADDR_TUNL_DEV0, IP4_ADDR_TUNL_DEV1, IP4_ADDR_VETH0, IP4_ADDR1_VETH1);
+
+	/* root -> at_ns0 */
+	SYS(fail,
+	    "ip xfrm state add src %s dst %s proto esp "
+		    "spi %d reqid 2 mode tunnel "
+		    "auth-trunc 'hmac(sha1)' %s 96  enc 'cbc(aes)' %s",
+	    IP4_ADDR1_VETH1, IP4_ADDR_VETH0, XFRM_SPI_OUT_TO_IN, XFRM_AUTH, XFRM_ENC);
+	SYS(fail,
+	    "ip xfrm policy add src %s/32 dst %s/32 dir out "
+		    "tmpl src %s dst %s proto esp reqid 2 "
+		    "mode tunnel",
+	    IP4_ADDR_TUNL_DEV1, IP4_ADDR_TUNL_DEV0, IP4_ADDR1_VETH1, IP4_ADDR_VETH0);
+
+	/* address & route */
+	SYS(fail, "ip addr add dev veth1 %s/32", IP4_ADDR_TUNL_DEV1);
+	SYS(fail, "ip route add %s dev veth1 via %s src %s",
+	    IP4_ADDR_TUNL_DEV0, IP4_ADDR_VETH0, IP4_ADDR_TUNL_DEV1);
+
+	return 0;
+fail:
+	return -1;
+}
+
+static void delete_xfrm_tunnel(void)
+{
+	SYS_NOFAIL("ip xfrm policy delete dir out src %s/32 dst %s/32 2> /dev/null",
+		   IP4_ADDR_TUNL_DEV1, IP4_ADDR_TUNL_DEV0);
+	SYS_NOFAIL("ip xfrm policy delete dir in src %s/32 dst %s/32 2> /dev/null",
+		   IP4_ADDR_TUNL_DEV0, IP4_ADDR_TUNL_DEV1);
+	SYS_NOFAIL("ip xfrm state delete src %s dst %s proto esp spi %d 2> /dev/null",
+		   IP4_ADDR_VETH0, IP4_ADDR1_VETH1, XFRM_SPI_IN_TO_OUT);
+	SYS_NOFAIL("ip xfrm state delete src %s dst %s proto esp spi %d 2> /dev/null",
+		   IP4_ADDR1_VETH1, IP4_ADDR_VETH0, XFRM_SPI_OUT_TO_IN);
+}
+
 static int test_ping(int family, const char *addr)
 {
 	SYS(fail, "%s %s %s > /dev/null", ping_command(family), PING_ARGS, addr);
@@ -532,6 +624,56 @@ static void test_ipip_tunnel(enum ipip_encap encap)
 		test_tunnel_kern__destroy(skel);
 }
 
+static void test_xfrm_tunnel(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
+			    .attach_point = BPF_TC_INGRESS);
+	struct test_tunnel_kern *skel = NULL;
+	struct nstoken *nstoken;
+	int tc_prog_fd;
+	int ifindex;
+	int err;
+
+	err = add_xfrm_tunnel();
+	if (!ASSERT_OK(err, "add_xfrm_tunnel"))
+		return;
+
+	skel = test_tunnel_kern__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_tunnel_kern__open_and_load"))
+		goto done;
+
+	ifindex = if_nametoindex("veth1");
+	if (!ASSERT_NEQ(ifindex, 0, "veth1 ifindex"))
+		goto done;
+
+	/* attach tc prog to tunnel dev */
+	tc_hook.ifindex = ifindex;
+	tc_prog_fd = bpf_program__fd(skel->progs.xfrm_get_state);
+	if (!ASSERT_GE(tc_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (attach_tc_prog(&tc_hook, tc_prog_fd, -1))
+		goto done;
+
+	/* ping from at_ns0 namespace test */
+	nstoken = open_netns("at_ns0");
+	err = test_ping(AF_INET, IP4_ADDR_TUNL_DEV1);
+	close_netns(nstoken);
+	if (!ASSERT_OK(err, "test_ping"))
+		goto done;
+
+	if (!ASSERT_EQ(skel->bss->xfrm_reqid, 1, "req_id"))
+		goto done;
+	if (!ASSERT_EQ(skel->bss->xfrm_spi, XFRM_SPI_IN_TO_OUT, "spi"))
+		goto done;
+	if (!ASSERT_EQ(skel->bss->xfrm_remote_ip, 0xac100164, "remote_ip"))
+		goto done;
+
+done:
+	delete_xfrm_tunnel();
+	if (skel)
+		test_tunnel_kern__destroy(skel);
+}
+
 #define RUN_TEST(name, ...)						\
 	({								\
 		if (test__start_subtest(#name)) {			\
@@ -548,6 +690,7 @@ static void *test_tunnel_run_tests(void *arg)
 	RUN_TEST(ipip_tunnel, NONE);
 	RUN_TEST(ipip_tunnel, FOU);
 	RUN_TEST(ipip_tunnel, GUE);
+	RUN_TEST(xfrm_tunnel);
 
 	return NULL;
 }
diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index b320fb7bb080..3a59eb9c34de 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -929,6 +929,10 @@ int ip6ip6_get_tunnel(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }
 
+volatile int xfrm_reqid = 0;
+volatile int xfrm_spi = 0;
+volatile int xfrm_remote_ip = 0;
+
 SEC("tc")
 int xfrm_get_state(struct __sk_buff *skb)
 {
@@ -939,9 +943,10 @@ int xfrm_get_state(struct __sk_buff *skb)
 	if (ret < 0)
 		return TC_ACT_OK;
 
-	bpf_printk("reqid %d spi 0x%x remote ip 0x%x\n",
-		   x.reqid, bpf_ntohl(x.spi),
-		   bpf_ntohl(x.remote_ipv4));
+	xfrm_reqid = x.reqid;
+	xfrm_spi = bpf_ntohl(x.spi);
+	xfrm_remote_ip = bpf_ntohl(x.remote_ipv4);
+
 	return TC_ACT_OK;
 }
 
diff --git a/tools/testing/selftests/bpf/test_tunnel.sh b/tools/testing/selftests/bpf/test_tunnel.sh
index 2dec7dbf29a2..d9661b9988ba 100755
--- a/tools/testing/selftests/bpf/test_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tunnel.sh
@@ -517,90 +517,6 @@ test_ip6ip6()
         echo -e ${GREEN}"PASS: ip6$TYPE"${NC}
 }
 
-setup_xfrm_tunnel()
-{
-	auth=0x$(printf '1%.0s' {1..40})
-	enc=0x$(printf '2%.0s' {1..32})
-	spi_in_to_out=0x1
-	spi_out_to_in=0x2
-	# at_ns0 namespace
-	# at_ns0 -> root
-	ip netns exec at_ns0 \
-		ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
-			spi $spi_in_to_out reqid 1 mode tunnel \
-			auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
-	ip netns exec at_ns0 \
-		ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir out \
-		tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
-		mode tunnel
-	# root -> at_ns0
-	ip netns exec at_ns0 \
-		ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
-			spi $spi_out_to_in reqid 2 mode tunnel \
-			auth-trunc 'hmac(sha1)' $auth 96 enc 'cbc(aes)' $enc
-	ip netns exec at_ns0 \
-		ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir in \
-		tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
-		mode tunnel
-	# address & route
-	ip netns exec at_ns0 \
-		ip addr add dev veth0 10.1.1.100/32
-	ip netns exec at_ns0 \
-		ip route add 10.1.1.200 dev veth0 via 172.16.1.200 \
-			src 10.1.1.100
-
-	# root namespace
-	# at_ns0 -> root
-	ip xfrm state add src 172.16.1.100 dst 172.16.1.200 proto esp \
-		spi $spi_in_to_out reqid 1 mode tunnel \
-		auth-trunc 'hmac(sha1)' $auth 96  enc 'cbc(aes)' $enc
-	ip xfrm policy add src 10.1.1.100/32 dst 10.1.1.200/32 dir in \
-		tmpl src 172.16.1.100 dst 172.16.1.200 proto esp reqid 1 \
-		mode tunnel
-	# root -> at_ns0
-	ip xfrm state add src 172.16.1.200 dst 172.16.1.100 proto esp \
-		spi $spi_out_to_in reqid 2 mode tunnel \
-		auth-trunc 'hmac(sha1)' $auth 96  enc 'cbc(aes)' $enc
-	ip xfrm policy add src 10.1.1.200/32 dst 10.1.1.100/32 dir out \
-		tmpl src 172.16.1.200 dst 172.16.1.100 proto esp reqid 2 \
-		mode tunnel
-	# address & route
-	ip addr add dev veth1 10.1.1.200/32
-	ip route add 10.1.1.100 dev veth1 via 172.16.1.100 src 10.1.1.200
-}
-
-test_xfrm_tunnel()
-{
-	if [[ -e /sys/kernel/tracing/trace ]]; then
-		TRACE=/sys/kernel/tracing/trace
-	else
-		TRACE=/sys/kernel/debug/tracing/trace
-	fi
-	config_device
-	> ${TRACE}
-	setup_xfrm_tunnel
-	mkdir -p ${BPF_PIN_TUNNEL_DIR}
-	bpftool prog loadall ${BPF_FILE} ${BPF_PIN_TUNNEL_DIR}
-	tc qdisc add dev veth1 clsact
-	tc filter add dev veth1 proto ip ingress bpf da object-pinned \
-		${BPF_PIN_TUNNEL_DIR}/xfrm_get_state
-	ip netns exec at_ns0 ping $PING_ARG 10.1.1.200
-	sleep 1
-	grep "reqid 1" ${TRACE}
-	check_err $?
-	grep "spi 0x1" ${TRACE}
-	check_err $?
-	grep "remote ip 0xac100164" ${TRACE}
-	check_err $?
-	cleanup
-
-	if [ $ret -ne 0 ]; then
-		echo -e ${RED}"FAIL: xfrm tunnel"${NC}
-		return 1
-	fi
-	echo -e ${GREEN}"PASS: xfrm tunnel"${NC}
-}
-
 attach_bpf()
 {
 	DEV=$1
@@ -630,10 +546,6 @@ cleanup()
 	ip link del ip6geneve11 2> /dev/null
 	ip link del erspan11 2> /dev/null
 	ip link del ip6erspan11 2> /dev/null
-	ip xfrm policy delete dir out src 10.1.1.200/32 dst 10.1.1.100/32 2> /dev/null
-	ip xfrm policy delete dir in src 10.1.1.100/32 dst 10.1.1.200/32 2> /dev/null
-	ip xfrm state delete src 172.16.1.100 dst 172.16.1.200 proto esp spi 0x1 2> /dev/null
-	ip xfrm state delete src 172.16.1.200 dst 172.16.1.100 proto esp spi 0x2 2> /dev/null
 }
 
 cleanup_exit()
@@ -716,10 +628,6 @@ bpf_tunnel_test()
 	test_ip6ip6
 	errors=$(( $errors + $? ))
 
-	echo "Testing IPSec tunnel..."
-	test_xfrm_tunnel
-	errors=$(( $errors + $? ))
-
 	return $errors
 }
 
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next v4 10/10] bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()
  2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
                   ` (8 preceding siblings ...)
  2023-12-04 20:56 ` [PATCH bpf-next v4 09/10] bpf: selftests: Move xfrm tunnel test to test_progs Daniel Xu
@ 2023-12-04 20:56 ` Daniel Xu
  9 siblings, 0 replies; 22+ messages in thread
From: Daniel Xu @ 2023-12-04 20:56 UTC (permalink / raw)
  To: daniel, ast, davem, hawk, john.fastabend, andrii, kuba, shuah,
	steffen.klassert, antony.antony, alexei.starovoitov,
	yonghong.song, eddyz87
  Cc: mykolal, martin.lau, song, kpsingh, sdf, haoluo, jolsa, bpf,
	linux-kselftest, linux-kernel, netdev, devel

This commit extends test_tunnel selftest to test the new XDP xfrm state
lookup kfunc.

Co-developed-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Antony Antony <antony.antony@secunet.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 .../selftests/bpf/prog_tests/test_tunnel.c    | 20 ++++++--
 .../selftests/bpf/progs/test_tunnel_kern.c    | 51 +++++++++++++++++++
 2 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
index 2d7f8fa82ebd..fc804095d578 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
@@ -278,7 +278,7 @@ static int add_xfrm_tunnel(void)
 	SYS(fail,
 	    "ip netns exec at_ns0 "
 		"ip xfrm state add src %s dst %s proto esp "
-			"spi %d reqid 1 mode tunnel "
+			"spi %d reqid 1 mode tunnel replay-window 42 "
 			"auth-trunc 'hmac(sha1)' %s 96 enc 'cbc(aes)' %s",
 	    IP4_ADDR_VETH0, IP4_ADDR1_VETH1, XFRM_SPI_IN_TO_OUT, XFRM_AUTH, XFRM_ENC);
 	SYS(fail,
@@ -292,7 +292,7 @@ static int add_xfrm_tunnel(void)
 	SYS(fail,
 	    "ip netns exec at_ns0 "
 		"ip xfrm state add src %s dst %s proto esp "
-			"spi %d reqid 2 mode tunnel "
+			"spi %d reqid 2 mode tunnel replay-window 42 "
 			"auth-trunc 'hmac(sha1)' %s 96 enc 'cbc(aes)' %s",
 	    IP4_ADDR1_VETH1, IP4_ADDR_VETH0, XFRM_SPI_OUT_TO_IN, XFRM_AUTH, XFRM_ENC);
 	SYS(fail,
@@ -313,7 +313,7 @@ static int add_xfrm_tunnel(void)
 	 */
 	SYS(fail,
 	    "ip xfrm state add src %s dst %s proto esp "
-		    "spi %d reqid 1 mode tunnel "
+		    "spi %d reqid 1 mode tunnel replay-window 42 "
 		    "auth-trunc 'hmac(sha1)' %s 96  enc 'cbc(aes)' %s",
 	    IP4_ADDR_VETH0, IP4_ADDR1_VETH1, XFRM_SPI_IN_TO_OUT, XFRM_AUTH, XFRM_ENC);
 	SYS(fail,
@@ -325,7 +325,7 @@ static int add_xfrm_tunnel(void)
 	/* root -> at_ns0 */
 	SYS(fail,
 	    "ip xfrm state add src %s dst %s proto esp "
-		    "spi %d reqid 2 mode tunnel "
+		    "spi %d reqid 2 mode tunnel replay-window 42 "
 		    "auth-trunc 'hmac(sha1)' %s 96  enc 'cbc(aes)' %s",
 	    IP4_ADDR1_VETH1, IP4_ADDR_VETH0, XFRM_SPI_OUT_TO_IN, XFRM_AUTH, XFRM_ENC);
 	SYS(fail,
@@ -628,8 +628,10 @@ static void test_xfrm_tunnel(void)
 {
 	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
 			    .attach_point = BPF_TC_INGRESS);
+	LIBBPF_OPTS(bpf_xdp_attach_opts, opts);
 	struct test_tunnel_kern *skel = NULL;
 	struct nstoken *nstoken;
+	int xdp_prog_fd;
 	int tc_prog_fd;
 	int ifindex;
 	int err;
@@ -654,6 +656,14 @@ static void test_xfrm_tunnel(void)
 	if (attach_tc_prog(&tc_hook, tc_prog_fd, -1))
 		goto done;
 
+	/* attach xdp prog to tunnel dev */
+	xdp_prog_fd = bpf_program__fd(skel->progs.xfrm_get_state_xdp);
+	if (!ASSERT_GE(xdp_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	err = bpf_xdp_attach(ifindex, xdp_prog_fd, XDP_FLAGS_REPLACE, &opts);
+	if (!ASSERT_OK(err, "bpf_xdp_attach"))
+		goto done;
+
 	/* ping from at_ns0 namespace test */
 	nstoken = open_netns("at_ns0");
 	err = test_ping(AF_INET, IP4_ADDR_TUNL_DEV1);
@@ -667,6 +677,8 @@ static void test_xfrm_tunnel(void)
 		goto done;
 	if (!ASSERT_EQ(skel->bss->xfrm_remote_ip, 0xac100164, "remote_ip"))
 		goto done;
+	if (!ASSERT_EQ(skel->bss->xfrm_replay_window, 42, "replay_window"))
+		goto done;
 
 done:
 	delete_xfrm_tunnel();
diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index 3a59eb9c34de..c0dd38616562 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -30,6 +30,10 @@ int bpf_skb_set_fou_encap(struct __sk_buff *skb_ctx,
 			  struct bpf_fou_encap *encap, int type) __ksym;
 int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx,
 			  struct bpf_fou_encap *encap) __ksym;
+struct xfrm_state *
+bpf_xdp_get_xfrm_state(struct xdp_md *ctx, struct bpf_xfrm_state_opts *opts,
+		       u32 opts__sz) __ksym;
+void bpf_xdp_xfrm_state_release(struct xfrm_state *x) __ksym;
 
 struct {
 	__uint(type, BPF_MAP_TYPE_ARRAY);
@@ -950,4 +954,51 @@ int xfrm_get_state(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }
 
+volatile int xfrm_replay_window = 0;
+
+SEC("xdp")
+int xfrm_get_state_xdp(struct xdp_md *xdp)
+{
+	struct bpf_xfrm_state_opts opts = {};
+	struct xfrm_state *x = NULL;
+	struct ip_esp_hdr *esph;
+	struct bpf_dynptr ptr;
+	u8 esph_buf[8] = {};
+	u8 iph_buf[20] = {};
+	struct iphdr *iph;
+	u32 off;
+
+	if (bpf_dynptr_from_xdp(xdp, 0, &ptr))
+		goto out;
+
+	off = sizeof(struct ethhdr);
+	iph = bpf_dynptr_slice(&ptr, off, iph_buf, sizeof(iph_buf));
+	if (!iph || iph->protocol != IPPROTO_ESP)
+		goto out;
+
+	off += sizeof(struct iphdr);
+	esph = bpf_dynptr_slice(&ptr, off, esph_buf, sizeof(esph_buf));
+	if (!esph)
+		goto out;
+
+	opts.netns_id = BPF_F_CURRENT_NETNS;
+	opts.daddr.a4 = iph->daddr;
+	opts.spi = esph->spi;
+	opts.proto = IPPROTO_ESP;
+	opts.family = AF_INET;
+
+	x = bpf_xdp_get_xfrm_state(xdp, &opts, sizeof(opts));
+	if (!x || opts.error)
+		goto out;
+
+	if (!x->replay_esn)
+		goto out;
+
+	xfrm_replay_window = x->replay_esn->replay_window;
+out:
+	if (x)
+		bpf_xdp_xfrm_state_release(x);
+	return XDP_PASS;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.42.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c
  2023-12-04 20:56 ` [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c Daniel Xu
@ 2023-12-05  1:58   ` Alexei Starovoitov
  2023-12-07 11:52   ` Steffen Klassert
  1 sibling, 0 replies; 22+ messages in thread
From: Alexei Starovoitov @ 2023-12-05  1:58 UTC (permalink / raw)
  To: Daniel Xu
  Cc: David S. Miller, Herbert Xu, Steffen Klassert, Paolo Abeni,
	Jakub Kicinski, Eric Dumazet, antony.antony, Yonghong Song,
	Eddy Z, Network Development, LKML, bpf, devel

On Mon, Dec 4, 2023 at 12:56 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> This commit moves the contents of xfrm_interface_bpf.c into a new file,
> xfrm_bpf.c This is in preparation for adding more xfrm kfuncs. We'd like
> to keep all the bpf integrations in a single file.
>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  net/xfrm/Makefile                             |  7 +------
>  net/xfrm/{xfrm_interface_bpf.c => xfrm_bpf.c} | 12 ++++++++----
>  2 files changed, 9 insertions(+), 10 deletions(-)
>  rename net/xfrm/{xfrm_interface_bpf.c => xfrm_bpf.c} (88%)
>
> diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
> index cd47f88921f5..29fff452280d 100644
> --- a/net/xfrm/Makefile
> +++ b/net/xfrm/Makefile
> @@ -5,12 +5,6 @@
>
>  xfrm_interface-$(CONFIG_XFRM_INTERFACE) += xfrm_interface_core.o
>
> -ifeq ($(CONFIG_XFRM_INTERFACE),m)
> -xfrm_interface-$(CONFIG_DEBUG_INFO_BTF_MODULES) += xfrm_interface_bpf.o
> -else ifeq ($(CONFIG_XFRM_INTERFACE),y)
> -xfrm_interface-$(CONFIG_DEBUG_INFO_BTF) += xfrm_interface_bpf.o
> -endif
> -
>  obj-$(CONFIG_XFRM) := xfrm_policy.o xfrm_state.o xfrm_hash.o \
>                       xfrm_input.o xfrm_output.o \
>                       xfrm_sysctl.o xfrm_replay.o xfrm_device.o
> @@ -21,3 +15,4 @@ obj-$(CONFIG_XFRM_USER_COMPAT) += xfrm_compat.o
>  obj-$(CONFIG_XFRM_IPCOMP) += xfrm_ipcomp.o
>  obj-$(CONFIG_XFRM_INTERFACE) += xfrm_interface.o
>  obj-$(CONFIG_XFRM_ESPINTCP) += espintcp.o
> +obj-$(CONFIG_DEBUG_INFO_BTF) += xfrm_bpf.o
...
> +#if IS_BUILTIN(CONFIG_XFRM_INTERFACE) || \
> +    (IS_MODULE(CONFIG_XFRM_INTERFACE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
> +
>  /* bpf_xfrm_info - XFRM metadata information
>   *
>   * Members:
> @@ -108,3 +110,5 @@ int __init register_xfrm_interface_bpf(void)
>         return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS,
>                                          &xfrm_interface_kfunc_set);
>  }
> +
> +#endif /* xfrm interface */

imo the original approach was cleaner.
#ifdefs in .c should be avoided when possible.
But I'm not going to insist.

ipsec folks please ack the first 3 patches.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 04/10] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro
  2023-12-04 20:56 ` [PATCH bpf-next v4 04/10] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro Daniel Xu
@ 2023-12-05  4:03   ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-12-05  4:03 UTC (permalink / raw)
  To: Daniel Xu
  Cc: daniel, ast, nathan, andrii, ndesaulniers, steffen.klassert,
	antony.antony, alexei.starovoitov, yonghong.song, eddyz87,
	martin.lau, song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	trix, bpf, linux-kernel, llvm, devel, netdev, Jonathan Lemon

On Mon, Dec 4, 2023 at 12:57 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> === Motivation ===
>
> Similar to reading from CO-RE bitfields, we need a CO-RE aware bitfield
> writing wrapper to make the verifier happy.
>
> Two alternatives to this approach are:
>
> 1. Use the upcoming `preserve_static_offset` [0] attribute to disable
>    CO-RE on specific structs.
> 2. Use broader byte-sized writes to write to bitfields.
>
> (1) is a bit hard to use. It requires specific and not-very-obvious
> annotations to bpftool generated vmlinux.h. It's also not generally
> available in released LLVM versions yet.
>
> (2) makes the code quite hard to read and write. And especially if
> BPF_CORE_READ_BITFIELD() is already being used, it makes more sense to
> to have an inverse helper for writing.
>
> === Implementation details ===
>
> Since the logic is a bit non-obvious, I thought it would be helpful
> to explain exactly what's going on.
>
> To start, it helps by explaining what LSHIFT_U64 (lshift) and RSHIFT_U64
> (rshift) is designed to mean. Consider the core of the
> BPF_CORE_READ_BITFIELD() algorithm:
>
>         val <<= __CORE_RELO(s, field, LSHIFT_U64);
>         val = val >> __CORE_RELO(s, field, RSHIFT_U64);
>
> Basically what happens is we lshift to clear the non-relevant (blank)
> higher order bits. Then we rshift to bring the relevant bits (bitfield)
> down to LSB position (while also clearing blank lower order bits). To
> illustrate:
>
>         Start:    ........XXX......
>         Lshift:   XXX......00000000
>         Rshift:   00000000000000XXX
>
> where `.` means blank bit, `0` means 0 bit, and `X` means bitfield bit.
>
> After the two operations, the bitfield is ready to be interpreted as a
> regular integer.
>
> Next, we want to build an alternative (but more helpful) mental model
> on lshift and rshift. That is, to consider:
>
> * rshift as the total number of blank bits in the u64
> * lshift as number of blank bits left of the bitfield in the u64
>
> Take a moment to consider why that is true by consulting the above
> diagram.
>
> With this insight, we can now define the following relationship:
>
>               bitfield
>                  _
>                 | |
>         0.....00XXX0...00
>         |      |   |    |
>         |______|   |    |
>          lshift    |    |
>                    |____|
>               (rshift - lshift)
>
> That is, we know the number of higher order blank bits is just lshift.
> And the number of lower order blank bits is (rshift - lshift).
>
> Finally, we can examine the core of the write side algorithm:
>
>         mask = (~0ULL << rshift) >> lshift;              // 1
>         val = (val & ~mask) | ((nval << rpad) & mask);   // 2
>
> 1. Compute a mask where the set bits are the bitfield bits. The first
>    left shift zeros out exactly the number of blank bits, leaving a
>    bitfield sized set of 1s. The subsequent right shift inserts the
>    correct amount of higher order blank bits.
>
> 2. On the left of the `|`, mask out the bitfield bits. This creates
>    0s where the new bitfield bits will go. On the right of the `|`,
>    bring nval into the correct bit position and mask out any bits
>    that fall outside of the bitfield. Finally, by bor'ing the two
>    halves, we get the final set of bits to write back.
>
> [0]: https://reviews.llvm.org/D133361
> Co-developed-by: Eduard Zingerman <eddyz87@gmail.com>
> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> Co-developed-by: Jonathan Lemon <jlemon@aviatrix.com>
> Signed-off-by: Jonathan Lemon <jlemon@aviatrix.com>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  tools/lib/bpf/bpf_core_read.h | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>

LGTM

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
> index 1ac57bb7ac55..7325a12692a3 100644
> --- a/tools/lib/bpf/bpf_core_read.h
> +++ b/tools/lib/bpf/bpf_core_read.h
> @@ -111,6 +111,38 @@ enum bpf_enum_value_kind {
>         val;                                                                  \
>  })
>
> +/*
> + * Write to a bitfield, identified by s->field.
> + * This is the inverse of BPF_CORE_WRITE_BITFIELD().
> + */
> +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({                  \
> +       void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET);       \
> +       unsigned int byte_size = __CORE_RELO(s, field, BYTE_SIZE);      \
> +       unsigned int lshift = __CORE_RELO(s, field, LSHIFT_U64);        \
> +       unsigned int rshift = __CORE_RELO(s, field, RSHIFT_U64);        \
> +       unsigned long long mask, val, nval = new_val;                   \
> +       unsigned int rpad = rshift - lshift;                            \
> +                                                                       \
> +       asm volatile("" : "+r"(p));                                     \
> +                                                                       \
> +       switch (byte_size) {                                            \
> +       case 1: val = *(unsigned char *)p; break;                       \
> +       case 2: val = *(unsigned short *)p; break;                      \
> +       case 4: val = *(unsigned int *)p; break;                        \
> +       case 8: val = *(unsigned long long *)p; break;                  \
> +       }                                                               \
> +                                                                       \
> +       mask = (~0ULL << rshift) >> lshift;                             \
> +       val = (val & ~mask) | ((nval << rpad) & mask);                  \
> +                                                                       \
> +       switch (byte_size) {                                            \
> +       case 1: *(unsigned char *)p      = val; break;                  \
> +       case 2: *(unsigned short *)p     = val; break;                  \
> +       case 4: *(unsigned int *)p       = val; break;                  \
> +       case 8: *(unsigned long long *)p = val; break;                  \
> +       }                                                               \
> +})
> +
>  #define ___bpf_field_ref1(field)       (field)
>  #define ___bpf_field_ref2(type, field) (((typeof(type) *)0)->field)
>  #define ___bpf_field_ref(args...)                                          \
> --
> 2.42.1
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 06/10] libbpf: selftests: Add verifier tests for CO-RE bitfield writes
  2023-12-04 20:56 ` [PATCH bpf-next v4 06/10] libbpf: selftests: Add verifier tests for CO-RE bitfield writes Daniel Xu
@ 2023-12-05  4:05   ` Andrii Nakryiko
  0 siblings, 0 replies; 22+ messages in thread
From: Andrii Nakryiko @ 2023-12-05  4:05 UTC (permalink / raw)
  To: Daniel Xu
  Cc: daniel, shuah, andrii, ast, steffen.klassert, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87, mykolal, martin.lau,
	song, john.fastabend, kpsingh, sdf, haoluo, jolsa, linux-kernel,
	bpf, linux-kselftest, devel, netdev

On Mon, Dec 4, 2023 at 12:57 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> Add some tests that exercise BPF_CORE_WRITE_BITFIELD() macro. Since some
> non-trivial bit fiddling is going on, make sure various edge cases (such
> as adjacent bitfields and bitfields at the edge of structs) are
> exercised.
>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---

nit: please drop "libbpf: " prefix from the patch subject, this is
"selftests/bpf: " actually


>  .../selftests/bpf/prog_tests/verifier.c       |   2 +
>  .../bpf/progs/verifier_bitfield_write.c       | 100 ++++++++++++++++++
>  2 files changed, 102 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/progs/verifier_bitfield_write.c
>

[...]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c
  2023-12-04 20:56 ` [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c Daniel Xu
  2023-12-05  1:58   ` Alexei Starovoitov
@ 2023-12-07 11:52   ` Steffen Klassert
  2023-12-07 21:08     ` [devel-ipsec] " Eyal Birger
  1 sibling, 1 reply; 22+ messages in thread
From: Steffen Klassert @ 2023-12-07 11:52 UTC (permalink / raw)
  To: Daniel Xu
  Cc: davem, Herbert Xu, pabeni, kuba, edumazet, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87, netdev, linux-kernel,
	bpf, devel, Eyal Birger

On Mon, Dec 04, 2023 at 01:56:21PM -0700, Daniel Xu wrote:
> This commit moves the contents of xfrm_interface_bpf.c into a new file,
> xfrm_bpf.c This is in preparation for adding more xfrm kfuncs. We'd like
> to keep all the bpf integrations in a single file.
> 
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  net/xfrm/Makefile                             |  7 +------
>  net/xfrm/{xfrm_interface_bpf.c => xfrm_bpf.c} | 12 ++++++++----
>  2 files changed, 9 insertions(+), 10 deletions(-)
>  rename net/xfrm/{xfrm_interface_bpf.c => xfrm_bpf.c} (88%)

Adding Eyal to Cc, he added the xfrm_interface_bpf.c file.

Acked-by: Steffen Klassert <steffen.klassert@secunet.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 02/10] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
  2023-12-04 20:56 ` [PATCH bpf-next v4 02/10] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
@ 2023-12-07 11:53   ` Steffen Klassert
  2023-12-07 21:21   ` Eyal Birger
  1 sibling, 0 replies; 22+ messages in thread
From: Steffen Klassert @ 2023-12-07 11:53 UTC (permalink / raw)
  To: Daniel Xu
  Cc: ast, daniel, davem, Herbert Xu, pabeni, hawk, john.fastabend,
	kuba, edumazet, antony.antony, alexei.starovoitov, yonghong.song,
	eddyz87, netdev, linux-kernel, bpf, devel

On Mon, Dec 04, 2023 at 01:56:22PM -0700, Daniel Xu wrote:
> This commit adds an unstable kfunc helper to access internal xfrm_state
> associated with an SA. This is intended to be used for the upcoming
> IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
> words: for custom software RSS.
> 
> That being said, the function that this kfunc wraps is fairly generic
> and used for a lot of xfrm tasks. I'm sure people will find uses
> elsewhere over time.
> 
> Co-developed-by: Antony Antony <antony.antony@secunet.com>
> Signed-off-by: Antony Antony <antony.antony@secunet.com>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>

Acked-by: Steffen Klassert <steffen.klassert@secunet.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 03/10] bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc
  2023-12-04 20:56 ` [PATCH bpf-next v4 03/10] bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc Daniel Xu
@ 2023-12-07 11:54   ` Steffen Klassert
  0 siblings, 0 replies; 22+ messages in thread
From: Steffen Klassert @ 2023-12-07 11:54 UTC (permalink / raw)
  To: Daniel Xu
  Cc: ast, daniel, davem, Herbert Xu, pabeni, hawk, john.fastabend,
	kuba, edumazet, antony.antony, alexei.starovoitov, yonghong.song,
	eddyz87, netdev, linux-kernel, bpf, devel

On Mon, Dec 04, 2023 at 01:56:23PM -0700, Daniel Xu wrote:
> This kfunc releases a previously acquired xfrm_state from
> bpf_xdp_get_xfrm_state().
> 
> Co-developed-by: Antony Antony <antony.antony@secunet.com>
> Signed-off-by: Antony Antony <antony.antony@secunet.com>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>

Acked-by: Steffen Klassert <steffen.klassert@secunet.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [devel-ipsec] [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c
  2023-12-07 11:52   ` Steffen Klassert
@ 2023-12-07 21:08     ` Eyal Birger
  2023-12-08  8:35       ` Steffen Klassert
  2023-12-09  0:04       ` Daniel Xu
  0 siblings, 2 replies; 22+ messages in thread
From: Eyal Birger @ 2023-12-07 21:08 UTC (permalink / raw)
  To: Daniel Xu
  Cc: Herbert Xu, Steffen Klassert, netdev, linux-kernel,
	alexei.starovoitov, devel, eddyz87, edumazet, Eyal Birger,
	yonghong.song, kuba, bpf, pabeni, davem

Hi Daniel,

On Thu, Dec 7, 2023 at 3:52 AM Steffen Klassert via Devel
<devel@linux-ipsec.org> wrote:
>
> On Mon, Dec 04, 2023 at 01:56:21PM -0700, Daniel Xu wrote:
> > This commit moves the contents of xfrm_interface_bpf.c into a new file,
> > xfrm_bpf.c This is in preparation for adding more xfrm kfuncs. We'd like
> > to keep all the bpf integrations in a single file.

This takes away the nice ability to reload the xfrm interface
related kfuncs when reloading the xfrm interface.

I also find it a little strange that the kfuncs would be available
when the xfrm interface isn't loaded.

So imho it makes sense that these kfuncs would be built
as part of the module and not as part of the core.

Eyal.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 02/10] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
  2023-12-04 20:56 ` [PATCH bpf-next v4 02/10] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
  2023-12-07 11:53   ` Steffen Klassert
@ 2023-12-07 21:21   ` Eyal Birger
  2023-12-09  0:07     ` Daniel Xu
  1 sibling, 1 reply; 22+ messages in thread
From: Eyal Birger @ 2023-12-07 21:21 UTC (permalink / raw)
  To: Daniel Xu
  Cc: ast, daniel, davem, Herbert Xu, steffen.klassert, pabeni, hawk,
	john.fastabend, kuba, edumazet, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87, netdev, linux-kernel,
	bpf, devel

On Mon, Dec 4, 2023 at 12:57 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
>
> This commit adds an unstable kfunc helper to access internal xfrm_state
> associated with an SA. This is intended to be used for the upcoming
> IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
> words: for custom software RSS.
>
> That being said, the function that this kfunc wraps is fairly generic
> and used for a lot of xfrm tasks. I'm sure people will find uses
> elsewhere over time.
>
> Co-developed-by: Antony Antony <antony.antony@secunet.com>
> Signed-off-by: Antony Antony <antony.antony@secunet.com>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  include/net/xfrm.h     |   9 ++++
>  net/xfrm/xfrm_bpf.c    | 102 +++++++++++++++++++++++++++++++++++++++++
>  net/xfrm/xfrm_policy.c |   2 +
>  3 files changed, 113 insertions(+)
>
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index c9bb0f892f55..1d107241b901 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -2190,4 +2190,13 @@ static inline int register_xfrm_interface_bpf(void)
>
>  #endif
>
> +#if IS_ENABLED(CONFIG_DEBUG_INFO_BTF)
> +int register_xfrm_state_bpf(void);
> +#else
> +static inline int register_xfrm_state_bpf(void)
> +{
> +       return 0;
> +}
> +#endif
> +
>  #endif /* _NET_XFRM_H */
> diff --git a/net/xfrm/xfrm_bpf.c b/net/xfrm/xfrm_bpf.c
> index 3d3018b87f96..3d6cac7345ca 100644
> --- a/net/xfrm/xfrm_bpf.c
> +++ b/net/xfrm/xfrm_bpf.c
> @@ -6,9 +6,11 @@
>   */
>
>  #include <linux/bpf.h>
> +#include <linux/btf.h>
>  #include <linux/btf_ids.h>
>
>  #include <net/dst_metadata.h>
> +#include <net/xdp.h>
>  #include <net/xfrm.h>
>
>  #if IS_BUILTIN(CONFIG_XFRM_INTERFACE) || \
> @@ -112,3 +114,103 @@ int __init register_xfrm_interface_bpf(void)
>  }
>
>  #endif /* xfrm interface */
> +
> +/* bpf_xfrm_state_opts - Options for XFRM state lookup helpers
> + *
> + * Members:
> + * @error      - Out parameter, set for any errors encountered
> + *              Values:
> + *                -EINVAL - netns_id is less than -1
> + *                -EINVAL - opts__sz isn't BPF_XFRM_STATE_OPTS_SZ
> + *                -ENONET - No network namespace found for netns_id
> + * @netns_id   - Specify the network namespace for lookup
> + *              Values:
> + *                BPF_F_CURRENT_NETNS (-1)
> + *                  Use namespace associated with ctx
> + *                [0, S32_MAX]
> + *                  Network Namespace ID
> + * @mark       - XFRM mark to match on
> + * @daddr      - Destination address to match on
> + * @spi                - Security parameter index to match on
> + * @proto      - L3 protocol to match on
> + * @family     - L3 protocol family to match on
> + */
> +struct bpf_xfrm_state_opts {
> +       s32 error;
> +       s32 netns_id;
> +       u32 mark;
> +       xfrm_address_t daddr;
> +       __be32 spi;
> +       u8 proto;
> +       u16 family;
> +};
> +
> +enum {
> +       BPF_XFRM_STATE_OPTS_SZ = sizeof(struct bpf_xfrm_state_opts),
> +};
> +
> +__bpf_kfunc_start_defs();
> +
> +/* bpf_xdp_get_xfrm_state - Get XFRM state
> + *
> + * Parameters:
> + * @ctx        - Pointer to ctx (xdp_md) in XDP program
> + *                 Cannot be NULL
> + * @opts       - Options for lookup (documented above)
> + *                 Cannot be NULL
> + * @opts__sz   - Length of the bpf_xfrm_state_opts structure
> + *                 Must be BPF_XFRM_STATE_OPTS_SZ
> + */
> +__bpf_kfunc struct xfrm_state *
> +bpf_xdp_get_xfrm_state(struct xdp_md *ctx, struct bpf_xfrm_state_opts *opts, u32 opts__sz)
> +{
> +       struct xdp_buff *xdp = (struct xdp_buff *)ctx;
> +       struct net *net = dev_net(xdp->rxq->dev);
> +       struct xfrm_state *x;
> +
> +       if (!opts || opts__sz < sizeof(opts->error))
> +               return NULL;
> +
> +       if (opts__sz != BPF_XFRM_STATE_OPTS_SZ) {
> +               opts->error = -EINVAL;
> +               return NULL;
> +       }
> +
> +       if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) {
> +               opts->error = -EINVAL;
> +               return NULL;
> +       }
> +
> +       if (opts->netns_id >= 0) {
> +               net = get_net_ns_by_id(net, opts->netns_id);
> +               if (unlikely(!net)) {
> +                       opts->error = -ENONET;
> +                       return NULL;
> +               }
> +       }
> +
> +       x = xfrm_state_lookup(net, opts->mark, &opts->daddr, opts->spi,
> +                             opts->proto, opts->family);
> +
> +       if (opts->netns_id >= 0)
> +               put_net(net);

Maybe opts->error should be set to something like -ENOENT if x == NULL?

> +
> +       return x;
> +}
> +
> +__bpf_kfunc_end_defs();
> +
> +BTF_SET8_START(xfrm_state_kfunc_set)
> +BTF_ID_FLAGS(func, bpf_xdp_get_xfrm_state, KF_RET_NULL | KF_ACQUIRE)
> +BTF_SET8_END(xfrm_state_kfunc_set)
> +
> +static const struct btf_kfunc_id_set xfrm_state_xdp_kfunc_set = {
> +       .owner = THIS_MODULE,
> +       .set   = &xfrm_state_kfunc_set,
> +};
> +
> +int __init register_xfrm_state_bpf(void)
> +{
> +       return register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP,
> +                                        &xfrm_state_xdp_kfunc_set);
> +}
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index c13dc3ef7910..1b7e75159727 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -4218,6 +4218,8 @@ void __init xfrm_init(void)
>  #ifdef CONFIG_XFRM_ESPINTCP
>         espintcp_init();
>  #endif
> +
> +       register_xfrm_state_bpf();
>  }
>
>  #ifdef CONFIG_AUDITSYSCALL
> --
> 2.42.1
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [devel-ipsec] [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c
  2023-12-07 21:08     ` [devel-ipsec] " Eyal Birger
@ 2023-12-08  8:35       ` Steffen Klassert
  2023-12-09  0:04       ` Daniel Xu
  1 sibling, 0 replies; 22+ messages in thread
From: Steffen Klassert @ 2023-12-08  8:35 UTC (permalink / raw)
  To: Eyal Birger
  Cc: Daniel Xu, Herbert Xu, netdev, linux-kernel, alexei.starovoitov,
	devel, eddyz87, edumazet, Eyal Birger, yonghong.song, kuba, bpf,
	pabeni, davem

On Thu, Dec 07, 2023 at 01:08:08PM -0800, Eyal Birger wrote:
> Hi Daniel,
> 
> On Thu, Dec 7, 2023 at 3:52 AM Steffen Klassert via Devel
> <devel@linux-ipsec.org> wrote:
> >
> > On Mon, Dec 04, 2023 at 01:56:21PM -0700, Daniel Xu wrote:
> > > This commit moves the contents of xfrm_interface_bpf.c into a new file,
> > > xfrm_bpf.c This is in preparation for adding more xfrm kfuncs. We'd like
> > > to keep all the bpf integrations in a single file.
> 
> This takes away the nice ability to reload the xfrm interface
> related kfuncs when reloading the xfrm interface.
> 
> I also find it a little strange that the kfuncs would be available
> when the xfrm interface isn't loaded.
> 
> So imho it makes sense that these kfuncs would be built
> as part of the module and not as part of the core.

I proposed to merge all the bpf extensions into one file.
With that I wanted to avoid to have 'many' files under
/net/xfrm that fall basically under bpf maintainance
scope. But if there are practical reasons to have these
spilted, I'm OK with that too.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [devel-ipsec] [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c
  2023-12-07 21:08     ` [devel-ipsec] " Eyal Birger
  2023-12-08  8:35       ` Steffen Klassert
@ 2023-12-09  0:04       ` Daniel Xu
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Xu @ 2023-12-09  0:04 UTC (permalink / raw)
  To: Eyal Birger
  Cc: Herbert Xu, Steffen Klassert, netdev, linux-kernel,
	alexei.starovoitov, devel, eddyz87, edumazet, Eyal Birger,
	yonghong.song, kuba, bpf, pabeni, davem

Hi Eyal,

On Thu, Dec 07, 2023 at 01:08:08PM -0800, Eyal Birger wrote:
> Hi Daniel,
> 
> On Thu, Dec 7, 2023 at 3:52 AM Steffen Klassert via Devel
> <devel@linux-ipsec.org> wrote:
> >
> > On Mon, Dec 04, 2023 at 01:56:21PM -0700, Daniel Xu wrote:
> > > This commit moves the contents of xfrm_interface_bpf.c into a new file,
> > > xfrm_bpf.c This is in preparation for adding more xfrm kfuncs. We'd like
> > > to keep all the bpf integrations in a single file.
> 
> This takes away the nice ability to reload the xfrm interface
> related kfuncs when reloading the xfrm interface.
> 
> I also find it a little strange that the kfuncs would be available
> when the xfrm interface isn't loaded.

I think technically since the xfrm interface module does the kfunc
registration, the kfuncs would only be available after the module is
loaded.

> 
> So imho it makes sense that these kfuncs would be built
> as part of the module and not as part of the core.

But yeah, point taken. I will revert this commit for v5.

> 
> Eyal.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next v4 02/10] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
  2023-12-07 21:21   ` Eyal Birger
@ 2023-12-09  0:07     ` Daniel Xu
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Xu @ 2023-12-09  0:07 UTC (permalink / raw)
  To: Eyal Birger
  Cc: ast, daniel, davem, Herbert Xu, steffen.klassert, pabeni, hawk,
	john.fastabend, kuba, edumazet, antony.antony,
	alexei.starovoitov, yonghong.song, eddyz87, netdev, linux-kernel,
	bpf, devel

On Thu, Dec 07, 2023 at 01:21:11PM -0800, Eyal Birger wrote:
> On Mon, Dec 4, 2023 at 12:57 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> >
> > This commit adds an unstable kfunc helper to access internal xfrm_state
> > associated with an SA. This is intended to be used for the upcoming
> > IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
> > words: for custom software RSS.
> >
> > That being said, the function that this kfunc wraps is fairly generic
> > and used for a lot of xfrm tasks. I'm sure people will find uses
> > elsewhere over time.
> >
> > Co-developed-by: Antony Antony <antony.antony@secunet.com>
> > Signed-off-by: Antony Antony <antony.antony@secunet.com>
> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> > ---
> >  include/net/xfrm.h     |   9 ++++
> >  net/xfrm/xfrm_bpf.c    | 102 +++++++++++++++++++++++++++++++++++++++++
> >  net/xfrm/xfrm_policy.c |   2 +
> >  3 files changed, 113 insertions(+)
> >
> > diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> > index c9bb0f892f55..1d107241b901 100644
> > --- a/include/net/xfrm.h
> > +++ b/include/net/xfrm.h
> > @@ -2190,4 +2190,13 @@ static inline int register_xfrm_interface_bpf(void)
> >
> >  #endif
> >
> > +#if IS_ENABLED(CONFIG_DEBUG_INFO_BTF)
> > +int register_xfrm_state_bpf(void);
> > +#else
> > +static inline int register_xfrm_state_bpf(void)
> > +{
> > +       return 0;
> > +}
> > +#endif
> > +
> >  #endif /* _NET_XFRM_H */
> > diff --git a/net/xfrm/xfrm_bpf.c b/net/xfrm/xfrm_bpf.c
> > index 3d3018b87f96..3d6cac7345ca 100644
> > --- a/net/xfrm/xfrm_bpf.c
> > +++ b/net/xfrm/xfrm_bpf.c
> > @@ -6,9 +6,11 @@
> >   */
> >
> >  #include <linux/bpf.h>
> > +#include <linux/btf.h>
> >  #include <linux/btf_ids.h>
> >
> >  #include <net/dst_metadata.h>
> > +#include <net/xdp.h>
> >  #include <net/xfrm.h>
> >
> >  #if IS_BUILTIN(CONFIG_XFRM_INTERFACE) || \
> > @@ -112,3 +114,103 @@ int __init register_xfrm_interface_bpf(void)
> >  }
> >
> >  #endif /* xfrm interface */
> > +
> > +/* bpf_xfrm_state_opts - Options for XFRM state lookup helpers
> > + *
> > + * Members:
> > + * @error      - Out parameter, set for any errors encountered
> > + *              Values:
> > + *                -EINVAL - netns_id is less than -1
> > + *                -EINVAL - opts__sz isn't BPF_XFRM_STATE_OPTS_SZ
> > + *                -ENONET - No network namespace found for netns_id
> > + * @netns_id   - Specify the network namespace for lookup
> > + *              Values:
> > + *                BPF_F_CURRENT_NETNS (-1)
> > + *                  Use namespace associated with ctx
> > + *                [0, S32_MAX]
> > + *                  Network Namespace ID
> > + * @mark       - XFRM mark to match on
> > + * @daddr      - Destination address to match on
> > + * @spi                - Security parameter index to match on
> > + * @proto      - L3 protocol to match on
> > + * @family     - L3 protocol family to match on
> > + */
> > +struct bpf_xfrm_state_opts {
> > +       s32 error;
> > +       s32 netns_id;
> > +       u32 mark;
> > +       xfrm_address_t daddr;
> > +       __be32 spi;
> > +       u8 proto;
> > +       u16 family;
> > +};
> > +
> > +enum {
> > +       BPF_XFRM_STATE_OPTS_SZ = sizeof(struct bpf_xfrm_state_opts),
> > +};
> > +
> > +__bpf_kfunc_start_defs();
> > +
> > +/* bpf_xdp_get_xfrm_state - Get XFRM state
> > + *
> > + * Parameters:
> > + * @ctx        - Pointer to ctx (xdp_md) in XDP program
> > + *                 Cannot be NULL
> > + * @opts       - Options for lookup (documented above)
> > + *                 Cannot be NULL
> > + * @opts__sz   - Length of the bpf_xfrm_state_opts structure
> > + *                 Must be BPF_XFRM_STATE_OPTS_SZ
> > + */
> > +__bpf_kfunc struct xfrm_state *
> > +bpf_xdp_get_xfrm_state(struct xdp_md *ctx, struct bpf_xfrm_state_opts *opts, u32 opts__sz)
> > +{
> > +       struct xdp_buff *xdp = (struct xdp_buff *)ctx;
> > +       struct net *net = dev_net(xdp->rxq->dev);
> > +       struct xfrm_state *x;
> > +
> > +       if (!opts || opts__sz < sizeof(opts->error))
> > +               return NULL;
> > +
> > +       if (opts__sz != BPF_XFRM_STATE_OPTS_SZ) {
> > +               opts->error = -EINVAL;
> > +               return NULL;
> > +       }
> > +
> > +       if (unlikely(opts->netns_id < BPF_F_CURRENT_NETNS)) {
> > +               opts->error = -EINVAL;
> > +               return NULL;
> > +       }
> > +
> > +       if (opts->netns_id >= 0) {
> > +               net = get_net_ns_by_id(net, opts->netns_id);
> > +               if (unlikely(!net)) {
> > +                       opts->error = -ENONET;
> > +                       return NULL;
> > +               }
> > +       }
> > +
> > +       x = xfrm_state_lookup(net, opts->mark, &opts->daddr, opts->spi,
> > +                             opts->proto, opts->family);
> > +
> > +       if (opts->netns_id >= 0)
> > +               put_net(net);
> 
> Maybe opts->error should be set to something like -ENOENT if x == NULL?

Originally I opted not to do that b/c xfrm_state_lookup() chooses not to
do anything like that (eg PTR_ERR()).

But I don't mind adding it - I think it's reasonable either way.

[..]

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-12-09  0:08 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-04 20:56 [PATCH bpf-next v4 00/10] Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
2023-12-04 20:56 ` [PATCH bpf-next v4 01/10] xfrm: bpf: Move xfrm_interface_bpf.c to xfrm_bpf.c Daniel Xu
2023-12-05  1:58   ` Alexei Starovoitov
2023-12-07 11:52   ` Steffen Klassert
2023-12-07 21:08     ` [devel-ipsec] " Eyal Birger
2023-12-08  8:35       ` Steffen Klassert
2023-12-09  0:04       ` Daniel Xu
2023-12-04 20:56 ` [PATCH bpf-next v4 02/10] bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc Daniel Xu
2023-12-07 11:53   ` Steffen Klassert
2023-12-07 21:21   ` Eyal Birger
2023-12-09  0:07     ` Daniel Xu
2023-12-04 20:56 ` [PATCH bpf-next v4 03/10] bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc Daniel Xu
2023-12-07 11:54   ` Steffen Klassert
2023-12-04 20:56 ` [PATCH bpf-next v4 04/10] libbpf: Add BPF_CORE_WRITE_BITFIELD() macro Daniel Xu
2023-12-05  4:03   ` Andrii Nakryiko
2023-12-04 20:56 ` [PATCH bpf-next v4 05/10] bpf: selftests: test_loader: Support __btf_path() annotation Daniel Xu
2023-12-04 20:56 ` [PATCH bpf-next v4 06/10] libbpf: selftests: Add verifier tests for CO-RE bitfield writes Daniel Xu
2023-12-05  4:05   ` Andrii Nakryiko
2023-12-04 20:56 ` [PATCH bpf-next v4 07/10] bpf: selftests: test_tunnel: Setup fresh topology for each subtest Daniel Xu
2023-12-04 20:56 ` [PATCH bpf-next v4 08/10] bpf: selftests: test_tunnel: Use vmlinux.h declarations Daniel Xu
2023-12-04 20:56 ` [PATCH bpf-next v4 09/10] bpf: selftests: Move xfrm tunnel test to test_progs Daniel Xu
2023-12-04 20:56 ` [PATCH bpf-next v4 10/10] bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state() Daniel Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.