All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH ipsec-next,v2 0/3] xfrm: interface: Add unstable helpers for XFRM metadata
@ 2022-11-29 13:20 Eyal Birger
  2022-11-29 13:20 ` [PATCH ipsec-next,v2 1/3] xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.c Eyal Birger
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Eyal Birger @ 2022-11-29 13:20 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, steffen.klassert, herbert, andrii,
	daniel, nicolas.dichtel, razor, mykolal, ast, martin.lau, song,
	yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah
  Cc: netdev, bpf, linux-kselftest, Eyal Birger

This patch series adds xfrm metadata helpers using the unstable kfunc
call interface for the TC-BPF hooks.

This allows steering traffic towards different IPsec connections based
on logic implemented in bpf programs.

The helpers are integrated into the xfrm_interface module. For this
purpose the main functionality of this module is moved to
xfrm_interface_core.c.

Eyal Birger (3):
  xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.c
  xfrm: interface: Add unstable helpers for setting/getting XFRM
    metadata from TC-BPF
  selftests/bpf: add xfrm_info tests

 include/net/dst_metadata.h                    |   1 +
 include/net/xfrm.h                            |  20 +
 net/core/dst.c                                |   8 +-
 net/xfrm/Makefile                             |   8 +
 net/xfrm/xfrm_interface_bpf.c                 | 100 +++++
 ...xfrm_interface.c => xfrm_interface_core.c} |  15 +
 tools/testing/selftests/bpf/config            |   2 +
 .../selftests/bpf/prog_tests/test_xfrm_info.c | 343 ++++++++++++++++++
 .../selftests/bpf/progs/test_xfrm_info_kern.c |  74 ++++
 9 files changed, 569 insertions(+), 2 deletions(-)
 create mode 100644 net/xfrm/xfrm_interface_bpf.c
 rename net/xfrm/{xfrm_interface.c => xfrm_interface_core.c} (98%)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH ipsec-next,v2 1/3] xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.c
  2022-11-29 13:20 [PATCH ipsec-next,v2 0/3] xfrm: interface: Add unstable helpers for XFRM metadata Eyal Birger
@ 2022-11-29 13:20 ` Eyal Birger
  2022-11-29 13:20 ` [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF Eyal Birger
  2022-11-29 13:20 ` [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests Eyal Birger
  2 siblings, 0 replies; 17+ messages in thread
From: Eyal Birger @ 2022-11-29 13:20 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, steffen.klassert, herbert, andrii,
	daniel, nicolas.dichtel, razor, mykolal, ast, martin.lau, song,
	yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah
  Cc: netdev, bpf, linux-kselftest, Eyal Birger

This change allows adding additional files to the xfrm_interface module.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
---
 net/xfrm/Makefile                                    | 2 ++
 net/xfrm/{xfrm_interface.c => xfrm_interface_core.c} | 0
 2 files changed, 2 insertions(+)
 rename net/xfrm/{xfrm_interface.c => xfrm_interface_core.c} (100%)

diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
index 494aa744bfb9..08a2870fdd36 100644
--- a/net/xfrm/Makefile
+++ b/net/xfrm/Makefile
@@ -3,6 +3,8 @@
 # Makefile for the XFRM subsystem.
 #
 
+xfrm_interface-$(CONFIG_XFRM_INTERFACE) += xfrm_interface_core.o
+
 obj-$(CONFIG_XFRM) := xfrm_policy.o xfrm_state.o xfrm_hash.o \
 		      xfrm_input.o xfrm_output.o \
 		      xfrm_sysctl.o xfrm_replay.o xfrm_device.o
diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface_core.c
similarity index 100%
rename from net/xfrm/xfrm_interface.c
rename to net/xfrm/xfrm_interface_core.c
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF
  2022-11-29 13:20 [PATCH ipsec-next,v2 0/3] xfrm: interface: Add unstable helpers for XFRM metadata Eyal Birger
  2022-11-29 13:20 ` [PATCH ipsec-next,v2 1/3] xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.c Eyal Birger
@ 2022-11-29 13:20 ` Eyal Birger
  2022-11-30 18:14   ` Martin KaFai Lau
  2022-11-29 13:20 ` [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests Eyal Birger
  2 siblings, 1 reply; 17+ messages in thread
From: Eyal Birger @ 2022-11-29 13:20 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, steffen.klassert, herbert, andrii,
	daniel, nicolas.dichtel, razor, mykolal, ast, martin.lau, song,
	yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah
  Cc: netdev, bpf, linux-kselftest, Eyal Birger

This change adds xfrm metadata helpers using the unstable kfunc call
interface for the TC-BPF hooks. This allows steering traffic towards
different IPsec connections based on logic implemented in bpf programs.

This object is built based on the availabilty of BTF debug info.

The metadata percpu dsts used on TX take ownership of the original skb
dsts so that they may be used as part of the xfrm transmittion logic -
e.g.  for MTU calculations.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>

---

v2: changed added following points raised by Martin KaFai Lau:
  - make sure dst is refcounted prior to caching
  - free dst_orig regardless of CONFIG_DST_CACHE
  - call xfrm interface bpf cleanup in case of kfunc registration errors
---
 include/net/dst_metadata.h     |   1 +
 include/net/xfrm.h             |  20 +++++++
 net/core/dst.c                 |   8 ++-
 net/xfrm/Makefile              |   6 ++
 net/xfrm/xfrm_interface_bpf.c  | 100 +++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_interface_core.c |  15 +++++
 6 files changed, 148 insertions(+), 2 deletions(-)
 create mode 100644 net/xfrm/xfrm_interface_bpf.c

diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index a454cf4327fe..1b7fae4c6b24 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -26,6 +26,7 @@ struct macsec_info {
 struct xfrm_md_info {
 	u32 if_id;
 	int link;
+	struct dst_entry *dst_orig;
 };
 
 struct metadata_dst {
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e0cc6791c001..5e5fea3087b6 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -2086,4 +2086,24 @@ static inline bool xfrm6_local_dontfrag(const struct sock *sk)
 	return false;
 }
 #endif
+
+#if (IS_BUILTIN(CONFIG_XFRM_INTERFACE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) || \
+    (IS_MODULE(CONFIG_XFRM_INTERFACE) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
+
+extern int register_xfrm_interface_bpf(void);
+extern void cleanup_xfrm_interface_bpf(void);
+
+#else
+
+static inline int register_xfrm_interface_bpf(void)
+{
+	return 0;
+}
+
+static inline void cleanup_xfrm_interface_bpf(void)
+{
+}
+
+#endif
+
 #endif	/* _NET_XFRM_H */
diff --git a/net/core/dst.c b/net/core/dst.c
index bc9c9be4e080..bb14a0392388 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -316,6 +316,8 @@ void metadata_dst_free(struct metadata_dst *md_dst)
 	if (md_dst->type == METADATA_IP_TUNNEL)
 		dst_cache_destroy(&md_dst->u.tun_info.dst_cache);
 #endif
+	if (md_dst->type == METADATA_XFRM)
+		dst_release(md_dst->u.xfrm_info.dst_orig);
 	kfree(md_dst);
 }
 EXPORT_SYMBOL_GPL(metadata_dst_free);
@@ -340,16 +342,18 @@ EXPORT_SYMBOL_GPL(metadata_dst_alloc_percpu);
 
 void metadata_dst_free_percpu(struct metadata_dst __percpu *md_dst)
 {
-#ifdef CONFIG_DST_CACHE
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
 		struct metadata_dst *one_md_dst = per_cpu_ptr(md_dst, cpu);
 
+#ifdef CONFIG_DST_CACHE
 		if (one_md_dst->type == METADATA_IP_TUNNEL)
 			dst_cache_destroy(&one_md_dst->u.tun_info.dst_cache);
-	}
 #endif
+		if (one_md_dst->type == METADATA_XFRM)
+			dst_release(one_md_dst->u.xfrm_info.dst_orig);
+	}
 	free_percpu(md_dst);
 }
 EXPORT_SYMBOL_GPL(metadata_dst_free_percpu);
diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
index 08a2870fdd36..cd47f88921f5 100644
--- a/net/xfrm/Makefile
+++ b/net/xfrm/Makefile
@@ -5,6 +5,12 @@
 
 xfrm_interface-$(CONFIG_XFRM_INTERFACE) += xfrm_interface_core.o
 
+ifeq ($(CONFIG_XFRM_INTERFACE),m)
+xfrm_interface-$(CONFIG_DEBUG_INFO_BTF_MODULES) += xfrm_interface_bpf.o
+else ifeq ($(CONFIG_XFRM_INTERFACE),y)
+xfrm_interface-$(CONFIG_DEBUG_INFO_BTF) += xfrm_interface_bpf.o
+endif
+
 obj-$(CONFIG_XFRM) := xfrm_policy.o xfrm_state.o xfrm_hash.o \
 		      xfrm_input.o xfrm_output.o \
 		      xfrm_sysctl.o xfrm_replay.o xfrm_device.o
diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_interface_bpf.c
new file mode 100644
index 000000000000..757e15857dbf
--- /dev/null
+++ b/net/xfrm/xfrm_interface_bpf.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Unstable XFRM Helpers for TC-BPF hook
+ *
+ * These are called from SCHED_CLS BPF programs. Note that it is
+ * allowed to break compatibility for these functions since the interface they
+ * are exposed through to BPF programs is explicitly unstable.
+ */
+
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+
+#include <net/dst_metadata.h>
+#include <net/xfrm.h>
+
+struct bpf_xfrm_info {
+	u32 if_id;
+	int link;
+};
+
+static struct metadata_dst __percpu *xfrm_md_dst;
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in xfrm_interface BTF");
+
+__used noinline
+int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx, struct bpf_xfrm_info *to)
+{
+	struct sk_buff *skb = (struct sk_buff *)skb_ctx;
+	struct xfrm_md_info *info;
+
+	memset(to, 0, sizeof(*to));
+
+	info = skb_xfrm_md_info(skb);
+	if (!info)
+		return -EINVAL;
+
+	to->if_id = info->if_id;
+	to->link = info->link;
+	return 0;
+}
+
+__used noinline
+int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx,
+			  const struct bpf_xfrm_info *from)
+{
+	struct sk_buff *skb = (struct sk_buff *)skb_ctx;
+	struct metadata_dst *md_dst;
+	struct xfrm_md_info *info;
+
+	if (unlikely(skb_metadata_dst(skb)))
+		return -EINVAL;
+
+	md_dst = this_cpu_ptr(xfrm_md_dst);
+
+	info = &md_dst->u.xfrm_info;
+	memset(info, 0, sizeof(*info));
+
+	info->if_id = from->if_id;
+	info->link = from->link;
+	skb_dst_force(skb);
+	info->dst_orig = skb_dst(skb);
+
+	dst_hold((struct dst_entry *)md_dst);
+	skb_dst_set(skb, (struct dst_entry *)md_dst);
+	return 0;
+}
+
+__diag_pop()
+
+BTF_SET8_START(xfrm_ifc_kfunc_set)
+BTF_ID_FLAGS(func, bpf_skb_get_xfrm_info)
+BTF_ID_FLAGS(func, bpf_skb_set_xfrm_info)
+BTF_SET8_END(xfrm_ifc_kfunc_set)
+
+static const struct btf_kfunc_id_set xfrm_interface_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &xfrm_ifc_kfunc_set,
+};
+
+int __init register_xfrm_interface_bpf(void)
+{
+	int err;
+
+	xfrm_md_dst = metadata_dst_alloc_percpu(0, METADATA_XFRM,
+						GFP_KERNEL);
+	if (!xfrm_md_dst)
+		return -ENOMEM;
+	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS,
+					&xfrm_interface_kfunc_set);
+	if (err < 0) {
+		cleanup_xfrm_interface_bpf();
+		return err;
+	}
+	return 0;
+}
+
+void __exit cleanup_xfrm_interface_bpf(void)
+{
+	metadata_dst_free_percpu(xfrm_md_dst);
+}
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c
index 5a67b120c4db..1e1e8e965939 100644
--- a/net/xfrm/xfrm_interface_core.c
+++ b/net/xfrm/xfrm_interface_core.c
@@ -396,6 +396,14 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 
 		if_id = md_info->if_id;
 		fl->flowi_oif = md_info->link;
+		if (md_info->dst_orig) {
+			struct dst_entry *tmp_dst = dst;
+
+			dst = md_info->dst_orig;
+			skb_dst_set(skb, dst);
+			md_info->dst_orig = NULL;
+			dst_release(tmp_dst);
+		}
 	} else {
 		if_id = xi->p.if_id;
 	}
@@ -1162,12 +1170,18 @@ static int __init xfrmi_init(void)
 	if (err < 0)
 		goto rtnl_link_failed;
 
+	err = register_xfrm_interface_bpf();
+	if (err < 0)
+		goto kfunc_failed;
+
 	lwtunnel_encap_add_ops(&xfrmi_encap_ops, LWTUNNEL_ENCAP_XFRM);
 
 	xfrm_if_register_cb(&xfrm_if_cb);
 
 	return err;
 
+kfunc_failed:
+	rtnl_link_unregister(&xfrmi_link_ops);
 rtnl_link_failed:
 	xfrmi6_fini();
 xfrmi6_failed:
@@ -1183,6 +1197,7 @@ static void __exit xfrmi_fini(void)
 {
 	xfrm_if_unregister_cb();
 	lwtunnel_encap_del_ops(&xfrmi_encap_ops, LWTUNNEL_ENCAP_XFRM);
+	cleanup_xfrm_interface_bpf();
 	rtnl_link_unregister(&xfrmi_link_ops);
 	xfrmi4_fini();
 	xfrmi6_fini();
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests
  2022-11-29 13:20 [PATCH ipsec-next,v2 0/3] xfrm: interface: Add unstable helpers for XFRM metadata Eyal Birger
  2022-11-29 13:20 ` [PATCH ipsec-next,v2 1/3] xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.c Eyal Birger
  2022-11-29 13:20 ` [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF Eyal Birger
@ 2022-11-29 13:20 ` Eyal Birger
  2022-11-30 18:41   ` Martin KaFai Lau
  2 siblings, 1 reply; 17+ messages in thread
From: Eyal Birger @ 2022-11-29 13:20 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, steffen.klassert, herbert, andrii,
	daniel, nicolas.dichtel, razor, mykolal, ast, martin.lau, song,
	yhs, john.fastabend, kpsingh, sdf, haoluo, jolsa, shuah
  Cc: netdev, bpf, linux-kselftest, Eyal Birger

Test the xfrm_info kfunc helpers.

Note: the tests require support for xfrmi "external" mode in iproute2.

The test setup creates three name spaces - NS0, NS1, NS2.

XFRM tunnels are setup between NS0 and the two other NSs.

The kfunc helpers are used to steer traffic from NS0 to the other
NSs based on a userspace populated map and validate that the
return traffic had arrived from the desired NS.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>

---

v2:
  - use an lwt route in NS1 for testing that flow as well
  - indendation fix
---
 tools/testing/selftests/bpf/config            |   2 +
 .../selftests/bpf/prog_tests/test_xfrm_info.c | 343 ++++++++++++++++++
 .../selftests/bpf/progs/test_xfrm_info_kern.c |  74 ++++
 3 files changed, 419 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c

diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 9213565c0311..9f39943d6ebd 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -20,6 +20,7 @@ CONFIG_IKCONFIG_PROC=y
 CONFIG_IMA=y
 CONFIG_IMA_READ_POLICY=y
 CONFIG_IMA_WRITE_POLICY=y
+CONFIG_INET_ESP=y
 CONFIG_IP_NF_FILTER=y
 CONFIG_IP_NF_RAW=y
 CONFIG_IP_NF_TARGET_SYNPROXY=y
@@ -71,3 +72,4 @@ CONFIG_TEST_BPF=y
 CONFIG_USERFAULTFD=y
 CONFIG_VXLAN=y
 CONFIG_XDP_SOCKETS=y
+CONFIG_XFRM_INTERFACE=y
diff --git a/tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c b/tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c
new file mode 100644
index 000000000000..3aef72540934
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c
@@ -0,0 +1,343 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+
+/*
+ * Topology:
+ * ---------
+ *   NS0 namespace         |   NS1 namespace        | NS2 namespace
+ *                         |                        |
+ *   +---------------+     |   +---------------+    |
+ *   |    ipsec0     |---------|    ipsec0     |    |
+ *   | 192.168.1.100 |     |   | 192.168.1.200 |    |
+ *   | if_id: bpf    |     |   +---------------+    |
+ *   +---------------+     |                        |
+ *           |             |                        |   +---------------+
+ *           |             |                        |   |    ipsec0     |
+ *           \------------------------------------------| 192.168.1.200 |
+ *                         |                        |   +---------------+
+ *                         |                        |
+ *                         |                        | (overlay network)
+ *      ------------------------------------------------------
+ *                         |                        | (underlay network)
+ *   +--------------+      |   +--------------+     |
+ *   |    veth01    |----------|    veth10    |     |
+ *   | 172.16.1.100 |      |   | 172.16.1.200 |     |
+ *   ---------------+      |   +--------------+     |
+ *                         |                        |
+ *   +--------------+      |                        |   +--------------+
+ *   |    veth02    |-----------------------------------|    veth20    |
+ *   | 172.16.2.100 |      |                        |   | 172.16.2.200 |
+ *   +--------------+      |                        |   +--------------+
+ *
+ *
+ * Test Packet flow
+ * -----------
+ *  The tests perform 'ping 192.168.1.200' from the NS0 namespace:
+ *  1) request is routed to NS0 ipsec0
+ *  2) NS0 ipsec0 tc egress BPF program is triggered and sets the if_id based
+ *     on a map value. This makes the ipsec0 device in external mode select the
+ *     destination tunnel
+ *  3) ping reaches the other namespace (NS1 or NS2 based on which if_id was
+ *     used) and response is sent
+ *  4) response is received on NS0 ipsec0, tc ingress program is triggered and
+ *     records the response if_id in the map
+ *  5) requested if_id is compared with received if_id
+ */
+
+#include <net/if.h>
+
+#include "test_progs.h"
+#include "network_helpers.h"
+#include "test_xfrm_info_kern.skel.h"
+
+#define NS0 "xfrm_test_ns0"
+#define NS1 "xfrm_test_ns1"
+#define NS2 "xfrm_test_ns2"
+
+#define IF_ID_0_TO_1 1
+#define IF_ID_0_TO_2 2
+#define IF_ID_1 3
+#define IF_ID_2 4
+
+#define IP4_ADDR_VETH01 "172.16.1.100"
+#define IP4_ADDR_VETH10 "172.16.1.200"
+#define IP4_ADDR_VETH02 "172.16.2.100"
+#define IP4_ADDR_VETH20 "172.16.2.200"
+
+#define ESP_DUMMY_PARAMS \
+    "proto esp aead 'rfc4106(gcm(aes))' " \
+    "0xe4d8f4b4da1df18a3510b3781496daa82488b713 128 mode tunnel "
+
+#define PING_ARGS "-i 0.01 -c 3 -w 10 -q"
+
+#define SYS(fmt, ...)						\
+	({							\
+		char cmd[1024];					\
+		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
+		if (!ASSERT_OK(system(cmd), cmd))		\
+			goto fail;				\
+	})
+
+#define SYS_NOFAIL(fmt, ...)					\
+	({							\
+		char cmd[1024];					\
+		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
+		system(cmd);					\
+	})
+
+static int attach_tc_prog(struct bpf_tc_hook *hook, int igr_fd, int egr_fd)
+{
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts1, .handle = 1,
+			    .priority = 1, .prog_fd = igr_fd);
+	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts2, .handle = 1,
+			    .priority = 1, .prog_fd = egr_fd);
+	int ret;
+
+	ret = bpf_tc_hook_create(hook);
+	if (!ASSERT_OK(ret, "create tc hook"))
+		return ret;
+
+	if (igr_fd >= 0) {
+		hook->attach_point = BPF_TC_INGRESS;
+		ret = bpf_tc_attach(hook, &opts1);
+		if (!ASSERT_OK(ret, "bpf_tc_attach")) {
+			bpf_tc_hook_destroy(hook);
+			return ret;
+		}
+	}
+
+	if (egr_fd >= 0) {
+		hook->attach_point = BPF_TC_EGRESS;
+		ret = bpf_tc_attach(hook, &opts2);
+		if (!ASSERT_OK(ret, "bpf_tc_attach")) {
+			bpf_tc_hook_destroy(hook);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static void cleanup(void)
+{
+	SYS_NOFAIL("test -f /var/run/netns/" NS0 " && ip netns delete " NS0);
+	SYS_NOFAIL("test -f /var/run/netns/" NS1 " && ip netns delete " NS1);
+	SYS_NOFAIL("test -f /var/run/netns/" NS2 " && ip netns delete " NS2);
+}
+
+static int config_underlay(void)
+{
+	SYS("ip netns add " NS0);
+	SYS("ip netns add " NS1);
+	SYS("ip netns add " NS2);
+
+	/* NS0 <-> NS1 [veth01 <-> veth10] */
+	SYS("ip link add veth01 netns " NS0 " type veth peer name veth10 netns " NS1);
+	SYS("ip -net " NS0 " addr add " IP4_ADDR_VETH01 "/24 dev veth01");
+	SYS("ip -net " NS0 " link set dev veth01 up");
+	SYS("ip -net " NS1 " addr add " IP4_ADDR_VETH10 "/24 dev veth10");
+	SYS("ip -net " NS1 " link set dev veth10 up");
+
+	/* NS0 <-> NS2 [veth02 <-> veth20] */
+	SYS("ip link add veth02 netns " NS0 " type veth peer name veth20 netns " NS2);
+	SYS("ip -net " NS0 " addr add " IP4_ADDR_VETH02 "/24 dev veth02");
+	SYS("ip -net " NS0 " link set dev veth02 up");
+	SYS("ip -net " NS2 " addr add " IP4_ADDR_VETH20 "/24 dev veth20");
+	SYS("ip -net " NS2 " link set dev veth20 up");
+
+	return 0;
+fail:
+	return -1;
+}
+
+static int setup_xfrm_tunnel_ns(const char *ns, const char *ipv4_local,
+				const char *ipv4_remote, int if_id)
+{
+	/* State: local -> remote */
+	SYS("ip -net %s xfrm state add src %s dst %s spi 1 "
+	    ESP_DUMMY_PARAMS "if_id %d", ns, ipv4_local, ipv4_remote, if_id);
+
+	/* State: local <- remote */
+	SYS("ip -net %s xfrm state add src %s dst %s spi 1 "
+	    ESP_DUMMY_PARAMS "if_id %d", ns, ipv4_remote, ipv4_local, if_id);
+
+	/* Policy: local -> remote */
+	SYS("ip -net %s xfrm policy add dir out src 0.0.0.0/0 dst 0.0.0.0/0 "
+	    "if_id %d tmpl src %s dst %s proto esp mode tunnel if_id %d", ns,
+	    if_id, ipv4_local, ipv4_remote, if_id);
+
+	/* Policy: local <- remote */
+	SYS("ip -net %s xfrm policy add dir in src 0.0.0.0/0 dst 0.0.0.0/0 "
+	    "if_id %d tmpl src %s dst %s proto esp mode tunnel if_id %d", ns,
+	    if_id, ipv4_remote, ipv4_local, if_id);
+
+	return 0;
+fail:
+	return -1;
+}
+
+static int setup_xfrm_tunnel(const char *ns_a, const char *ns_b,
+			     const char *ipv4_a, const char *ipv4_b,
+			     int if_id_a, int if_id_b)
+{
+	return setup_xfrm_tunnel_ns(ns_a, ipv4_a, ipv4_b, if_id_a) ||
+		setup_xfrm_tunnel_ns(ns_b, ipv4_b, ipv4_a, if_id_b);
+}
+
+static int config_overlay(void)
+{
+	if (setup_xfrm_tunnel(NS0, NS1, IP4_ADDR_VETH01, IP4_ADDR_VETH10,
+			      IF_ID_0_TO_1, IF_ID_1))
+		goto fail;
+	if (setup_xfrm_tunnel(NS0, NS2, IP4_ADDR_VETH02, IP4_ADDR_VETH20,
+			      IF_ID_0_TO_2, IF_ID_2))
+		goto fail;
+
+	SYS("ip -net " NS0 " link add ipsec0 type xfrm external");
+	SYS("ip -net " NS0 " addr add 192.168.1.100/24 dev ipsec0");
+	SYS("ip -net " NS0 " link set dev ipsec0 up");
+
+	SYS("ip -net " NS1 " link add ipsec0 type xfrm external");
+	SYS("ip -net " NS1 " addr add 192.168.1.200/32 dev ipsec0");
+	SYS("ip -net " NS1 " link set dev ipsec0 up");
+	SYS("ip -net " NS1 " route add 192.168.1.100/32 dev ipsec0 encap xfrm if_id %d", IF_ID_1);
+
+	SYS("ip -net " NS2 " link add ipsec0 type xfrm if_id %d", IF_ID_2);
+	SYS("ip -net " NS2 " addr add 192.168.1.200/24 dev ipsec0");
+	SYS("ip -net " NS2 " link set dev ipsec0 up");
+
+	return 0;
+fail:
+	return -1;
+}
+
+static int test_ping(int family, const char *addr)
+{
+	SYS("%s %s %s > /dev/null", ping_command(family), PING_ARGS, addr);
+	return 0;
+fail:
+	return -1;
+}
+
+static int test_xfrm_ping(int dst_if_id_map_fd, u32 if_id)
+{
+	u32 dst_if_id;
+	int key, err;
+
+	key = 0;
+	dst_if_id = if_id;
+	err = bpf_map_update_elem(dst_if_id_map_fd, &key, &dst_if_id, BPF_ANY);
+	if (!ASSERT_OK(err, "update bpf dst_if_id_map"))
+		return -1;
+
+	if (test_ping(AF_INET, "192.168.1.200"))
+		return -1;
+
+	key = 1;
+	dst_if_id = 0;
+	err = bpf_map_lookup_elem(dst_if_id_map_fd, &key, &dst_if_id);
+	if (!ASSERT_OK(err, "lookup bpf dst_if_id_map"))
+		return -1;
+
+	if (!ASSERT_EQ(dst_if_id, if_id, "if_id"))
+		return -1;
+
+	return 0;
+}
+
+static void test_xfrm_info(void)
+{
+	int get_xfrm_info_prog_fd, set_xfrm_info_prog_fd;
+	struct test_xfrm_info_kern *skel = NULL;
+	struct nstoken *nstoken = NULL;
+	int dst_if_id_map_fd = -1;
+	int ifindex = -1;
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
+			    .attach_point = BPF_TC_INGRESS);
+
+	/* load and attach bpf progs to ipsec dev tc hook point */
+	skel = test_xfrm_info_kern__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_xfrm_info_kern__open_and_load"))
+		goto done;
+	nstoken = open_netns(NS0);
+	ifindex = if_nametoindex("ipsec0");
+	if (!ASSERT_NEQ(ifindex, 0, "ipsec0 ifindex"))
+		goto done;
+	tc_hook.ifindex = ifindex;
+	set_xfrm_info_prog_fd = bpf_program__fd(skel->progs.set_xfrm_info);
+	get_xfrm_info_prog_fd = bpf_program__fd(skel->progs.get_xfrm_info);
+	if (!ASSERT_GE(set_xfrm_info_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (!ASSERT_GE(get_xfrm_info_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (attach_tc_prog(&tc_hook, get_xfrm_info_prog_fd,
+			   set_xfrm_info_prog_fd))
+		goto done;
+	dst_if_id_map_fd = bpf_map__fd(skel->maps.dst_if_id_map);
+	if (!ASSERT_GE(dst_if_id_map_fd, 0, "bpf_map__fd"))
+		goto done;
+
+	if (!ASSERT_EQ(test_xfrm_ping(dst_if_id_map_fd, IF_ID_0_TO_1), 0,
+		       "ping " NS1))
+		goto done;
+	if (!ASSERT_EQ(test_xfrm_ping(dst_if_id_map_fd, IF_ID_0_TO_2), 0,
+		       "ping " NS2))
+		goto done;
+
+done:
+	if (nstoken)
+		close_netns(nstoken);
+	if (dst_if_id_map_fd >= 0)
+		close(dst_if_id_map_fd);
+	if (skel)
+		test_xfrm_info_kern__destroy(skel);
+}
+
+#define RUN_TEST(name)							\
+	({								\
+		if (test__start_subtest(#name)) {			\
+			test_ ## name();				\
+		}							\
+	})
+
+static void *test_xfrm_info_run_tests(void *arg)
+{
+	cleanup();
+
+	config_underlay();
+	config_overlay();
+
+	RUN_TEST(xfrm_info);
+
+	cleanup();
+
+	return NULL;
+}
+
+static int probe_iproute2(void)
+{
+	if (SYS_NOFAIL("ip link add type xfrm help 2>&1 | "
+		       "grep external > /dev/null")) {
+		fprintf(stdout, "%s:SKIP: iproute2 with xfrm external support needed for this test\n", __func__);
+		return -1;
+	}
+	return 0;
+}
+
+void serial_test_xfrm_info(void)
+{
+	pthread_t test_thread;
+	int err;
+
+	if (probe_iproute2()) {
+		test__skip();
+		return;
+	}
+
+	/* Run the tests in their own thread to isolate the namespace changes
+	 * so they do not affect the environment of other tests.
+	 * (specifically needed because of unshare(CLONE_NEWNS) in open_netns())
+	 */
+	err = pthread_create(&test_thread, NULL, &test_xfrm_info_run_tests, NULL);
+	if (ASSERT_OK(err, "pthread_create"))
+		ASSERT_OK(pthread_join(test_thread, NULL), "pthread_join");
+}
diff --git a/tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c b/tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c
new file mode 100644
index 000000000000..98991a83c1e9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include <bpf/bpf_helpers.h>
+
+#define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 2);
+	__type(key, __u32);
+	__type(value, __u32);
+} dst_if_id_map SEC(".maps");
+
+struct bpf_xfrm_info {
+	__u32 if_id;
+	int link;
+};
+
+int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx,
+			  const struct bpf_xfrm_info *from) __ksym;
+int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx,
+			  struct bpf_xfrm_info *to) __ksym;
+
+SEC("tc")
+int set_xfrm_info(struct __sk_buff *skb)
+{
+	struct bpf_xfrm_info info = {};
+	__u32 *if_id = NULL;
+	__u32 index = 0;
+	int ret = -1;
+
+	if_id = bpf_map_lookup_elem(&dst_if_id_map, &index);
+	if (!if_id) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	info.if_id = *if_id;
+	ret = bpf_skb_set_xfrm_info(skb, &info);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_UNSPEC;
+}
+
+SEC("tc")
+int get_xfrm_info(struct __sk_buff *skb)
+{
+	struct bpf_xfrm_info info = {};
+	__u32 *if_id = NULL;
+	__u32 index = 1;
+	int ret = -1;
+
+	if_id = bpf_map_lookup_elem(&dst_if_id_map, &index);
+	if (!if_id) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_xfrm_info(skb, &info);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	*if_id = info.if_id;
+
+	return TC_ACT_UNSPEC;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF
  2022-11-29 13:20 ` [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF Eyal Birger
@ 2022-11-30 18:14   ` Martin KaFai Lau
  2022-12-01  5:55     ` Eyal Birger
  0 siblings, 1 reply; 17+ messages in thread
From: Martin KaFai Lau @ 2022-11-30 18:14 UTC (permalink / raw)
  To: Eyal Birger
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

On 11/29/22 5:20 AM, Eyal Birger wrote:
> diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_interface_bpf.c
> new file mode 100644
> index 000000000000..757e15857dbf
> --- /dev/null
> +++ b/net/xfrm/xfrm_interface_bpf.c
> @@ -0,0 +1,100 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Unstable XFRM Helpers for TC-BPF hook
> + *
> + * These are called from SCHED_CLS BPF programs. Note that it is
> + * allowed to break compatibility for these functions since the interface they
> + * are exposed through to BPF programs is explicitly unstable.
> + */
> +
> +#include <linux/bpf.h>
> +#include <linux/btf_ids.h>
> +
> +#include <net/dst_metadata.h>
> +#include <net/xfrm.h>
> +
> +struct bpf_xfrm_info {
No need to introduce a bpf variant of the "struct xfrm_md_info" (more on this 
later).

> +	u32 if_id;
> +	int link;
> +};
> +
> +static struct metadata_dst __percpu *xfrm_md_dst;
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> +		  "Global functions as their definitions will be in xfrm_interface BTF");
> +
> +__used noinline
> +int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx, struct bpf_xfrm_info *to)

This kfunc is not needed.  It only reads the skb->_skb_refdst.  The new kfunc 
bpf_rdonly_cast() can be used.  Take a look at the bpf_rdonly_cast() usages in 
the selftests/bpf/progs/type_cast.c.  It was in bpf-next only but should also be 
in net-next now.

> +{
> +	struct sk_buff *skb = (struct sk_buff *)skb_ctx;
> +	struct xfrm_md_info *info;
> +
> +	memset(to, 0, sizeof(*to));
> +
> +	info = skb_xfrm_md_info(skb);
> +	if (!info)
> +		return -EINVAL;
> +
> +	to->if_id = info->if_id;
> +	to->link = info->link;
> +	return 0;
> +}
> +
> +__used noinline
> +int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx,
> +			  const struct bpf_xfrm_info *from)

Directly use "const struct xfrm_md_info *from" instead.  This kfunc can check 
from->dst_orig != NULL and return -EINVAL.  It will then have a consistent API 
with the bpf_rdonly_cast() mentioned above.

> +{
> +	struct sk_buff *skb = (struct sk_buff *)skb_ctx;
> +	struct metadata_dst *md_dst;
> +	struct xfrm_md_info *info;
> +
> +	if (unlikely(skb_metadata_dst(skb)))
> +		return -EINVAL;
> +
> +	md_dst = this_cpu_ptr(xfrm_md_dst);
> +
> +	info = &md_dst->u.xfrm_info;
> +	memset(info, 0, sizeof(*info));

Unnecessary memset here.  Everything should have been initialized below. 
bpf_skb_set_tunnel_key() needs memset but not here.

> +
> +	info->if_id = from->if_id;
> +	info->link = from->link;
> +	skb_dst_force(skb);
> +	info->dst_orig = skb_dst(skb);
> +
> +	dst_hold((struct dst_entry *)md_dst);
> +	skb_dst_set(skb, (struct dst_entry *)md_dst);
> +	return 0;
> +}
> +
> +__diag_pop()
> +
> +BTF_SET8_START(xfrm_ifc_kfunc_set)
> +BTF_ID_FLAGS(func, bpf_skb_get_xfrm_info)
> +BTF_ID_FLAGS(func, bpf_skb_set_xfrm_info)
> +BTF_SET8_END(xfrm_ifc_kfunc_set)
> +
> +static const struct btf_kfunc_id_set xfrm_interface_kfunc_set = {
> +	.owner = THIS_MODULE,
> +	.set   = &xfrm_ifc_kfunc_set,
> +};
> +
> +int __init register_xfrm_interface_bpf(void)
> +{
> +	int err;
> +
> +	xfrm_md_dst = metadata_dst_alloc_percpu(0, METADATA_XFRM,
> +						GFP_KERNEL);

May be DEFINE_PER_CPU() instead?

> +	if (!xfrm_md_dst)
> +		return -ENOMEM;
> +	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS,
> +					&xfrm_interface_kfunc_set);
> +	if (err < 0) {
> +		cleanup_xfrm_interface_bpf();
> +		return err;
> +	}
> +	return 0;
> +}
> +
> +void __exit cleanup_xfrm_interface_bpf(void)
> +{
> +	metadata_dst_free_percpu(xfrm_md_dst);
> +}


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests
  2022-11-29 13:20 ` [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests Eyal Birger
@ 2022-11-30 18:41   ` Martin KaFai Lau
  2022-11-30 18:48     ` Martin KaFai Lau
  2022-12-01  5:34     ` Eyal Birger
  0 siblings, 2 replies; 17+ messages in thread
From: Martin KaFai Lau @ 2022-11-30 18:41 UTC (permalink / raw)
  To: Eyal Birger
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

On 11/29/22 5:20 AM, Eyal Birger wrote:
> Test the xfrm_info kfunc helpers.
> 
> Note: the tests require support for xfrmi "external" mode in iproute2.
> 
> The test setup creates three name spaces - NS0, NS1, NS2.
> 
> XFRM tunnels are setup between NS0 and the two other NSs.
> 
> The kfunc helpers are used to steer traffic from NS0 to the other
> NSs based on a userspace populated map and validate that the
> return traffic had arrived from the desired NS.
> 
> Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
> 
> ---
> 
> v2:
>    - use an lwt route in NS1 for testing that flow as well
>    - indendation fix
> ---
>   tools/testing/selftests/bpf/config            |   2 +
>   .../selftests/bpf/prog_tests/test_xfrm_info.c | 343 ++++++++++++++++++
>   .../selftests/bpf/progs/test_xfrm_info_kern.c |  74 ++++
>   3 files changed, 419 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c
>   create mode 100644 tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c
> 
> diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
> index 9213565c0311..9f39943d6ebd 100644
> --- a/tools/testing/selftests/bpf/config
> +++ b/tools/testing/selftests/bpf/config
> @@ -20,6 +20,7 @@ CONFIG_IKCONFIG_PROC=y
>   CONFIG_IMA=y
>   CONFIG_IMA_READ_POLICY=y
>   CONFIG_IMA_WRITE_POLICY=y
> +CONFIG_INET_ESP=y
>   CONFIG_IP_NF_FILTER=y
>   CONFIG_IP_NF_RAW=y
>   CONFIG_IP_NF_TARGET_SYNPROXY=y
> @@ -71,3 +72,4 @@ CONFIG_TEST_BPF=y
>   CONFIG_USERFAULTFD=y
>   CONFIG_VXLAN=y
>   CONFIG_XDP_SOCKETS=y
> +CONFIG_XFRM_INTERFACE=y
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c b/tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c
> new file mode 100644
> index 000000000000..3aef72540934
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c

Nit. Just xfrm_info.c

> @@ -0,0 +1,343 @@
> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> +
> +/*
> + * Topology:
> + * ---------
> + *   NS0 namespace         |   NS1 namespace        | NS2 namespace
> + *                         |                        |
> + *   +---------------+     |   +---------------+    |
> + *   |    ipsec0     |---------|    ipsec0     |    |
> + *   | 192.168.1.100 |     |   | 192.168.1.200 |    |
> + *   | if_id: bpf    |     |   +---------------+    |
> + *   +---------------+     |                        |
> + *           |             |                        |   +---------------+
> + *           |             |                        |   |    ipsec0     |
> + *           \------------------------------------------| 192.168.1.200 |
> + *                         |                        |   +---------------+
> + *                         |                        |
> + *                         |                        | (overlay network)
> + *      ------------------------------------------------------
> + *                         |                        | (underlay network)
> + *   +--------------+      |   +--------------+     |
> + *   |    veth01    |----------|    veth10    |     |
> + *   | 172.16.1.100 |      |   | 172.16.1.200 |     |
> + *   ---------------+      |   +--------------+     |
> + *                         |                        |
> + *   +--------------+      |                        |   +--------------+
> + *   |    veth02    |-----------------------------------|    veth20    |
> + *   | 172.16.2.100 |      |                        |   | 172.16.2.200 |
> + *   +--------------+      |                        |   +--------------+
> + *
> + *
> + * Test Packet flow
> + * -----------
> + *  The tests perform 'ping 192.168.1.200' from the NS0 namespace:
> + *  1) request is routed to NS0 ipsec0
> + *  2) NS0 ipsec0 tc egress BPF program is triggered and sets the if_id based
> + *     on a map value. This makes the ipsec0 device in external mode select the
> + *     destination tunnel
> + *  3) ping reaches the other namespace (NS1 or NS2 based on which if_id was
> + *     used) and response is sent
> + *  4) response is received on NS0 ipsec0, tc ingress program is triggered and
> + *     records the response if_id in the map
> + *  5) requested if_id is compared with received if_id
> + */
> +
> +#include <net/if.h>
> +
> +#include "test_progs.h"
> +#include "network_helpers.h"
> +#include "test_xfrm_info_kern.skel.h"
> +
> +#define NS0 "xfrm_test_ns0"
> +#define NS1 "xfrm_test_ns1"
> +#define NS2 "xfrm_test_ns2"
> +
> +#define IF_ID_0_TO_1 1
> +#define IF_ID_0_TO_2 2
> +#define IF_ID_1 3
> +#define IF_ID_2 4
> +
> +#define IP4_ADDR_VETH01 "172.16.1.100"
> +#define IP4_ADDR_VETH10 "172.16.1.200"
> +#define IP4_ADDR_VETH02 "172.16.2.100"
> +#define IP4_ADDR_VETH20 "172.16.2.200"
> +
> +#define ESP_DUMMY_PARAMS \
> +    "proto esp aead 'rfc4106(gcm(aes))' " \
> +    "0xe4d8f4b4da1df18a3510b3781496daa82488b713 128 mode tunnel "
> +
> +#define PING_ARGS "-i 0.01 -c 3 -w 10 -q"
> +
> +#define SYS(fmt, ...)						\
> +	({							\
> +		char cmd[1024];					\
> +		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
> +		if (!ASSERT_OK(system(cmd), cmd))		\
> +			goto fail;				\
> +	})
> +
> +#define SYS_NOFAIL(fmt, ...)					\
> +	({							\
> +		char cmd[1024];					\
> +		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
> +		system(cmd);					\
> +	})
> +
> +static int attach_tc_prog(struct bpf_tc_hook *hook, int igr_fd, int egr_fd)
> +{
> +	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts1, .handle = 1,
> +			    .priority = 1, .prog_fd = igr_fd);
> +	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts2, .handle = 1,
> +			    .priority = 1, .prog_fd = egr_fd);
> +	int ret;
> +
> +	ret = bpf_tc_hook_create(hook);
> +	if (!ASSERT_OK(ret, "create tc hook"))
> +		return ret;
> +
> +	if (igr_fd >= 0) {
> +		hook->attach_point = BPF_TC_INGRESS;
> +		ret = bpf_tc_attach(hook, &opts1);
> +		if (!ASSERT_OK(ret, "bpf_tc_attach")) {
> +			bpf_tc_hook_destroy(hook);
> +			return ret;
> +		}
> +	}
> +
> +	if (egr_fd >= 0) {
> +		hook->attach_point = BPF_TC_EGRESS;
> +		ret = bpf_tc_attach(hook, &opts2);
> +		if (!ASSERT_OK(ret, "bpf_tc_attach")) {
> +			bpf_tc_hook_destroy(hook);
> +			return ret;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static void cleanup(void)
> +{
> +	SYS_NOFAIL("test -f /var/run/netns/" NS0 " && ip netns delete " NS0);
> +	SYS_NOFAIL("test -f /var/run/netns/" NS1 " && ip netns delete " NS1);
> +	SYS_NOFAIL("test -f /var/run/netns/" NS2 " && ip netns delete " NS2);
> +}
> +
> +static int config_underlay(void)
> +{
> +	SYS("ip netns add " NS0);
> +	SYS("ip netns add " NS1);
> +	SYS("ip netns add " NS2);
> +
> +	/* NS0 <-> NS1 [veth01 <-> veth10] */
> +	SYS("ip link add veth01 netns " NS0 " type veth peer name veth10 netns " NS1);
> +	SYS("ip -net " NS0 " addr add " IP4_ADDR_VETH01 "/24 dev veth01");
> +	SYS("ip -net " NS0 " link set dev veth01 up");
> +	SYS("ip -net " NS1 " addr add " IP4_ADDR_VETH10 "/24 dev veth10");
> +	SYS("ip -net " NS1 " link set dev veth10 up");
> +
> +	/* NS0 <-> NS2 [veth02 <-> veth20] */
> +	SYS("ip link add veth02 netns " NS0 " type veth peer name veth20 netns " NS2);
> +	SYS("ip -net " NS0 " addr add " IP4_ADDR_VETH02 "/24 dev veth02");
> +	SYS("ip -net " NS0 " link set dev veth02 up");
> +	SYS("ip -net " NS2 " addr add " IP4_ADDR_VETH20 "/24 dev veth20");
> +	SYS("ip -net " NS2 " link set dev veth20 up");
> +
> +	return 0;
> +fail:
> +	return -1;
> +}
> +
> +static int setup_xfrm_tunnel_ns(const char *ns, const char *ipv4_local,
> +				const char *ipv4_remote, int if_id)
> +{
> +	/* State: local -> remote */
> +	SYS("ip -net %s xfrm state add src %s dst %s spi 1 "
> +	    ESP_DUMMY_PARAMS "if_id %d", ns, ipv4_local, ipv4_remote, if_id);
> +
> +	/* State: local <- remote */
> +	SYS("ip -net %s xfrm state add src %s dst %s spi 1 "
> +	    ESP_DUMMY_PARAMS "if_id %d", ns, ipv4_remote, ipv4_local, if_id);
> +
> +	/* Policy: local -> remote */
> +	SYS("ip -net %s xfrm policy add dir out src 0.0.0.0/0 dst 0.0.0.0/0 "
> +	    "if_id %d tmpl src %s dst %s proto esp mode tunnel if_id %d", ns,
> +	    if_id, ipv4_local, ipv4_remote, if_id);
> +
> +	/* Policy: local <- remote */
> +	SYS("ip -net %s xfrm policy add dir in src 0.0.0.0/0 dst 0.0.0.0/0 "
> +	    "if_id %d tmpl src %s dst %s proto esp mode tunnel if_id %d", ns,
> +	    if_id, ipv4_remote, ipv4_local, if_id);
> +
> +	return 0;
> +fail:
> +	return -1;
> +}
> +
> +static int setup_xfrm_tunnel(const char *ns_a, const char *ns_b,
> +			     const char *ipv4_a, const char *ipv4_b,
> +			     int if_id_a, int if_id_b)
> +{
> +	return setup_xfrm_tunnel_ns(ns_a, ipv4_a, ipv4_b, if_id_a) ||
> +		setup_xfrm_tunnel_ns(ns_b, ipv4_b, ipv4_a, if_id_b);
> +}
> +
> +static int config_overlay(void)
> +{
> +	if (setup_xfrm_tunnel(NS0, NS1, IP4_ADDR_VETH01, IP4_ADDR_VETH10,
> +			      IF_ID_0_TO_1, IF_ID_1))
> +		goto fail;
> +	if (setup_xfrm_tunnel(NS0, NS2, IP4_ADDR_VETH02, IP4_ADDR_VETH20,
> +			      IF_ID_0_TO_2, IF_ID_2))
> +		goto fail;
> +
> +	SYS("ip -net " NS0 " link add ipsec0 type xfrm external");
> +	SYS("ip -net " NS0 " addr add 192.168.1.100/24 dev ipsec0");
> +	SYS("ip -net " NS0 " link set dev ipsec0 up");
> +
> +	SYS("ip -net " NS1 " link add ipsec0 type xfrm external");
> +	SYS("ip -net " NS1 " addr add 192.168.1.200/32 dev ipsec0");
> +	SYS("ip -net " NS1 " link set dev ipsec0 up");
> +	SYS("ip -net " NS1 " route add 192.168.1.100/32 dev ipsec0 encap xfrm if_id %d", IF_ID_1);
> +
> +	SYS("ip -net " NS2 " link add ipsec0 type xfrm if_id %d", IF_ID_2);
> +	SYS("ip -net " NS2 " addr add 192.168.1.200/24 dev ipsec0");
> +	SYS("ip -net " NS2 " link set dev ipsec0 up");
> +
> +	return 0;
> +fail:
> +	return -1;
> +}
> +
> +static int test_ping(int family, const char *addr)
> +{
> +	SYS("%s %s %s > /dev/null", ping_command(family), PING_ARGS, addr);
> +	return 0;
> +fail:
> +	return -1;
> +}
> +
> +static int test_xfrm_ping(int dst_if_id_map_fd, u32 if_id)
> +{
> +	u32 dst_if_id;
> +	int key, err;
> +
> +	key = 0;
> +	dst_if_id = if_id;
> +	err = bpf_map_update_elem(dst_if_id_map_fd, &key, &dst_if_id, BPF_ANY);
> +	if (!ASSERT_OK(err, "update bpf dst_if_id_map"))
> +		return -1;
> +
> +	if (test_ping(AF_INET, "192.168.1.200"))
> +		return -1;
> +
> +	key = 1;
> +	dst_if_id = 0;
> +	err = bpf_map_lookup_elem(dst_if_id_map_fd, &key, &dst_if_id);
> +	if (!ASSERT_OK(err, "lookup bpf dst_if_id_map"))
> +		return -1;
> +
> +	if (!ASSERT_EQ(dst_if_id, if_id, "if_id"))
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static void test_xfrm_info(void)
> +{
> +	int get_xfrm_info_prog_fd, set_xfrm_info_prog_fd;
> +	struct test_xfrm_info_kern *skel = NULL;
> +	struct nstoken *nstoken = NULL;
> +	int dst_if_id_map_fd = -1;
> +	int ifindex = -1;
> +	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
> +			    .attach_point = BPF_TC_INGRESS);
> +
> +	/* load and attach bpf progs to ipsec dev tc hook point */
> +	skel = test_xfrm_info_kern__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "test_xfrm_info_kern__open_and_load"))
> +		goto done;
> +	nstoken = open_netns(NS0);
> +	ifindex = if_nametoindex("ipsec0");
> +	if (!ASSERT_NEQ(ifindex, 0, "ipsec0 ifindex"))
> +		goto done;
> +	tc_hook.ifindex = ifindex;
> +	set_xfrm_info_prog_fd = bpf_program__fd(skel->progs.set_xfrm_info);
> +	get_xfrm_info_prog_fd = bpf_program__fd(skel->progs.get_xfrm_info);
> +	if (!ASSERT_GE(set_xfrm_info_prog_fd, 0, "bpf_program__fd"))
> +		goto done;
> +	if (!ASSERT_GE(get_xfrm_info_prog_fd, 0, "bpf_program__fd"))
> +		goto done;
> +	if (attach_tc_prog(&tc_hook, get_xfrm_info_prog_fd,
> +			   set_xfrm_info_prog_fd))
> +		goto done;
> +	dst_if_id_map_fd = bpf_map__fd(skel->maps.dst_if_id_map);
> +	if (!ASSERT_GE(dst_if_id_map_fd, 0, "bpf_map__fd"))
> +		goto done;
> +
> +	if (!ASSERT_EQ(test_xfrm_ping(dst_if_id_map_fd, IF_ID_0_TO_1), 0,
> +		       "ping " NS1))
> +		goto done;
> +	if (!ASSERT_EQ(test_xfrm_ping(dst_if_id_map_fd, IF_ID_0_TO_2), 0,
> +		       "ping " NS2))
> +		goto done;
> +
> +done:
> +	if (nstoken)
> +		close_netns(nstoken);
> +	if (dst_if_id_map_fd >= 0)
> +		close(dst_if_id_map_fd);
> +	if (skel)
> +		test_xfrm_info_kern__destroy(skel);
> +}
> +
> +#define RUN_TEST(name)							\
> +	({								\
> +		if (test__start_subtest(#name)) {			\
> +			test_ ## name();				\
> +		}							\
> +	})
> +
> +static void *test_xfrm_info_run_tests(void *arg)
> +{
> +	cleanup();
> +
> +	config_underlay();
> +	config_overlay();

config_*() is returning ok/err but no error checking here.  Does it make sense 
to continue if the config_*() failed?

> +
> +	RUN_TEST(xfrm_info);

nit.  Remove this macro indirection.  There is only one test.

> +
> +	cleanup();
> +
> +	return NULL;
> +}
> +
> +static int probe_iproute2(void)
> +{
> +	if (SYS_NOFAIL("ip link add type xfrm help 2>&1 | "
> +		       "grep external > /dev/null")) {
> +		fprintf(stdout, "%s:SKIP: iproute2 with xfrm external support needed for this test\n", __func__);

Unfortunately, the BPF CI iproute2 does not have this support also :(
I am worry it will just stay SKIP for some time and rot.  Can you try to 
directly use netlink here?

https://github.com/kernel-patches/bpf/actions/runs/3578467213/jobs/6019370754#step:6:6395

> +		return -1;
> +	}
> +	return 0;
> +}
> +
> +void serial_test_xfrm_info(void)

Remove "serial_".  New test must be able to run in parallel ("./test_progs -j").

> +{
> +	pthread_t test_thread;
> +	int err;
> +
> +	if (probe_iproute2()) {
> +		test__skip();
> +		return;
> +	}
> +
> +	/* Run the tests in their own thread to isolate the namespace changes
> +	 * so they do not affect the environment of other tests.
> +	 * (specifically needed because of unshare(CLONE_NEWNS) in open_netns())
> +	 */

I think this comment is mostly inherited from other tests (eg. tc_redirect.c) 
but the pthread dance is actually unnecessary.  The test_progs.c will restore 
the original netns before running the next test.  I am abort to remove this from 
the tc_redirect.c also.  Please also avoid this pthread create here.

> +	err = pthread_create(&test_thread, NULL, &test_xfrm_info_run_tests, NULL);
> +	if (ASSERT_OK(err, "pthread_create"))
> +		ASSERT_OK(pthread_join(test_thread, NULL), "pthread_join");
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c b/tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c
> new file mode 100644
> index 000000000000..98991a83c1e9
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c


Nit. Same here. Just xfrm_info.c.

> @@ -0,0 +1,74 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/bpf.h>
> +#include <linux/pkt_cls.h>
> +#include <bpf/bpf_helpers.h>
> +
> +#define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)

Please try not to use unnecessary bpf_printk().  BPF CI is not capturing it also.

> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 2);
> +	__type(key, __u32);
> +	__type(value, __u32);
> +} dst_if_id_map SEC(".maps");

It is easier to use global variables instead of a map.

> +
> +struct bpf_xfrm_info {
> +	__u32 if_id;
> +	int link;
> +};

This needs __attribute__((preserve_access_index) for CO-RE.
It is easier to just include vmlinux.h to get the struct xfrm_md_info { ... }.

> +
> +int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx,
> +			  const struct bpf_xfrm_info *from) __ksym;
> +int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx,
> +			  struct bpf_xfrm_info *to) __ksym;
> +
> +SEC("tc")
> +int set_xfrm_info(struct __sk_buff *skb)
> +{
> +	struct bpf_xfrm_info info = {};
> +	__u32 *if_id = NULL;
> +	__u32 index = 0;
> +	int ret = -1;
> +
> +	if_id = bpf_map_lookup_elem(&dst_if_id_map, &index);
> +	if (!if_id) {
> +		log_err(ret);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	info.if_id = *if_id;
> +	ret = bpf_skb_set_xfrm_info(skb, &info);
> +	if (ret < 0) {
> +		log_err(ret);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	return TC_ACT_UNSPEC;
> +}
> +
> +SEC("tc")
> +int get_xfrm_info(struct __sk_buff *skb)
> +{
> +	struct bpf_xfrm_info info = {};
> +	__u32 *if_id = NULL;
> +	__u32 index = 1;
> +	int ret = -1;
> +
> +	if_id = bpf_map_lookup_elem(&dst_if_id_map, &index);
> +	if (!if_id) {
> +		log_err(ret);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	ret = bpf_skb_get_xfrm_info(skb, &info);
> +	if (ret < 0) {
> +		log_err(ret);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	*if_id = info.if_id;
> +
> +	return TC_ACT_UNSPEC;
> +}
> +
> +char _license[] SEC("license") = "GPL";


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests
  2022-11-30 18:41   ` Martin KaFai Lau
@ 2022-11-30 18:48     ` Martin KaFai Lau
  2022-12-01  5:34     ` Eyal Birger
  1 sibling, 0 replies; 17+ messages in thread
From: Martin KaFai Lau @ 2022-11-30 18:48 UTC (permalink / raw)
  To: Eyal Birger
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

On 11/30/22 10:41 AM, Martin KaFai Lau wrote:
>> +static int probe_iproute2(void)
>> +{
>> +    if (SYS_NOFAIL("ip link add type xfrm help 2>&1 | "
>> +               "grep external > /dev/null")) {
>> +        fprintf(stdout, "%s:SKIP: iproute2 with xfrm external support needed 
>> for this test\n", __func__);
> 
> Unfortunately, the BPF CI iproute2 does not have this support also :(
> I am worry it will just stay SKIP for some time and rot.  Can you try to 
> directly use netlink here?

To be clear, I meant directly use netlink only for the xfrm link creation

> 
> https://github.com/kernel-patches/bpf/actions/runs/3578467213/jobs/6019370754#step:6:6395


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests
  2022-11-30 18:41   ` Martin KaFai Lau
  2022-11-30 18:48     ` Martin KaFai Lau
@ 2022-12-01  5:34     ` Eyal Birger
  2022-12-01  7:33       ` Martin KaFai Lau
  2022-12-01 20:26       ` Martin KaFai Lau
  1 sibling, 2 replies; 17+ messages in thread
From: Eyal Birger @ 2022-12-01  5:34 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

Hi Martin,

On Wed, Nov 30, 2022 at 8:41 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 11/29/22 5:20 AM, Eyal Birger wrote:
> > Test the xfrm_info kfunc helpers.
> >
> > Note: the tests require support for xfrmi "external" mode in iproute2.
> >
> > The test setup creates three name spaces - NS0, NS1, NS2.
> >
> > XFRM tunnels are setup between NS0 and the two other NSs.
> >
> > The kfunc helpers are used to steer traffic from NS0 to the other
> > NSs based on a userspace populated map and validate that the
> > return traffic had arrived from the desired NS.
> >
> > Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
> >
> > ---
> >
> > v2:
> >    - use an lwt route in NS1 for testing that flow as well
> >    - indendation fix
> > ---
> >   tools/testing/selftests/bpf/config            |   2 +
> >   .../selftests/bpf/prog_tests/test_xfrm_info.c | 343 ++++++++++++++++++
> >   .../selftests/bpf/progs/test_xfrm_info_kern.c |  74 ++++
> >   3 files changed, 419 insertions(+)
> >   create mode 100644 tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c
> >   create mode 100644 tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c
> >
> > diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
> > index 9213565c0311..9f39943d6ebd 100644
> > --- a/tools/testing/selftests/bpf/config
> > +++ b/tools/testing/selftests/bpf/config
> > @@ -20,6 +20,7 @@ CONFIG_IKCONFIG_PROC=y
> >   CONFIG_IMA=y
> >   CONFIG_IMA_READ_POLICY=y
> >   CONFIG_IMA_WRITE_POLICY=y
> > +CONFIG_INET_ESP=y
> >   CONFIG_IP_NF_FILTER=y
> >   CONFIG_IP_NF_RAW=y
> >   CONFIG_IP_NF_TARGET_SYNPROXY=y
> > @@ -71,3 +72,4 @@ CONFIG_TEST_BPF=y
> >   CONFIG_USERFAULTFD=y
> >   CONFIG_VXLAN=y
> >   CONFIG_XDP_SOCKETS=y
> > +CONFIG_XFRM_INTERFACE=y
> > diff --git a/tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c b/tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c
> > new file mode 100644
> > index 000000000000..3aef72540934
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/test_xfrm_info.c
>
> Nit. Just xfrm_info.c

Ok.

>
> > @@ -0,0 +1,343 @@
> > +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> > +
> > +/*
> > + * Topology:
> > + * ---------
> > + *   NS0 namespace         |   NS1 namespace        | NS2 namespace
> > + *                         |                        |
> > + *   +---------------+     |   +---------------+    |
> > + *   |    ipsec0     |---------|    ipsec0     |    |
> > + *   | 192.168.1.100 |     |   | 192.168.1.200 |    |
> > + *   | if_id: bpf    |     |   +---------------+    |
> > + *   +---------------+     |                        |
> > + *           |             |                        |   +---------------+
> > + *           |             |                        |   |    ipsec0     |
> > + *           \------------------------------------------| 192.168.1.200 |
> > + *                         |                        |   +---------------+
> > + *                         |                        |
> > + *                         |                        | (overlay network)
> > + *      ------------------------------------------------------
> > + *                         |                        | (underlay network)
> > + *   +--------------+      |   +--------------+     |
> > + *   |    veth01    |----------|    veth10    |     |
> > + *   | 172.16.1.100 |      |   | 172.16.1.200 |     |
> > + *   ---------------+      |   +--------------+     |
> > + *                         |                        |
> > + *   +--------------+      |                        |   +--------------+
> > + *   |    veth02    |-----------------------------------|    veth20    |
> > + *   | 172.16.2.100 |      |                        |   | 172.16.2.200 |
> > + *   +--------------+      |                        |   +--------------+
> > + *
> > + *
[...]
> > +
> > +#define RUN_TEST(name)                                                       \
> > +     ({                                                              \
> > +             if (test__start_subtest(#name)) {                       \
> > +                     test_ ## name();                                \
> > +             }                                                       \
> > +     })
> > +
> > +static void *test_xfrm_info_run_tests(void *arg)
> > +{
> > +     cleanup();
> > +
> > +     config_underlay();
> > +     config_overlay();
>
> config_*() is returning ok/err but no error checking here.  Does it make sense
> to continue if the config_*() failed?

I'll assert their success.

>
> > +
> > +     RUN_TEST(xfrm_info);
>
> nit.  Remove this macro indirection.  There is only one test.

Ok. I was considering other possible tests in the future, but this can
be added then.

>
> > +
> > +     cleanup();
> > +
> > +     return NULL;
> > +}
> > +
> > +static int probe_iproute2(void)
> > +{
> > +     if (SYS_NOFAIL("ip link add type xfrm help 2>&1 | "
> > +                    "grep external > /dev/null")) {
> > +             fprintf(stdout, "%s:SKIP: iproute2 with xfrm external support needed for this test\n", __func__);
>
> Unfortunately, the BPF CI iproute2 does not have this support also :(
> I am worry it will just stay SKIP for some time and rot.  Can you try to
> directly use netlink here?

Yeah, I wasn't sure if adding a libmnl (or alternative) dependency
was ok here, and also didn't want to copy all that nl logic here.
So I figured it would get there eventually.

I noticed libmnl is used by the nf tests, so maybe its inclusion isn't too
bad. Unless there's a better approach.

As you noted, I should add this for the "external" support. However, I don't
think adding the LWT route directly using nl is a good idea here so I'll
make the NS1 use a regular xfrmi.
>
> https://github.com/kernel-patches/bpf/actions/runs/3578467213/jobs/6019370754#step:6:6395
>
> > +             return -1;
> > +     }
> > +     return 0;
> > +}
> > +
> > +void serial_test_xfrm_info(void)
>
> Remove "serial_".  New test must be able to run in parallel ("./test_progs -j").

Ok.

>
> > +{
> > +     pthread_t test_thread;
> > +     int err;
> > +
> > +     if (probe_iproute2()) {
> > +             test__skip();
> > +             return;
> > +     }
> > +
> > +     /* Run the tests in their own thread to isolate the namespace changes
> > +      * so they do not affect the environment of other tests.
> > +      * (specifically needed because of unshare(CLONE_NEWNS) in open_netns())
> > +      */
>
> I think this comment is mostly inherited from other tests (eg. tc_redirect.c)
> but the pthread dance is actually unnecessary.  The test_progs.c will restore
> the original netns before running the next test.  I am abort to remove this from
> the tc_redirect.c also.  Please also avoid this pthread create here.

Ok. Indeed was inherited.

>
> > +     err = pthread_create(&test_thread, NULL, &test_xfrm_info_run_tests, NULL);
> > +     if (ASSERT_OK(err, "pthread_create"))
> > +             ASSERT_OK(pthread_join(test_thread, NULL), "pthread_join");
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c b/tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c
> > new file mode 100644
> > index 000000000000..98991a83c1e9
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/test_xfrm_info_kern.c
>
>
> Nit. Same here. Just xfrm_info.c.

Ok.

>
> > @@ -0,0 +1,74 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <linux/bpf.h>
> > +#include <linux/pkt_cls.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +#define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)
>
> Please try not to use unnecessary bpf_printk().  BPF CI is not capturing it also.

Ok.

>
> > +
> > +struct {
> > +     __uint(type, BPF_MAP_TYPE_ARRAY);
> > +     __uint(max_entries, 2);
> > +     __type(key, __u32);
> > +     __type(value, __u32);
> > +} dst_if_id_map SEC(".maps");
>
> It is easier to use global variables instead of a map.

Would these be available for read/write from the test application (as the
map is currently populated/read from userspace)?

>
> > +
> > +struct bpf_xfrm_info {
> > +     __u32 if_id;
> > +     int link;
> > +};
>
> This needs __attribute__((preserve_access_index) for CO-RE.
> It is easier to just include vmlinux.h to get the struct xfrm_md_info { ... }.

Ok.

>
> > +
> > +int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx,
> > +                       const struct bpf_xfrm_info *from) __ksym;
> > +int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx,
> > +                       struct bpf_xfrm_info *to) __ksym;
> > +
> > +SEC("tc")
> > +int set_xfrm_info(struct __sk_buff *skb)
> > +{
> > +     struct bpf_xfrm_info info = {};
> > +     __u32 *if_id = NULL;
> > +     __u32 index = 0;
> > +     int ret = -1;
> > +
> > +     if_id = bpf_map_lookup_elem(&dst_if_id_map, &index);
> > +     if (!if_id) {
> > +             log_err(ret);
> > +             return TC_ACT_SHOT;
> > +     }
> > +
> > +     info.if_id = *if_id;
> > +     ret = bpf_skb_set_xfrm_info(skb, &info);
> > +     if (ret < 0) {
> > +             log_err(ret);
> > +             return TC_ACT_SHOT;
> > +     }
> > +
> > +     return TC_ACT_UNSPEC;
> > +}
> > +
> > +SEC("tc")
> > +int get_xfrm_info(struct __sk_buff *skb)
> > +{
> > +     struct bpf_xfrm_info info = {};
> > +     __u32 *if_id = NULL;
> > +     __u32 index = 1;
> > +     int ret = -1;
> > +
> > +     if_id = bpf_map_lookup_elem(&dst_if_id_map, &index);
> > +     if (!if_id) {
> > +             log_err(ret);
> > +             return TC_ACT_SHOT;
> > +     }
> > +
> > +     ret = bpf_skb_get_xfrm_info(skb, &info);
> > +     if (ret < 0) {
> > +             log_err(ret);
> > +             return TC_ACT_SHOT;
> > +     }
> > +
> > +     *if_id = info.if_id;
> > +
> > +     return TC_ACT_UNSPEC;
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
>

Thanks for the review!
Eyal.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF
  2022-11-30 18:14   ` Martin KaFai Lau
@ 2022-12-01  5:55     ` Eyal Birger
  2022-12-01 13:30       ` Eyal Birger
  2022-12-01 20:21       ` Martin KaFai Lau
  0 siblings, 2 replies; 17+ messages in thread
From: Eyal Birger @ 2022-12-01  5:55 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

Hi Martin,

On Wed, Nov 30, 2022 at 8:15 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 11/29/22 5:20 AM, Eyal Birger wrote:
> > diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_interface_bpf.c
> > new file mode 100644
> > index 000000000000..757e15857dbf
> > --- /dev/null
> > +++ b/net/xfrm/xfrm_interface_bpf.c
> > @@ -0,0 +1,100 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Unstable XFRM Helpers for TC-BPF hook
> > + *
> > + * These are called from SCHED_CLS BPF programs. Note that it is
> > + * allowed to break compatibility for these functions since the interface they
> > + * are exposed through to BPF programs is explicitly unstable.
> > + */
> > +
> > +#include <linux/bpf.h>
> > +#include <linux/btf_ids.h>
> > +
> > +#include <net/dst_metadata.h>
> > +#include <net/xfrm.h>
> > +
> > +struct bpf_xfrm_info {
> No need to introduce a bpf variant of the "struct xfrm_md_info" (more on this
> later).
>
> > +     u32 if_id;
> > +     int link;
> > +};
> > +
> > +static struct metadata_dst __percpu *xfrm_md_dst;
> > +__diag_push();
> > +__diag_ignore_all("-Wmissing-prototypes",
> > +               "Global functions as their definitions will be in xfrm_interface BTF");
> > +
> > +__used noinline
> > +int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx, struct bpf_xfrm_info *to)
>
> This kfunc is not needed.  It only reads the skb->_skb_refdst.  The new kfunc
> bpf_rdonly_cast() can be used.  Take a look at the bpf_rdonly_cast() usages in
> the selftests/bpf/progs/type_cast.c.  It was in bpf-next only but should also be
> in net-next now.

I'm somewhat concerned with this approach.
Indeed it would remove the kfunc, and the API is declared "unstable", but
still the implementation as dst isn't relevant to the user and would make
the programs less readable.

Also note that the helper can be also used as it is to get the xfrm info at
egress from an lwt route (which stores the xfrm_info in the dst lwstate).

>
> > +{
> > +     struct sk_buff *skb = (struct sk_buff *)skb_ctx;
> > +     struct xfrm_md_info *info;
> > +
> > +     memset(to, 0, sizeof(*to));
> > +
> > +     info = skb_xfrm_md_info(skb);
> > +     if (!info)
> > +             return -EINVAL;
> > +
> > +     to->if_id = info->if_id;
> > +     to->link = info->link;
> > +     return 0;
> > +}
> > +
> > +__used noinline
> > +int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx,
> > +                       const struct bpf_xfrm_info *from)
>
> Directly use "const struct xfrm_md_info *from" instead.  This kfunc can check
> from->dst_orig != NULL and return -EINVAL.  It will then have a consistent API
> with the bpf_rdonly_cast() mentioned above.

See above.

>
> > +{
> > +     struct sk_buff *skb = (struct sk_buff *)skb_ctx;
> > +     struct metadata_dst *md_dst;
> > +     struct xfrm_md_info *info;
> > +
> > +     if (unlikely(skb_metadata_dst(skb)))
> > +             return -EINVAL;
> > +
> > +     md_dst = this_cpu_ptr(xfrm_md_dst);
> > +
> > +     info = &md_dst->u.xfrm_info;
> > +     memset(info, 0, sizeof(*info));
>
> Unnecessary memset here.  Everything should have been initialized below.
> bpf_skb_set_tunnel_key() needs memset but not here.

Ok.

>
> > +
> > +     info->if_id = from->if_id;
> > +     info->link = from->link;
> > +     skb_dst_force(skb);
> > +     info->dst_orig = skb_dst(skb);
> > +
> > +     dst_hold((struct dst_entry *)md_dst);
> > +     skb_dst_set(skb, (struct dst_entry *)md_dst);
> > +     return 0;
> > +}
> > +
> > +__diag_pop()
> > +
> > +BTF_SET8_START(xfrm_ifc_kfunc_set)
> > +BTF_ID_FLAGS(func, bpf_skb_get_xfrm_info)
> > +BTF_ID_FLAGS(func, bpf_skb_set_xfrm_info)
> > +BTF_SET8_END(xfrm_ifc_kfunc_set)
> > +
> > +static const struct btf_kfunc_id_set xfrm_interface_kfunc_set = {
> > +     .owner = THIS_MODULE,
> > +     .set   = &xfrm_ifc_kfunc_set,
> > +};
> > +
> > +int __init register_xfrm_interface_bpf(void)
> > +{
> > +     int err;
> > +
> > +     xfrm_md_dst = metadata_dst_alloc_percpu(0, METADATA_XFRM,
> > +                                             GFP_KERNEL);
>
> May be DEFINE_PER_CPU() instead?

Are you suggesting duplicating the dst init/cleanup logic here?
Personally given that this happens once at module load, I think it's best to
use the existing infrastructure, but am willing to add this here if you think
it's better.

Thanks for the review!
Eyal.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests
  2022-12-01  5:34     ` Eyal Birger
@ 2022-12-01  7:33       ` Martin KaFai Lau
  2022-12-01 13:33         ` Eyal Birger
  2022-12-01 20:26       ` Martin KaFai Lau
  1 sibling, 1 reply; 17+ messages in thread
From: Martin KaFai Lau @ 2022-12-01  7:33 UTC (permalink / raw)
  To: Eyal Birger
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

On 11/30/22 9:34 PM, Eyal Birger wrote:
>>> +static int probe_iproute2(void)
>>> +{
>>> +     if (SYS_NOFAIL("ip link add type xfrm help 2>&1 | "
>>> +                    "grep external > /dev/null")) {
>>> +             fprintf(stdout, "%s:SKIP: iproute2 with xfrm external support needed for this test\n", __func__);
>>
>> Unfortunately, the BPF CI iproute2 does not have this support also :(
>> I am worry it will just stay SKIP for some time and rot.  Can you try to
>> directly use netlink here?
> 
> Yeah, I wasn't sure if adding a libmnl (or alternative) dependency
> was ok here, and also didn't want to copy all that nl logic here.
> So I figured it would get there eventually.
> 
> I noticed libmnl is used by the nf tests, so maybe its inclusion isn't too
> bad. Unless there's a better approach.

I wasn't thinking about including the libmnl.  I am thinking about something 
lightweight like the bpf_tc_hook_create() used in this test. 
bpf_tc_hook_create() is in libbpf's netlink.c.  Not sure if this netlink 
link-add helper belongs to libbpf though, so it will be better just stay here in 
this selftest for now.  If it is too complicated without libmnl, leave it as 
SKIP for now is an option and I will try to run it manually first with a newer 
iproute2.

will reply other comments tomorrow.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF
  2022-12-01  5:55     ` Eyal Birger
@ 2022-12-01 13:30       ` Eyal Birger
  2022-12-01 20:18         ` Martin KaFai Lau
  2022-12-01 20:21       ` Martin KaFai Lau
  1 sibling, 1 reply; 17+ messages in thread
From: Eyal Birger @ 2022-12-01 13:30 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

Hi Martin,

On Thu, Dec 1, 2022 at 7:55 AM Eyal Birger <eyal.birger@gmail.com> wrote:
>
> Hi Martin,
>
> On Wed, Nov 30, 2022 at 8:15 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 11/29/22 5:20 AM, Eyal Birger wrote:
> > > diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_interface_bpf.c
> > > new file mode 100644
> > > index 000000000000..757e15857dbf
> > > --- /dev/null
> > > +++ b/net/xfrm/xfrm_interface_bpf.c
> > > @@ -0,0 +1,100 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/* Unstable XFRM Helpers for TC-BPF hook
> > > + *
> > > + * These are called from SCHED_CLS BPF programs. Note that it is
> > > + * allowed to break compatibility for these functions since the interface they
> > > + * are exposed through to BPF programs is explicitly unstable.
> > > + */
> > > +
> > > +#include <linux/bpf.h>
> > > +#include <linux/btf_ids.h>
> > > +
> > > +#include <net/dst_metadata.h>
> > > +#include <net/xfrm.h>
> > > +
> > > +struct bpf_xfrm_info {
> > No need to introduce a bpf variant of the "struct xfrm_md_info" (more on this
> > later).
> >
> > > +     u32 if_id;
> > > +     int link;
> > > +};
> > > +
> > > +static struct metadata_dst __percpu *xfrm_md_dst;
> > > +__diag_push();
> > > +__diag_ignore_all("-Wmissing-prototypes",
> > > +               "Global functions as their definitions will be in xfrm_interface BTF");
> > > +
> > > +__used noinline
> > > +int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx, struct bpf_xfrm_info *to)
> >
> > This kfunc is not needed.  It only reads the skb->_skb_refdst.  The new kfunc
> > bpf_rdonly_cast() can be used.  Take a look at the bpf_rdonly_cast() usages in
> > the selftests/bpf/progs/type_cast.c.  It was in bpf-next only but should also be
> > in net-next now.
>
> I'm somewhat concerned with this approach.
> Indeed it would remove the kfunc, and the API is declared "unstable", but
> still the implementation as dst isn't relevant to the user and would make
> the programs less readable.
>
> Also note that the helper can be also used as it is to get the xfrm info at
> egress from an lwt route (which stores the xfrm_info in the dst lwstate).
>
> >
> > > +{
> > > +     struct sk_buff *skb = (struct sk_buff *)skb_ctx;
> > > +     struct xfrm_md_info *info;
> > > +
> > > +     memset(to, 0, sizeof(*to));
> > > +
> > > +     info = skb_xfrm_md_info(skb);
> > > +     if (!info)
> > > +             return -EINVAL;
> > > +
> > > +     to->if_id = info->if_id;
> > > +     to->link = info->link;
> > > +     return 0;
> > > +}
> > > +
> > > +__used noinline
> > > +int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx,
> > > +                       const struct bpf_xfrm_info *from)
> >
> > Directly use "const struct xfrm_md_info *from" instead.  This kfunc can check
> > from->dst_orig != NULL and return -EINVAL.  It will then have a consistent API
> > with the bpf_rdonly_cast() mentioned above.
>
> See above.

Also, when trying this approach with bpf_set_xfrm_info() accepting
"const struct xfrm_md_info *from" I fail to load the program:

libbpf: prog 'set_xfrm_info': BPF program load failed: Invalid argument
libbpf: prog 'set_xfrm_info': -- BEGIN PROG LOAD LOG --
0: R1=ctx(off=0,imm=0) R10=fp0
; int set_xfrm_info(struct __sk_buff *skb)
0: (bf) r6 = r1                       ; R1=ctx(off=0,imm=0)
R6_w=ctx(off=0,imm=0)
1: (b7) r1 = 0                        ; R1_w=0
; struct xfrm_md_info info = {};
2: (7b) *(u64 *)(r10 -8) = r1         ; R1_w=0 R10=fp0 fp-8_w=00000000
3: (7b) *(u64 *)(r10 -16) = r1        ; R1_w=0 R10=fp0 fp-16_w=00000000
4: (b4) w1 = 0                        ; R1_w=0
; __u32 index = 0;
5: (63) *(u32 *)(r10 -20) = r1        ; R1_w=0 R10=fp0 fp-24=0000????
6: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
;
7: (07) r2 += -20                     ; R2_w=fp-20
; if_id = bpf_map_lookup_elem(&dst_if_id_map, &index);
8: (18) r1 = 0xffff888006751c00       ; R1_w=map_ptr(off=0,ks=4,vs=4,imm=0)
10: (85) call bpf_map_lookup_elem#1   ;
R0_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
11: (bf) r1 = r0                      ;
R0_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
R1_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
12: (b4) w0 = 2                       ; R0_w=2
; if (!if_id)
13: (15) if r1 == 0x0 goto pc+10      ; R1_w=map_value(off=0,ks=4,vs=4,imm=0)
14: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
;
15: (07) r2 += -16                    ; R2_w=fp-16
; info.if_id = *if_id;
16: (61) r1 = *(u32 *)(r1 +0)         ;
R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
; info.if_id = *if_id;
17: (63) *(u32 *)(r2 +0) = r1         ;
R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R2_w=fp-16
fp-16_w=
; ret = bpf_skb_set_xfrm_info(skb, &info);
18: (bf) r1 = r6                      ; R1_w=ctx(off=0,imm=0)
R6_w=ctx(off=0,imm=0)
19: (85) call bpf_skb_set_xfrm_info#81442
arg#1 pointer type STRUCT xfrm_md_info must point to scalar, or struct
with scalar

Is there some registration I need to do for this struct?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests
  2022-12-01  7:33       ` Martin KaFai Lau
@ 2022-12-01 13:33         ` Eyal Birger
  0 siblings, 0 replies; 17+ messages in thread
From: Eyal Birger @ 2022-12-01 13:33 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

Hi Martin,

On Thu, Dec 1, 2022 at 9:33 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 11/30/22 9:34 PM, Eyal Birger wrote:
> >>> +static int probe_iproute2(void)
> >>> +{
> >>> +     if (SYS_NOFAIL("ip link add type xfrm help 2>&1 | "
> >>> +                    "grep external > /dev/null")) {
> >>> +             fprintf(stdout, "%s:SKIP: iproute2 with xfrm external support needed for this test\n", __func__);
> >>
> >> Unfortunately, the BPF CI iproute2 does not have this support also :(
> >> I am worry it will just stay SKIP for some time and rot.  Can you try to
> >> directly use netlink here?
> >
> > Yeah, I wasn't sure if adding a libmnl (or alternative) dependency
> > was ok here, and also didn't want to copy all that nl logic here.
> > So I figured it would get there eventually.
> >
> > I noticed libmnl is used by the nf tests, so maybe its inclusion isn't too
> > bad. Unless there's a better approach.
>
> I wasn't thinking about including the libmnl.  I am thinking about something
> lightweight like the bpf_tc_hook_create() used in this test.
> bpf_tc_hook_create() is in libbpf's netlink.c.  Not sure if this netlink
> link-add helper belongs to libbpf though, so it will be better just stay here in
> this selftest for now.  If it is too complicated without libmnl, leave it as
> SKIP for now is an option and I will try to run it manually first with a newer
> iproute2.

Looks like it doesn't turn out too bad imho without libmnl, basically
needs a few rtattr helpers.
Will do it that way in v3.

Eyal.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF
  2022-12-01 13:30       ` Eyal Birger
@ 2022-12-01 20:18         ` Martin KaFai Lau
  2022-12-01 20:47           ` Eyal Birger
  0 siblings, 1 reply; 17+ messages in thread
From: Martin KaFai Lau @ 2022-12-01 20:18 UTC (permalink / raw)
  To: Eyal Birger
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

On 12/1/22 5:30 AM, Eyal Birger wrote:
> Hi Martin,
> 
> On Thu, Dec 1, 2022 at 7:55 AM Eyal Birger <eyal.birger@gmail.com> wrote:
>>
>> Hi Martin,
>>
>> On Wed, Nov 30, 2022 at 8:15 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>
>>> On 11/29/22 5:20 AM, Eyal Birger wrote:
>>>> diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_interface_bpf.c
>>>> new file mode 100644
>>>> index 000000000000..757e15857dbf
>>>> --- /dev/null
>>>> +++ b/net/xfrm/xfrm_interface_bpf.c
>>>> @@ -0,0 +1,100 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/* Unstable XFRM Helpers for TC-BPF hook
>>>> + *
>>>> + * These are called from SCHED_CLS BPF programs. Note that it is
>>>> + * allowed to break compatibility for these functions since the interface they
>>>> + * are exposed through to BPF programs is explicitly unstable.
>>>> + */
>>>> +
>>>> +#include <linux/bpf.h>
>>>> +#include <linux/btf_ids.h>
>>>> +
>>>> +#include <net/dst_metadata.h>
>>>> +#include <net/xfrm.h>
>>>> +
>>>> +struct bpf_xfrm_info {
>>> No need to introduce a bpf variant of the "struct xfrm_md_info" (more on this
>>> later).
>>>
>>>> +     u32 if_id;
>>>> +     int link;
>>>> +};
>>>> +
>>>> +static struct metadata_dst __percpu *xfrm_md_dst;
>>>> +__diag_push();
>>>> +__diag_ignore_all("-Wmissing-prototypes",
>>>> +               "Global functions as their definitions will be in xfrm_interface BTF");
>>>> +
>>>> +__used noinline
>>>> +int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx, struct bpf_xfrm_info *to)
>>>
>>> This kfunc is not needed.  It only reads the skb->_skb_refdst.  The new kfunc
>>> bpf_rdonly_cast() can be used.  Take a look at the bpf_rdonly_cast() usages in
>>> the selftests/bpf/progs/type_cast.c.  It was in bpf-next only but should also be
>>> in net-next now.
>>
>> I'm somewhat concerned with this approach.
>> Indeed it would remove the kfunc, and the API is declared "unstable", but
>> still the implementation as dst isn't relevant to the user and would make
>> the programs less readable.
>>
>> Also note that the helper can be also used as it is to get the xfrm info at
>> egress from an lwt route (which stores the xfrm_info in the dst lwstate).

Right, the whole skb_xfrm_md_info() can be implemented in bpf prog itself, like 
checking lwtstate.

If adding a kfunc, how about directly expose skb_xfrm_md_info() itself as a 
kfunc to bpf prog and directly return a "struct xfrm_md_info *" instead.  Then 
there is no need to copy if_id/link...etc.  The bpf prog has no need to 
initialize the "to" also.  Something like this:

__used noinline
const struct xfrm_md_info *bpf_skb_xfrm_md_info(const struct __sk_buff *skb) { ... }

BTF_ID_FLAGS(func, bpf_skb_xfrm_md_info, KF_RET_NULL)

>>
>>>
>>>> +{
>>>> +     struct sk_buff *skb = (struct sk_buff *)skb_ctx;
>>>> +     struct xfrm_md_info *info;
>>>> +
>>>> +     memset(to, 0, sizeof(*to));
>>>> +
>>>> +     info = skb_xfrm_md_info(skb);
>>>> +     if (!info)
>>>> +             return -EINVAL;
>>>> +
>>>> +     to->if_id = info->if_id;
>>>> +     to->link = info->link;
>>>> +     return 0;
>>>> +}
>>>> +
>>>> +__used noinline
>>>> +int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx,
>>>> +                       const struct bpf_xfrm_info *from)
>>>
>>> Directly use "const struct xfrm_md_info *from" instead.  This kfunc can check
>>> from->dst_orig != NULL and return -EINVAL.  It will then have a consistent API
>>> with the bpf_rdonly_cast() mentioned above.
>>
>> See above.
> 
> Also, when trying this approach with bpf_set_xfrm_info() accepting
> "const struct xfrm_md_info *from" I fail to load the program:
> 
> libbpf: prog 'set_xfrm_info': BPF program load failed: Invalid argument
> libbpf: prog 'set_xfrm_info': -- BEGIN PROG LOAD LOG --
> 0: R1=ctx(off=0,imm=0) R10=fp0
> ; int set_xfrm_info(struct __sk_buff *skb)
> 0: (bf) r6 = r1                       ; R1=ctx(off=0,imm=0)
> R6_w=ctx(off=0,imm=0)
> 1: (b7) r1 = 0                        ; R1_w=0
> ; struct xfrm_md_info info = {};
> 2: (7b) *(u64 *)(r10 -8) = r1         ; R1_w=0 R10=fp0 fp-8_w=00000000
> 3: (7b) *(u64 *)(r10 -16) = r1        ; R1_w=0 R10=fp0 fp-16_w=00000000
> 4: (b4) w1 = 0                        ; R1_w=0
> ; __u32 index = 0;
> 5: (63) *(u32 *)(r10 -20) = r1        ; R1_w=0 R10=fp0 fp-24=0000????
> 6: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
> ;
> 7: (07) r2 += -20                     ; R2_w=fp-20
> ; if_id = bpf_map_lookup_elem(&dst_if_id_map, &index);
> 8: (18) r1 = 0xffff888006751c00       ; R1_w=map_ptr(off=0,ks=4,vs=4,imm=0)
> 10: (85) call bpf_map_lookup_elem#1   ;
> R0_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
> 11: (bf) r1 = r0                      ;
> R0_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
> R1_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
> 12: (b4) w0 = 2                       ; R0_w=2
> ; if (!if_id)
> 13: (15) if r1 == 0x0 goto pc+10      ; R1_w=map_value(off=0,ks=4,vs=4,imm=0)
> 14: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
> ;
> 15: (07) r2 += -16                    ; R2_w=fp-16
> ; info.if_id = *if_id;
> 16: (61) r1 = *(u32 *)(r1 +0)         ;
> R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
> ; info.if_id = *if_id;
> 17: (63) *(u32 *)(r2 +0) = r1         ;
> R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R2_w=fp-16
> fp-16_w=
> ; ret = bpf_skb_set_xfrm_info(skb, &info);
> 18: (bf) r1 = r6                      ; R1_w=ctx(off=0,imm=0)
> R6_w=ctx(off=0,imm=0)
> 19: (85) call bpf_skb_set_xfrm_info#81442
> arg#1 pointer type STRUCT xfrm_md_info must point to scalar, or struct
> with scalar
> 
> Is there some registration I need to do for this struct?

Ah, thanks for trying!
hmm... it will need a change to the verifier.  likely tag the param with 
something like "const struct xfrm_md_info *from__nonscalar_ok".

The reason of my earlier suggestion was to avoid the need to duplicate future 
changes in xfrm_md_info to bpf_xfrm_info and more importantly avoid creating 
another __sk_buff vs sk_buff like usage confusion.

For now, lets stay with bpf_xfrm_info.  This can be changed later.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF
  2022-12-01  5:55     ` Eyal Birger
  2022-12-01 13:30       ` Eyal Birger
@ 2022-12-01 20:21       ` Martin KaFai Lau
  1 sibling, 0 replies; 17+ messages in thread
From: Martin KaFai Lau @ 2022-12-01 20:21 UTC (permalink / raw)
  To: Eyal Birger
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

On 11/30/22 9:55 PM, Eyal Birger wrote:
>>> +
>>> +     info->if_id = from->if_id;
>>> +     info->link = from->link;
>>> +     skb_dst_force(skb);
>>> +     info->dst_orig = skb_dst(skb);
>>> +
>>> +     dst_hold((struct dst_entry *)md_dst);
>>> +     skb_dst_set(skb, (struct dst_entry *)md_dst);
>>> +     return 0;
>>> +}
>>> +
>>> +__diag_pop()
>>> +
>>> +BTF_SET8_START(xfrm_ifc_kfunc_set)
>>> +BTF_ID_FLAGS(func, bpf_skb_get_xfrm_info)
>>> +BTF_ID_FLAGS(func, bpf_skb_set_xfrm_info)
>>> +BTF_SET8_END(xfrm_ifc_kfunc_set)
>>> +
>>> +static const struct btf_kfunc_id_set xfrm_interface_kfunc_set = {
>>> +     .owner = THIS_MODULE,
>>> +     .set   = &xfrm_ifc_kfunc_set,
>>> +};
>>> +
>>> +int __init register_xfrm_interface_bpf(void)
>>> +{
>>> +     int err;
>>> +
>>> +     xfrm_md_dst = metadata_dst_alloc_percpu(0, METADATA_XFRM,
>>> +                                             GFP_KERNEL);
>>
>> May be DEFINE_PER_CPU() instead?
> 
> Are you suggesting duplicating the dst init/cleanup logic here?
> Personally given that this happens once at module load, I think it's best to
> use the existing infrastructure, but am willing to add this here if you think
> it's better.

Agree, staying with the current patch is better.  I somehow thought 
metadata_dst_alloc_percpu() was newly added in this patch also.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests
  2022-12-01  5:34     ` Eyal Birger
  2022-12-01  7:33       ` Martin KaFai Lau
@ 2022-12-01 20:26       ` Martin KaFai Lau
  2022-12-01 20:48         ` Eyal Birger
  1 sibling, 1 reply; 17+ messages in thread
From: Martin KaFai Lau @ 2022-12-01 20:26 UTC (permalink / raw)
  To: Eyal Birger
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

On 11/30/22 9:34 PM, Eyal Birger wrote:
>>> +
>>> +struct {
>>> +     __uint(type, BPF_MAP_TYPE_ARRAY);
>>> +     __uint(max_entries, 2);
>>> +     __type(key, __u32);
>>> +     __type(value, __u32);
>>> +} dst_if_id_map SEC(".maps");
>>
>> It is easier to use global variables instead of a map.
> 
> Would these be available for read/write from the test application (as the
> map is currently populated/read from userspace)?

Yes, through the skel->bss->...
selftests/bpf/prog_tests/ has examples.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF
  2022-12-01 20:18         ` Martin KaFai Lau
@ 2022-12-01 20:47           ` Eyal Birger
  0 siblings, 0 replies; 17+ messages in thread
From: Eyal Birger @ 2022-12-01 20:47 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

On Thu, Dec 1, 2022 at 10:18 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 12/1/22 5:30 AM, Eyal Birger wrote:
> > Hi Martin,
> >
> > On Thu, Dec 1, 2022 at 7:55 AM Eyal Birger <eyal.birger@gmail.com> wrote:
> >>
> >> Hi Martin,
> >>
> >> On Wed, Nov 30, 2022 at 8:15 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >>>
> >>> On 11/29/22 5:20 AM, Eyal Birger wrote:
> >>>> diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_interface_bpf.c
> >>>> new file mode 100644
> >>>> index 000000000000..757e15857dbf
> >>>> --- /dev/null
> >>>> +++ b/net/xfrm/xfrm_interface_bpf.c
> >>>> @@ -0,0 +1,100 @@
> >>>> +// SPDX-License-Identifier: GPL-2.0-only
> >>>> +/* Unstable XFRM Helpers for TC-BPF hook
> >>>> + *
> >>>> + * These are called from SCHED_CLS BPF programs. Note that it is
> >>>> + * allowed to break compatibility for these functions since the interface they
> >>>> + * are exposed through to BPF programs is explicitly unstable.
> >>>> + */
> >>>> +
> >>>> +#include <linux/bpf.h>
> >>>> +#include <linux/btf_ids.h>
> >>>> +
> >>>> +#include <net/dst_metadata.h>
> >>>> +#include <net/xfrm.h>
> >>>> +
> >>>> +struct bpf_xfrm_info {
> >>> No need to introduce a bpf variant of the "struct xfrm_md_info" (more on this
> >>> later).
> >>>
> >>>> +     u32 if_id;
> >>>> +     int link;
> >>>> +};
> >>>> +
> >>>> +static struct metadata_dst __percpu *xfrm_md_dst;
> >>>> +__diag_push();
> >>>> +__diag_ignore_all("-Wmissing-prototypes",
> >>>> +               "Global functions as their definitions will be in xfrm_interface BTF");
> >>>> +
> >>>> +__used noinline
> >>>> +int bpf_skb_get_xfrm_info(struct __sk_buff *skb_ctx, struct bpf_xfrm_info *to)
> >>>
> >>> This kfunc is not needed.  It only reads the skb->_skb_refdst.  The new kfunc
> >>> bpf_rdonly_cast() can be used.  Take a look at the bpf_rdonly_cast() usages in
> >>> the selftests/bpf/progs/type_cast.c.  It was in bpf-next only but should also be
> >>> in net-next now.
> >>
> >> I'm somewhat concerned with this approach.
> >> Indeed it would remove the kfunc, and the API is declared "unstable", but
> >> still the implementation as dst isn't relevant to the user and would make
> >> the programs less readable.
> >>
> >> Also note that the helper can be also used as it is to get the xfrm info at
> >> egress from an lwt route (which stores the xfrm_info in the dst lwstate).
>
> Right, the whole skb_xfrm_md_info() can be implemented in bpf prog itself, like
> checking lwtstate.
>
> If adding a kfunc, how about directly expose skb_xfrm_md_info() itself as a
> kfunc to bpf prog and directly return a "struct xfrm_md_info *" instead.  Then
> there is no need to copy if_id/link...etc.  The bpf prog has no need to
> initialize the "to" also.  Something like this:
>
> __used noinline
> const struct xfrm_md_info *bpf_skb_xfrm_md_info(const struct __sk_buff *skb) { ... }
>
> BTF_ID_FLAGS(func, bpf_skb_xfrm_md_info, KF_RET_NULL)
>
> >>
> >>>
> >>>> +{
> >>>> +     struct sk_buff *skb = (struct sk_buff *)skb_ctx;
> >>>> +     struct xfrm_md_info *info;
> >>>> +
> >>>> +     memset(to, 0, sizeof(*to));
> >>>> +
> >>>> +     info = skb_xfrm_md_info(skb);
> >>>> +     if (!info)
> >>>> +             return -EINVAL;
> >>>> +
> >>>> +     to->if_id = info->if_id;
> >>>> +     to->link = info->link;
> >>>> +     return 0;
> >>>> +}
> >>>> +
> >>>> +__used noinline
> >>>> +int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx,
> >>>> +                       const struct bpf_xfrm_info *from)
> >>>
> >>> Directly use "const struct xfrm_md_info *from" instead.  This kfunc can check
> >>> from->dst_orig != NULL and return -EINVAL.  It will then have a consistent API
> >>> with the bpf_rdonly_cast() mentioned above.
> >>
> >> See above.
> >
> > Also, when trying this approach with bpf_set_xfrm_info() accepting
> > "const struct xfrm_md_info *from" I fail to load the program:
> >
> > libbpf: prog 'set_xfrm_info': BPF program load failed: Invalid argument
> > libbpf: prog 'set_xfrm_info': -- BEGIN PROG LOAD LOG --
> > 0: R1=ctx(off=0,imm=0) R10=fp0
> > ; int set_xfrm_info(struct __sk_buff *skb)
> > 0: (bf) r6 = r1                       ; R1=ctx(off=0,imm=0)
> > R6_w=ctx(off=0,imm=0)
> > 1: (b7) r1 = 0                        ; R1_w=0
> > ; struct xfrm_md_info info = {};
> > 2: (7b) *(u64 *)(r10 -8) = r1         ; R1_w=0 R10=fp0 fp-8_w=00000000
> > 3: (7b) *(u64 *)(r10 -16) = r1        ; R1_w=0 R10=fp0 fp-16_w=00000000
> > 4: (b4) w1 = 0                        ; R1_w=0
> > ; __u32 index = 0;
> > 5: (63) *(u32 *)(r10 -20) = r1        ; R1_w=0 R10=fp0 fp-24=0000????
> > 6: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
> > ;
> > 7: (07) r2 += -20                     ; R2_w=fp-20
> > ; if_id = bpf_map_lookup_elem(&dst_if_id_map, &index);
> > 8: (18) r1 = 0xffff888006751c00       ; R1_w=map_ptr(off=0,ks=4,vs=4,imm=0)
> > 10: (85) call bpf_map_lookup_elem#1   ;
> > R0_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
> > 11: (bf) r1 = r0                      ;
> > R0_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
> > R1_w=map_value_or_null(id=1,off=0,ks=4,vs=4,imm=0)
> > 12: (b4) w0 = 2                       ; R0_w=2
> > ; if (!if_id)
> > 13: (15) if r1 == 0x0 goto pc+10      ; R1_w=map_value(off=0,ks=4,vs=4,imm=0)
> > 14: (bf) r2 = r10                     ; R2_w=fp0 R10=fp0
> > ;
> > 15: (07) r2 += -16                    ; R2_w=fp-16
> > ; info.if_id = *if_id;
> > 16: (61) r1 = *(u32 *)(r1 +0)         ;
> > R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
> > ; info.if_id = *if_id;
> > 17: (63) *(u32 *)(r2 +0) = r1         ;
> > R1_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R2_w=fp-16
> > fp-16_w=
> > ; ret = bpf_skb_set_xfrm_info(skb, &info);
> > 18: (bf) r1 = r6                      ; R1_w=ctx(off=0,imm=0)
> > R6_w=ctx(off=0,imm=0)
> > 19: (85) call bpf_skb_set_xfrm_info#81442
> > arg#1 pointer type STRUCT xfrm_md_info must point to scalar, or struct
> > with scalar
> >
> > Is there some registration I need to do for this struct?
>
> Ah, thanks for trying!
> hmm... it will need a change to the verifier.  likely tag the param with
> something like "const struct xfrm_md_info *from__nonscalar_ok".
>
> The reason of my earlier suggestion was to avoid the need to duplicate future
> changes in xfrm_md_info to bpf_xfrm_info and more importantly avoid creating
> another __sk_buff vs sk_buff like usage confusion.
>
> For now, lets stay with bpf_xfrm_info.  This can be changed later.
Ok.

I suggest that in that case we use struct bpf_xfrm_info for both the getter
and setter for now, as it would be more confusing to use different structs.

Btw, I did try using vmlinux.h and importing struct xfrm_md_info, however,
it seems vmlinux.h and linux/pkt_cls.h don't play well together, and the
latter is needed for the TC_ACT_* macro definitions. So at least for the
time being it's another reason to define a single struct in the test program.

Eyal.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests
  2022-12-01 20:26       ` Martin KaFai Lau
@ 2022-12-01 20:48         ` Eyal Birger
  0 siblings, 0 replies; 17+ messages in thread
From: Eyal Birger @ 2022-12-01 20:48 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, bpf, linux-kselftest, davem, edumazet, kuba, pabeni,
	steffen.klassert, herbert, andrii, daniel, nicolas.dichtel,
	razor, mykolal, ast, song, yhs, john.fastabend, kpsingh, sdf,
	haoluo, jolsa, shuah

On Thu, Dec 1, 2022 at 10:26 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 11/30/22 9:34 PM, Eyal Birger wrote:
> >>> +
> >>> +struct {
> >>> +     __uint(type, BPF_MAP_TYPE_ARRAY);
> >>> +     __uint(max_entries, 2);
> >>> +     __type(key, __u32);
> >>> +     __type(value, __u32);
> >>> +} dst_if_id_map SEC(".maps");
> >>
> >> It is easier to use global variables instead of a map.
> >
> > Would these be available for read/write from the test application (as the
> > map is currently populated/read from userspace)?
>
> Yes, through the skel->bss->...
> selftests/bpf/prog_tests/ has examples.

Oh this is very useful! I tested and indeed this works perfectly.
WIll use in v3.

Thanks!
Eyal.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-12-01 20:48 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-29 13:20 [PATCH ipsec-next,v2 0/3] xfrm: interface: Add unstable helpers for XFRM metadata Eyal Birger
2022-11-29 13:20 ` [PATCH ipsec-next,v2 1/3] xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.c Eyal Birger
2022-11-29 13:20 ` [PATCH ipsec-next,v2 2/3] xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF Eyal Birger
2022-11-30 18:14   ` Martin KaFai Lau
2022-12-01  5:55     ` Eyal Birger
2022-12-01 13:30       ` Eyal Birger
2022-12-01 20:18         ` Martin KaFai Lau
2022-12-01 20:47           ` Eyal Birger
2022-12-01 20:21       ` Martin KaFai Lau
2022-11-29 13:20 ` [PATCH ipsec-next,v2 3/3] selftests/bpf: add xfrm_info tests Eyal Birger
2022-11-30 18:41   ` Martin KaFai Lau
2022-11-30 18:48     ` Martin KaFai Lau
2022-12-01  5:34     ` Eyal Birger
2022-12-01  7:33       ` Martin KaFai Lau
2022-12-01 13:33         ` Eyal Birger
2022-12-01 20:26       ` Martin KaFai Lau
2022-12-01 20:48         ` Eyal Birger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.