All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 bpf-next 0/4] bpf: Support setting variable-length tunnel options
@ 2022-09-11 12:23 Shmulik Ladkani
  2022-09-11 12:23 ` [PATCH v7 bpf-next 1/4] bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers Shmulik Ladkani
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Shmulik Ladkani @ 2022-09-11 12:23 UTC (permalink / raw)
  To: bpf, Daniel Borkmann, John Fastabend, Joanne Koong, Andrii Nakryiko
  Cc: Alexei Starovoitov, Paul Chaignon, Shmulik Ladkani, kernel test robot

Introduce 'skb_set_tunnel_opt_dynptr' to allow setting tunnel options of
dynamic length.

v2:
- Place test_tunnel's local route in a custom table, to ensure the IP
  isn't considered assigned to a device.
v3:
- Avoid 'inline' for the __bpf_skb_set_tunopt helper function
v4:
- change API to be based on bpf_dynptr,
  suggested by John Fastabend <john.fastabend@gmail.com>
v5:
- fix bpf_dynptr_get_data's incorrect usage of bpf_dynptr_kern's size
  spotted by Joanne Koong <joannelkoong@gmail.com>
v6:
- Simplify bpf_dynptr_get_data's interface and make it inline
  suggested by John Fastabend <john.fastabend@gmail.com>
- Simplify bpf_skb_set_tunnel_opt_dynptr's interface, removing the
  superfluous 'len' parameter
  suggested by Andrii Nakryiko <andrii.nakryiko@gmail.com>
- Fix missing retcodes in progs/test_tunnel_kern.c
  spotted by John Fastabend <john.fastabend@gmail.com>
v7:
- Fix undefined reference to `bpf_dynptr_get_size' when CONFIG_BPF_SYSCALL
  is unset,
Reported-by: kernel test robot <lkp@intel.com>

Shmulik Ladkani (4):
  bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers
  bpf: Support setting variable-length tunnel options
  selftests/bpf: Simplify test_tunnel setup for allowing non-local
    tunnel traffic
  selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case
    to test_progs

 include/linux/bpf.h                           |  13 ++
 include/uapi/linux/bpf.h                      |  11 +
 kernel/bpf/helpers.c                          |   2 +-
 net/core/filter.c                             |  31 ++-
 tools/include/uapi/linux/bpf.h                |  11 +
 .../selftests/bpf/prog_tests/test_tunnel.c    | 131 +++++++++--
 .../selftests/bpf/progs/test_tunnel_kern.c    | 212 ++++++++++++------
 7 files changed, 325 insertions(+), 86 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v7 bpf-next 1/4] bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers
  2022-09-11 12:23 [PATCH v7 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
@ 2022-09-11 12:23 ` Shmulik Ladkani
  2022-09-20  2:32   ` Yonghong Song
  2022-09-21  8:38   ` Hou Tao
  2022-09-11 12:23 ` [PATCH v7 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Shmulik Ladkani @ 2022-09-11 12:23 UTC (permalink / raw)
  To: bpf, Daniel Borkmann, John Fastabend, Joanne Koong, Andrii Nakryiko
  Cc: Alexei Starovoitov, Paul Chaignon, Shmulik Ladkani, kernel test robot

This allows kernel code dealing with dynptrs obtain dynptr's available
size and current (w. proper offset) data pointer.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

---
v5:
- fix bpf_dynptr_get_data's incorrect usage of bpf_dynptr_kern's size
  spotted by Joanne Koong <joannelkoong@gmail.com>

v6:
- Simplify bpf_dynptr_get_data's interface and make it inline
  suggested by John Fastabend <john.fastabend@gmail.com>

v7:
- Fix undefined reference to `bpf_dynptr_get_size' when CONFIG_BPF_SYSCALL
  is unset,
Reported-by: kernel test robot <lkp@intel.com>
---
 include/linux/bpf.h  | 13 +++++++++++++
 kernel/bpf/helpers.c |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 48ae05099f36..a2f16e3cb0fa 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2631,6 +2631,19 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
 		     enum bpf_dynptr_type type, u32 offset, u32 size);
 void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
 int bpf_dynptr_check_size(u32 size);
+#ifdef CONFIG_BPF_SYSCALL
+u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr);
+#else
+static inline u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
+{
+	return 0;
+}
+#endif
+
+static inline void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr)
+{
+	return ptr->data ? ptr->data + ptr->offset : NULL;
+}
 
 #ifdef CONFIG_BPF_LSM
 void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index fc08035f14ed..824864ac82d1 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1408,7 +1408,7 @@ static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_typ
 	ptr->size |= type << DYNPTR_TYPE_SHIFT;
 }
 
-static u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
+u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
 {
 	return ptr->size & DYNPTR_SIZE_MASK;
 }
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 bpf-next 2/4] bpf: Support setting variable-length tunnel options
  2022-09-11 12:23 [PATCH v7 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
  2022-09-11 12:23 ` [PATCH v7 bpf-next 1/4] bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers Shmulik Ladkani
@ 2022-09-11 12:23 ` Shmulik Ladkani
  2022-09-22  0:20   ` Andrii Nakryiko
  2022-09-11 12:23 ` [PATCH v7 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic Shmulik Ladkani
  2022-09-11 12:23 ` [PATCH v7 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
  3 siblings, 1 reply; 12+ messages in thread
From: Shmulik Ladkani @ 2022-09-11 12:23 UTC (permalink / raw)
  To: bpf, Daniel Borkmann, John Fastabend, Joanne Koong, Andrii Nakryiko
  Cc: Alexei Starovoitov, Paul Chaignon, Shmulik Ladkani

Existing 'bpf_skb_set_tunnel_opt' allows setting tunnel options given
an option buffer (ARG_PTR_TO_MEM) and the compile-time fixed buffer
size (ARG_CONST_SIZE).

However, in certain cases we wish to set tunnel options of dynamic
length.

For example, we have an ebpf program that gets geneve options on
incoming packets, stores them into a map (using a key representing
the incoming flow), and later needs to assign *same* options to
reply packets (belonging to same flow).

This is currently imposssible without knowing sender's exact geneve
options length, which unfortunately is dymamic.

Introduce 'bpf_skb_set_tunnel_opt_dynptr'.

This is a variant of 'bpf_skb_set_tunnel_opt' which gets a bpf dynamic
pointer (ARG_PTR_TO_DYNPTR) parameter whose data points to the options
buffer to set.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

---
v3: Avoid 'inline' for the __bpf_skb_set_tunopt helper function
v4: change API to be based on bpf_dynptr, suggested by John Fastabend <john.fastabend@gmail.com>
v6: Remove superfluous 'len' from bpf_skb_set_tunnel_opt_dynptr API
    (rely on dynptr's internal size), suggested by Andrii Nakryiko <andrii.nakryiko@gmail.com>
---
 include/uapi/linux/bpf.h       | 11 +++++++++++
 net/core/filter.c              | 31 +++++++++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h | 11 +++++++++++
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3df78c56c1bf..ba12f7e1ccb6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5387,6 +5387,16 @@ union bpf_attr {
  *	Return
  *		Current *ktime*.
  *
+ * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt)
+ *	Description
+ *		Set tunnel options metadata for the packet associated to *skb*
+ *		to the option data pointed to by the *opt* dynptr.
+ *
+ *		See also the description of the **bpf_skb_get_tunnel_opt**\ ()
+ *		helper for additional information.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5598,6 +5608,7 @@ union bpf_attr {
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
 	FN(ktime_get_tai_ns),		\
+	FN(skb_set_tunnel_opt_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/core/filter.c b/net/core/filter.c
index e872f45399b0..1c652936ef86 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4674,8 +4674,7 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_key_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
-BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
-	   const u8 *, from, u32, size)
+static u64 __bpf_skb_set_tunopt(struct sk_buff *skb, const u8 *from, u32 size)
 {
 	struct ip_tunnel_info *info = skb_tunnel_info(skb);
 	const struct metadata_dst *md = this_cpu_ptr(md_dst);
@@ -4690,6 +4689,22 @@ BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
 	return 0;
 }
 
+BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
+	   const u8 *, from, u32, size)
+{
+	return __bpf_skb_set_tunopt(skb, from, size);
+}
+
+BPF_CALL_2(bpf_skb_set_tunnel_opt_dynptr, struct sk_buff *, skb,
+	   struct bpf_dynptr_kern *, ptr)
+{
+	const u8 *from = bpf_dynptr_get_data(ptr);
+
+	if (unlikely(!from))
+		return -EFAULT;
+	return __bpf_skb_set_tunopt(skb, from, bpf_dynptr_get_size(ptr));
+}
+
 static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
 	.func		= bpf_skb_set_tunnel_opt,
 	.gpl_only	= false,
@@ -4699,6 +4714,14 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
 	.arg3_type	= ARG_CONST_SIZE,
 };
 
+static const struct bpf_func_proto bpf_skb_set_tunnel_opt_dynptr_proto = {
+	.func		= bpf_skb_set_tunnel_opt_dynptr,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL,
+};
+
 static const struct bpf_func_proto *
 bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
 {
@@ -4719,6 +4742,8 @@ bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
 		return &bpf_skb_set_tunnel_key_proto;
 	case BPF_FUNC_skb_set_tunnel_opt:
 		return &bpf_skb_set_tunnel_opt_proto;
+	case BPF_FUNC_skb_set_tunnel_opt_dynptr:
+		return &bpf_skb_set_tunnel_opt_dynptr_proto;
 	default:
 		return NULL;
 	}
@@ -7798,6 +7823,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_get_tunnel_opt:
 		return &bpf_skb_get_tunnel_opt_proto;
 	case BPF_FUNC_skb_set_tunnel_opt:
+	case BPF_FUNC_skb_set_tunnel_opt_dynptr:
 		return bpf_get_skb_set_tunnel_proto(func_id);
 	case BPF_FUNC_redirect:
 		return &bpf_redirect_proto;
@@ -8145,6 +8171,7 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_get_tunnel_opt:
 		return &bpf_skb_get_tunnel_opt_proto;
 	case BPF_FUNC_skb_set_tunnel_opt:
+	case BPF_FUNC_skb_set_tunnel_opt_dynptr:
 		return bpf_get_skb_set_tunnel_proto(func_id);
 	case BPF_FUNC_redirect:
 		return &bpf_redirect_proto;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 3df78c56c1bf..ba12f7e1ccb6 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5387,6 +5387,16 @@ union bpf_attr {
  *	Return
  *		Current *ktime*.
  *
+ * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt)
+ *	Description
+ *		Set tunnel options metadata for the packet associated to *skb*
+ *		to the option data pointed to by the *opt* dynptr.
+ *
+ *		See also the description of the **bpf_skb_get_tunnel_opt**\ ()
+ *		helper for additional information.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5598,6 +5608,7 @@ union bpf_attr {
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
 	FN(ktime_get_tai_ns),		\
+	FN(skb_set_tunnel_opt_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic
  2022-09-11 12:23 [PATCH v7 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
  2022-09-11 12:23 ` [PATCH v7 bpf-next 1/4] bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers Shmulik Ladkani
  2022-09-11 12:23 ` [PATCH v7 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
@ 2022-09-11 12:23 ` Shmulik Ladkani
  2022-09-11 12:23 ` [PATCH v7 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
  3 siblings, 0 replies; 12+ messages in thread
From: Shmulik Ladkani @ 2022-09-11 12:23 UTC (permalink / raw)
  To: bpf, Daniel Borkmann, John Fastabend, Joanne Koong, Andrii Nakryiko
  Cc: Alexei Starovoitov, Paul Chaignon, Shmulik Ladkani

Commit 1115169f47ae ("selftests/bpf: Don't assign outer source IP to host")
removed the secondary IP (IP4_ADDR2_VETH1) assigned to veth1, in order
to test bpf_skb_set_tunnel_key's functionality when tunnel destination
isn't assigned to an interface.

The chosen setup for testing the "tunnel to unassigned outer IP"
scenario was rather complex: (1) static ARP entries in order to
bypass ARP (o/w requests will fail as the target address isn't assigned
locally), and (2) a BPF program running on veth1 ingress which
manipulates the IP header's daddr to the actual IP assigned to the
interface (o/w tunnel traffic won't be accepted locally).

This is complex, and adds a dependency on this hidden "dnat"-like eBPF
program, that needs to be replicated when new tunnel tests are added.

Instead, we can have a much simpler setup: Add the secondary IP as a
*local route* in a table pointed by a custom fib rule. No static arp
entries are needed, and the special eBPF program that "dnats" the outer
destination can be removed.

This commit is a revert of 1115169f47ae, with the addition of the local
route of IP4_ADDR2_VETH1 (instead of the original address assignment).

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

---
v2: Place the local route for the secondary IP in a custom table
    pointed by a custom fib rule; this ensures the IP is not considered
    assigned to a device.
---
 .../selftests/bpf/prog_tests/test_tunnel.c    | 23 ++----
 .../selftests/bpf/progs/test_tunnel_kern.c    | 80 +++----------------
 2 files changed, 17 insertions(+), 86 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
index eea274110267..852da04ff281 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
@@ -82,7 +82,6 @@
 
 #define MAC_TUNL_DEV0 "52:54:00:d9:01:00"
 #define MAC_TUNL_DEV1 "52:54:00:d9:02:00"
-#define MAC_VETH1 "52:54:00:d9:03:00"
 
 #define VXLAN_TUNL_DEV0 "vxlan00"
 #define VXLAN_TUNL_DEV1 "vxlan11"
@@ -109,9 +108,15 @@
 static int config_device(void)
 {
 	SYS("ip netns add at_ns0");
-	SYS("ip link add veth0 address " MAC_VETH1 " type veth peer name veth1");
+	SYS("ip link add veth0 type veth peer name veth1");
 	SYS("ip link set veth0 netns at_ns0");
 	SYS("ip addr add " IP4_ADDR1_VETH1 "/24 dev veth1");
+	/* Create a custom rule routing IP4_ADDR2_VETH1 as local.
+	 * Do not place it in "local" table, to avoid this IP being considered
+	 * assigned to a device.
+	 */
+	SYS("ip rule add to " IP4_ADDR2_VETH1 " table 20");
+	SYS("ip route add local " IP4_ADDR2_VETH1 "/32 dev veth1 table 20");
 	SYS("ip link set dev veth1 up mtu 1500");
 	SYS("ip netns exec at_ns0 ip addr add " IP4_ADDR_VETH0 "/24 dev veth0");
 	SYS("ip netns exec at_ns0 ip link set dev veth0 up mtu 1500");
@@ -125,6 +130,7 @@ static void cleanup(void)
 {
 	SYS_NOFAIL("test -f /var/run/netns/at_ns0 && ip netns delete at_ns0");
 	SYS_NOFAIL("ip link del veth1 2> /dev/null");
+	SYS_NOFAIL("ip rule del to %s table 20 2> /dev/null", IP4_ADDR2_VETH1);
 	SYS_NOFAIL("ip link del %s 2> /dev/null", VXLAN_TUNL_DEV1);
 	SYS_NOFAIL("ip link del %s 2> /dev/null", IP6VXLAN_TUNL_DEV1);
 }
@@ -140,8 +146,6 @@ static int add_vxlan_tunnel(void)
 	    VXLAN_TUNL_DEV0, IP4_ADDR_TUNL_DEV0);
 	SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev %s",
 	    IP4_ADDR_TUNL_DEV1, MAC_TUNL_DEV1, VXLAN_TUNL_DEV0);
-	SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev veth0",
-	    IP4_ADDR2_VETH1, MAC_VETH1);
 
 	/* root namespace */
 	SYS("ip link add dev %s type vxlan external gbp dstport 4789",
@@ -279,17 +283,6 @@ static void test_vxlan_tunnel(void)
 	if (attach_tc_prog(&tc_hook, get_src_prog_fd, set_src_prog_fd))
 		goto done;
 
-	/* load and attach bpf prog to veth dev tc hook point */
-	ifindex = if_nametoindex("veth1");
-	if (!ASSERT_NEQ(ifindex, 0, "veth1 ifindex"))
-		goto done;
-	tc_hook.ifindex = ifindex;
-	set_dst_prog_fd = bpf_program__fd(skel->progs.veth_set_outer_dst);
-	if (!ASSERT_GE(set_dst_prog_fd, 0, "bpf_program__fd"))
-		goto done;
-	if (attach_tc_prog(&tc_hook, set_dst_prog_fd, -1))
-		goto done;
-
 	/* load and attach prog set_md to tunnel dev tc hook point at_ns0 */
 	nstoken = open_netns("at_ns0");
 	if (!ASSERT_OK_PTR(nstoken, "setns src"))
diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index 98af55f0bcd3..b11f6952b0c8 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -15,24 +15,15 @@
 #include <linux/if_tunnel.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
-#include <linux/icmp.h>
 #include <linux/types.h>
 #include <linux/socket.h>
 #include <linux/pkt_cls.h>
 #include <linux/erspan.h>
-#include <linux/udp.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
 
 #define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)
 
-#define VXLAN_UDP_PORT 4789
-
-/* Only IPv4 address assigned to veth1.
- * 172.16.1.200
- */
-#define ASSIGNED_ADDR_VETH1 0xac1001c8
-
 struct geneve_opt {
 	__be16	opt_class;
 	__u8	type;
@@ -43,11 +34,6 @@ struct geneve_opt {
 	__u8	opt_data[8]; /* hard-coded to 8 byte */
 };
 
-struct vxlanhdr {
-	__be32 vx_flags;
-	__be32 vx_vni;
-} __attribute__((packed));
-
 struct vxlan_metadata {
 	__u32     gbp;
 };
@@ -384,8 +370,14 @@ int vxlan_get_tunnel_src(struct __sk_buff *skb)
 	int ret;
 	struct bpf_tunnel_key key;
 	struct vxlan_metadata md;
-	__u32 orig_daddr;
 	__u32 index = 0;
+	__u32 *local_ip = NULL;
+
+	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+	if (!local_ip) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
 
 	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key),
 				     BPF_F_TUNINFO_FLAGS);
@@ -400,13 +392,14 @@ int vxlan_get_tunnel_src(struct __sk_buff *skb)
 		return TC_ACT_SHOT;
 	}
 
-	if (key.local_ipv4 != ASSIGNED_ADDR_VETH1 || md.gbp != 0x800FF ||
+	if (key.local_ipv4 != *local_ip || md.gbp != 0x800FF ||
 	    !(key.tunnel_flags & TUNNEL_KEY) ||
 	    (key.tunnel_flags & TUNNEL_CSUM)) {
 		bpf_printk("vxlan key %d local ip 0x%x remote ip 0x%x gbp 0x%x flags 0x%x\n",
 			   key.tunnel_id, key.local_ipv4,
 			   key.remote_ipv4, md.gbp,
 			   bpf_ntohs(key.tunnel_flags));
+		bpf_printk("local_ip 0x%x\n", *local_ip);
 		log_err(ret);
 		return TC_ACT_SHOT;
 	}
@@ -414,61 +407,6 @@ int vxlan_get_tunnel_src(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }
 
-SEC("tc")
-int veth_set_outer_dst(struct __sk_buff *skb)
-{
-	struct ethhdr *eth = (struct ethhdr *)(long)skb->data;
-	__u32 assigned_ip = bpf_htonl(ASSIGNED_ADDR_VETH1);
-	void *data_end = (void *)(long)skb->data_end;
-	struct udphdr *udph;
-	struct iphdr *iph;
-	__u32 index = 0;
-	int ret = 0;
-	int shrink;
-	__s64 csum;
-
-	if ((void *)eth + sizeof(*eth) > data_end) {
-		log_err(ret);
-		return TC_ACT_SHOT;
-	}
-
-	if (eth->h_proto != bpf_htons(ETH_P_IP))
-		return TC_ACT_OK;
-
-	iph = (struct iphdr *)(eth + 1);
-	if ((void *)iph + sizeof(*iph) > data_end) {
-		log_err(ret);
-		return TC_ACT_SHOT;
-	}
-	if (iph->protocol != IPPROTO_UDP)
-		return TC_ACT_OK;
-
-	udph = (struct udphdr *)(iph + 1);
-	if ((void *)udph + sizeof(*udph) > data_end) {
-		log_err(ret);
-		return TC_ACT_SHOT;
-	}
-	if (udph->dest != bpf_htons(VXLAN_UDP_PORT))
-		return TC_ACT_OK;
-
-	if (iph->daddr != assigned_ip) {
-		csum = bpf_csum_diff(&iph->daddr, sizeof(__u32), &assigned_ip,
-				     sizeof(__u32), 0);
-		if (bpf_skb_store_bytes(skb, ETH_HLEN + offsetof(struct iphdr, daddr),
-					&assigned_ip, sizeof(__u32), 0) < 0) {
-			log_err(ret);
-			return TC_ACT_SHOT;
-		}
-		if (bpf_l3_csum_replace(skb, ETH_HLEN + offsetof(struct iphdr, check),
-					0, csum, 0) < 0) {
-			log_err(ret);
-			return TC_ACT_SHOT;
-		}
-		bpf_skb_change_type(skb, PACKET_HOST);
-	}
-	return TC_ACT_OK;
-}
-
 SEC("tc")
 int ip6vxlan_set_tunnel_dst(struct __sk_buff *skb)
 {
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs
  2022-09-11 12:23 [PATCH v7 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
                   ` (2 preceding siblings ...)
  2022-09-11 12:23 ` [PATCH v7 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic Shmulik Ladkani
@ 2022-09-11 12:23 ` Shmulik Ladkani
  2022-09-20  2:58   ` Yonghong Song
  3 siblings, 1 reply; 12+ messages in thread
From: Shmulik Ladkani @ 2022-09-11 12:23 UTC (permalink / raw)
  To: bpf, Daniel Borkmann, John Fastabend, Joanne Koong, Andrii Nakryiko
  Cc: Alexei Starovoitov, Paul Chaignon, Shmulik Ladkani

Add geneve test to test_tunnel. The test setup and scheme resembles the
existing vxlan test.

The test also exercises tunnel option assignment using
bpf_skb_set_tunnel_opt_dynptr.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

---
v6:
- Fix missing retcodes in progs/test_tunnel_kern.c
  spotted by John Fastabend <john.fastabend@gmail.com>
- Simplify bpf_skb_set_tunnel_opt_dynptr's interface, removing the
  superfluous 'len' parameter
  suggested by Andrii Nakryiko <andrii.nakryiko@gmail.com>
---
 .../selftests/bpf/prog_tests/test_tunnel.c    | 108 ++++++++++++++
 .../selftests/bpf/progs/test_tunnel_kern.c    | 138 ++++++++++++++++++
 2 files changed, 246 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
index 852da04ff281..9aae03c720e9 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
@@ -87,6 +87,8 @@
 #define VXLAN_TUNL_DEV1 "vxlan11"
 #define IP6VXLAN_TUNL_DEV0 "ip6vxlan00"
 #define IP6VXLAN_TUNL_DEV1 "ip6vxlan11"
+#define GENEVE_TUNL_DEV0 "geneve00"
+#define GENEVE_TUNL_DEV1 "geneve11"
 
 #define PING_ARGS "-i 0.01 -c 3 -w 10 -q"
 
@@ -133,6 +135,38 @@ static void cleanup(void)
 	SYS_NOFAIL("ip rule del to %s table 20 2> /dev/null", IP4_ADDR2_VETH1);
 	SYS_NOFAIL("ip link del %s 2> /dev/null", VXLAN_TUNL_DEV1);
 	SYS_NOFAIL("ip link del %s 2> /dev/null", IP6VXLAN_TUNL_DEV1);
+	SYS_NOFAIL("ip link del %s 2> /dev/null", GENEVE_TUNL_DEV1);
+}
+
+static int add_geneve_tunnel(void)
+{
+	/* at_ns0 namespace */
+	SYS("ip netns exec at_ns0 ip link add dev %s type geneve external",
+	    GENEVE_TUNL_DEV0);
+	SYS("ip netns exec at_ns0 ip link set dev %s address %s up",
+	    GENEVE_TUNL_DEV0, MAC_TUNL_DEV0);
+	SYS("ip netns exec at_ns0 ip addr add dev %s %s/24",
+	    GENEVE_TUNL_DEV0, IP4_ADDR_TUNL_DEV0);
+	SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev %s",
+	    IP4_ADDR_TUNL_DEV1, MAC_TUNL_DEV1, GENEVE_TUNL_DEV0);
+
+	/* root namespace */
+	SYS("ip link add dev %s type geneve external", GENEVE_TUNL_DEV1);
+	SYS("ip link set dev %s address %s up", GENEVE_TUNL_DEV1, MAC_TUNL_DEV1);
+	SYS("ip addr add dev %s %s/24", GENEVE_TUNL_DEV1, IP4_ADDR_TUNL_DEV1);
+	SYS("ip neigh add %s lladdr %s dev %s",
+	    IP4_ADDR_TUNL_DEV0, MAC_TUNL_DEV0, GENEVE_TUNL_DEV1);
+
+	return 0;
+fail:
+	return -1;
+}
+
+static void delete_geneve_tunnel(void)
+{
+	SYS_NOFAIL("ip netns exec at_ns0 ip link delete dev %s",
+		   GENEVE_TUNL_DEV0);
+	SYS_NOFAIL("ip link delete dev %s", GENEVE_TUNL_DEV1);
 }
 
 static int add_vxlan_tunnel(void)
@@ -248,6 +282,79 @@ static int attach_tc_prog(struct bpf_tc_hook *hook, int igr_fd, int egr_fd)
 	return 0;
 }
 
+static void test_geneve_tunnel(void)
+{
+	struct test_tunnel_kern *skel = NULL;
+	struct nstoken *nstoken;
+	int local_ip_map_fd = -1;
+	int set_src_prog_fd, get_src_prog_fd;
+	int set_dst_prog_fd;
+	int key = 0, ifindex = -1;
+	uint local_ip;
+	int err;
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
+			    .attach_point = BPF_TC_INGRESS);
+
+	/* add genve tunnel */
+	err = add_geneve_tunnel();
+	if (!ASSERT_OK(err, "add geneve tunnel"))
+		goto done;
+
+	/* load and attach bpf prog to tunnel dev tc hook point */
+	skel = test_tunnel_kern__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_tunnel_kern__open_and_load"))
+		goto done;
+	ifindex = if_nametoindex(GENEVE_TUNL_DEV1);
+	if (!ASSERT_NEQ(ifindex, 0, "geneve11 ifindex"))
+		goto done;
+	tc_hook.ifindex = ifindex;
+	get_src_prog_fd = bpf_program__fd(skel->progs.geneve_get_tunnel_src);
+	set_src_prog_fd = bpf_program__fd(skel->progs.geneve_set_tunnel_src);
+	if (!ASSERT_GE(get_src_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (!ASSERT_GE(set_src_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (attach_tc_prog(&tc_hook, get_src_prog_fd, set_src_prog_fd))
+		goto done;
+
+	/* load and attach prog set_md to tunnel dev tc hook point at_ns0 */
+	nstoken = open_netns("at_ns0");
+	if (!ASSERT_OK_PTR(nstoken, "setns src"))
+		goto done;
+	ifindex = if_nametoindex(GENEVE_TUNL_DEV0);
+	if (!ASSERT_NEQ(ifindex, 0, "geneve00 ifindex"))
+		goto done;
+	tc_hook.ifindex = ifindex;
+	set_dst_prog_fd = bpf_program__fd(skel->progs.geneve_set_tunnel_dst);
+	if (!ASSERT_GE(set_dst_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (attach_tc_prog(&tc_hook, -1, set_dst_prog_fd))
+		goto done;
+	close_netns(nstoken);
+
+	/* use veth1 ip 1 as tunnel source ip */
+	local_ip_map_fd = bpf_map__fd(skel->maps.local_ip_map);
+	if (!ASSERT_GE(local_ip_map_fd, 0, "bpf_map__fd"))
+		goto done;
+	local_ip = IP4_ADDR1_HEX_VETH1;
+	err = bpf_map_update_elem(local_ip_map_fd, &key, &local_ip, BPF_ANY);
+	if (!ASSERT_OK(err, "update bpf local_ip_map"))
+		goto done;
+
+	/* ping test */
+	err = test_ping(AF_INET, IP4_ADDR_TUNL_DEV0);
+	if (!ASSERT_OK(err, "test_ping"))
+		goto done;
+
+done:
+	/* delete geneve tunnel */
+	delete_geneve_tunnel();
+	if (local_ip_map_fd >= 0)
+		close(local_ip_map_fd);
+	if (skel)
+		test_tunnel_kern__destroy(skel);
+}
+
 static void test_vxlan_tunnel(void)
 {
 	struct test_tunnel_kern *skel = NULL;
@@ -408,6 +515,7 @@ static void *test_tunnel_run_tests(void *arg)
 
 	RUN_TEST(vxlan_tunnel);
 	RUN_TEST(ip6vxlan_tunnel);
+	RUN_TEST(geneve_tunnel);
 
 	cleanup();
 
diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index b11f6952b0c8..cb901b76a547 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -24,6 +24,20 @@
 
 #define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)
 
+#define GENEVE_OPTS_LEN0 12
+#define GENEVE_OPTS_LEN1 20
+
+struct tun_opts_raw {
+	__u8 data[64];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, struct tun_opts_raw);
+} geneve_opts SEC(".maps");
+
 struct geneve_opt {
 	__be16	opt_class;
 	__u8	type;
@@ -286,6 +300,130 @@ int ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }
 
+SEC("tc")
+int geneve_set_tunnel_dst(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct tun_opts_raw *opts;
+	struct bpf_dynptr dptr;
+	__u32 index = 0;
+	__u32 *local_ip = NULL;
+
+	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+	if (!local_ip) {
+		log_err(-1);
+		return TC_ACT_SHOT;
+	}
+
+	index = 0;
+	opts = bpf_map_lookup_elem(&geneve_opts, &index);
+	if (!opts) {
+		log_err(-1);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.local_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.remote_ipv4 = *local_ip;
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	/* set empty geneve options (of runtime length) using a dynptr */
+	__builtin_memset(opts, 0x0, sizeof(*opts));
+	if (*local_ip % 2)
+		bpf_dynptr_from_mem(opts, GENEVE_OPTS_LEN1, 0, &dptr);
+	else
+		bpf_dynptr_from_mem(opts, GENEVE_OPTS_LEN0, 0, &dptr);
+	ret = bpf_skb_set_tunnel_opt_dynptr(skb, &dptr);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("tc")
+int geneve_set_tunnel_src(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	__u32 index = 0;
+	__u32 *local_ip = NULL;
+
+	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+	if (!local_ip) {
+		log_err(-1);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.local_ipv4 = *local_ip;
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("tc")
+int geneve_get_tunnel_src(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct tun_opts_raw opts;
+	int expected_opts_len;
+	__u32 index = 0;
+	__u32 *local_ip = NULL;
+
+	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+	if (!local_ip) {
+		log_err(-1);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &opts, sizeof(opts));
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	expected_opts_len = *local_ip % 2 ? GENEVE_OPTS_LEN1 : GENEVE_OPTS_LEN0;
+	if (key.local_ipv4 != *local_ip || ret != expected_opts_len) {
+		bpf_printk("geneve key %d local ip 0x%x remote ip 0x%x opts_len %d\n",
+			   key.tunnel_id, key.local_ipv4,
+			   key.remote_ipv4, ret);
+		bpf_printk("local_ip 0x%x\n", *local_ip);
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
 SEC("tc")
 int vxlan_set_tunnel_dst(struct __sk_buff *skb)
 {
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 bpf-next 1/4] bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers
  2022-09-11 12:23 ` [PATCH v7 bpf-next 1/4] bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers Shmulik Ladkani
@ 2022-09-20  2:32   ` Yonghong Song
  2022-09-21  8:38   ` Hou Tao
  1 sibling, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2022-09-20  2:32 UTC (permalink / raw)
  To: Shmulik Ladkani, bpf, Daniel Borkmann, John Fastabend,
	Joanne Koong, Andrii Nakryiko
  Cc: Alexei Starovoitov, Paul Chaignon, Shmulik Ladkani, kernel test robot



On 9/11/22 5:23 AM, Shmulik Ladkani wrote:
> This allows kernel code dealing with dynptrs obtain dynptr's available
> size and current (w. proper offset) data pointer.
> 
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs
  2022-09-11 12:23 ` [PATCH v7 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
@ 2022-09-20  2:58   ` Yonghong Song
  2022-09-20  5:22     ` Shmulik Ladkani
  0 siblings, 1 reply; 12+ messages in thread
From: Yonghong Song @ 2022-09-20  2:58 UTC (permalink / raw)
  To: Shmulik Ladkani, bpf, Daniel Borkmann, John Fastabend,
	Joanne Koong, Andrii Nakryiko
  Cc: Alexei Starovoitov, Paul Chaignon, Shmulik Ladkani



On 9/11/22 5:23 AM, Shmulik Ladkani wrote:
> Add geneve test to test_tunnel. The test setup and scheme resembles the
> existing vxlan test.
> 
> The test also exercises tunnel option assignment using
> bpf_skb_set_tunnel_opt_dynptr.
> 
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> 
> ---
> v6:
> - Fix missing retcodes in progs/test_tunnel_kern.c
>    spotted by John Fastabend <john.fastabend@gmail.com>
> - Simplify bpf_skb_set_tunnel_opt_dynptr's interface, removing the
>    superfluous 'len' parameter
>    suggested by Andrii Nakryiko <andrii.nakryiko@gmail.com>
> ---
>   .../selftests/bpf/prog_tests/test_tunnel.c    | 108 ++++++++++++++
>   .../selftests/bpf/progs/test_tunnel_kern.c    | 138 ++++++++++++++++++
>   2 files changed, 246 insertions(+)
> 
[...]
>   
> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> index b11f6952b0c8..cb901b76a547 100644
> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
> @@ -24,6 +24,20 @@
>   
>   #define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)
>   
> +#define GENEVE_OPTS_LEN0 12
> +#define GENEVE_OPTS_LEN1 20
> +
> +struct tun_opts_raw {
> +	__u8 data[64];
> +};
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, struct tun_opts_raw);
> +} geneve_opts SEC(".maps");
> +
>   struct geneve_opt {
>   	__be16	opt_class;
>   	__u8	type;
> @@ -286,6 +300,130 @@ int ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
>   	return TC_ACT_OK;
>   }
>   
> +SEC("tc")
> +int geneve_set_tunnel_dst(struct __sk_buff *skb)
> +{
> +	int ret;
> +	struct bpf_tunnel_key key;
> +	struct tun_opts_raw *opts;
> +	struct bpf_dynptr dptr;
> +	__u32 index = 0;
> +	__u32 *local_ip = NULL;
> +
> +	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
> +	if (!local_ip) {
> +		log_err(-1);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	index = 0;
> +	opts = bpf_map_lookup_elem(&geneve_opts, &index);
> +	if (!opts) {
> +		log_err(-1);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	__builtin_memset(&key, 0x0, sizeof(key));
> +	key.local_ipv4 = 0xac100164; /* 172.16.1.100 */
> +	key.remote_ipv4 = *local_ip;
> +	key.tunnel_id = 2;
> +	key.tunnel_tos = 0;
> +	key.tunnel_ttl = 64;
> +
> +	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
> +				     BPF_F_ZERO_CSUM_TX);
> +	if (ret < 0) {
> +		log_err(ret);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	/* set empty geneve options (of runtime length) using a dynptr */
> +	__builtin_memset(opts, 0x0, sizeof(*opts));
> +	if (*local_ip % 2)
> +		bpf_dynptr_from_mem(opts, GENEVE_OPTS_LEN1, 0, &dptr);
> +	else
> +		bpf_dynptr_from_mem(opts, GENEVE_OPTS_LEN0, 0, &dptr);
> +	ret = bpf_skb_set_tunnel_opt_dynptr(skb, &dptr);

I think the above example is not good. since it can write as
	if (*local_ip % 2)
		ret = bpf_skb_set_tunnel_opt(skb, opts, GENEVE_OPTS_LEN1);
	else
		ret = bpf_skb_set_tunnel_opt(skb, opts,	GENEVE_OPTS_LEN0);

In the commit message of Patch 2, we have

===
For example, we have an ebpf program that gets geneve options on
incoming packets, stores them into a map (using a key representing
the incoming flow), and later needs to assign *same* options to
reply packets (belonging to same flow).
===

It would be great if you can create a test case for the above
use case.


> +	if (ret < 0) {
> +		log_err(ret);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	return TC_ACT_OK;
> +}
> +
[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs
  2022-09-20  2:58   ` Yonghong Song
@ 2022-09-20  5:22     ` Shmulik Ladkani
  2022-09-21  6:11       ` Yonghong Song
  0 siblings, 1 reply; 12+ messages in thread
From: Shmulik Ladkani @ 2022-09-20  5:22 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Daniel Borkmann, John Fastabend, Joanne Koong,
	Andrii Nakryiko, Alexei Starovoitov, Paul Chaignon,
	Shmulik Ladkani

On Mon, 19 Sep 2022 19:58:20 -0700 Yonghong Song <yhs@fb.com> wrote:

> > +	/* set empty geneve options (of runtime length) using a dynptr */
> > +	__builtin_memset(opts, 0x0, sizeof(*opts));
> > +	if (*local_ip % 2)
> > +		bpf_dynptr_from_mem(opts, GENEVE_OPTS_LEN1, 0, &dptr);
> > +	else
> > +		bpf_dynptr_from_mem(opts, GENEVE_OPTS_LEN0, 0, &dptr);
> > +	ret = bpf_skb_set_tunnel_opt_dynptr(skb, &dptr);  
> 
> I think the above example is not good. since it can write as
> 	if (*local_ip % 2)
> 		ret = bpf_skb_set_tunnel_opt(skb, opts, GENEVE_OPTS_LEN1);
> 	else
> 		ret = bpf_skb_set_tunnel_opt(skb, opts,	GENEVE_OPTS_LEN0);
> 
> In the commit message of Patch 2, we have
> 
> ===
> For example, we have an ebpf program that gets geneve options on
> incoming packets, stores them into a map (using a key representing
> the incoming flow), and later needs to assign *same* options to
> reply packets (belonging to same flow).
> ===
> 
> It would be great if you can create a test case for the above
> use case.

Yes, but please note dynptr trim/advance API is still WIP:

https://lore.kernel.org/bpf/CAJnrk1a53F=LLaU+gdmXGcZBBeUR-anALT3iO6pyHKiZpD0cNw@mail.gmail.com/

However, once we settled on the API for setting variable length tunnel
options from a *dynptr* (and not from raw buffer+len), we can just
exercise 'bpf_skb_set_tunnel_opt_dynptr' regardless the original
usecase (i.e. we can assume dynptrs can be properly mangled).

In any case, I can later amend the test once all dynptr convenience
helpers are accepted.

Best,
Shmulik

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs
  2022-09-20  5:22     ` Shmulik Ladkani
@ 2022-09-21  6:11       ` Yonghong Song
  0 siblings, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2022-09-21  6:11 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: bpf, Daniel Borkmann, John Fastabend, Joanne Koong,
	Andrii Nakryiko, Alexei Starovoitov, Paul Chaignon,
	Shmulik Ladkani



On 9/19/22 10:22 PM, Shmulik Ladkani wrote:
> On Mon, 19 Sep 2022 19:58:20 -0700 Yonghong Song <yhs@fb.com> wrote:
> 
>>> +	/* set empty geneve options (of runtime length) using a dynptr */
>>> +	__builtin_memset(opts, 0x0, sizeof(*opts));
>>> +	if (*local_ip % 2)
>>> +		bpf_dynptr_from_mem(opts, GENEVE_OPTS_LEN1, 0, &dptr);
>>> +	else
>>> +		bpf_dynptr_from_mem(opts, GENEVE_OPTS_LEN0, 0, &dptr);
>>> +	ret = bpf_skb_set_tunnel_opt_dynptr(skb, &dptr);
>>
>> I think the above example is not good. since it can write as
>> 	if (*local_ip % 2)
>> 		ret = bpf_skb_set_tunnel_opt(skb, opts, GENEVE_OPTS_LEN1);
>> 	else
>> 		ret = bpf_skb_set_tunnel_opt(skb, opts,	GENEVE_OPTS_LEN0);
>>
>> In the commit message of Patch 2, we have
>>
>> ===
>> For example, we have an ebpf program that gets geneve options on
>> incoming packets, stores them into a map (using a key representing
>> the incoming flow), and later needs to assign *same* options to
>> reply packets (belonging to same flow).
>> ===
>>
>> It would be great if you can create a test case for the above
>> use case.
> 
> Yes, but please note dynptr trim/advance API is still WIP:
> 
> https://lore.kernel.org/bpf/CAJnrk1a53F=LLaU+gdmXGcZBBeUR-anALT3iO6pyHKiZpD0cNw@mail.gmail.com/
> 
> However, once we settled on the API for setting variable length tunnel
> options from a *dynptr* (and not from raw buffer+len), we can just
> exercise 'bpf_skb_set_tunnel_opt_dynptr' regardless the original
> usecase (i.e. we can assume dynptrs can be properly mangled).
> 
> In any case, I can later amend the test once all dynptr convenience
> helpers are accepted.

Could you give more details how you could use these additional
dynptr trim/advance APIs in your use case? It would give an
overall picture whether bpf_skb_set_tunnel_opt_dynptr() is
useful or not.

W.r.t. your map use case, you could create a map and populate
needed info (geneve options, lens) in user space, and then
the bpf program tries to get such information from the map
and then call bpf_skb_set_tunnel_opt_dynptr(). Maybe this
could mimic your use case?

> 
> Best,
> Shmulik

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 bpf-next 1/4] bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers
  2022-09-11 12:23 ` [PATCH v7 bpf-next 1/4] bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers Shmulik Ladkani
  2022-09-20  2:32   ` Yonghong Song
@ 2022-09-21  8:38   ` Hou Tao
  2022-09-21 21:27     ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 12+ messages in thread
From: Hou Tao @ 2022-09-21  8:38 UTC (permalink / raw)
  To: Shmulik Ladkani, bpf, Daniel Borkmann, John Fastabend,
	Joanne Koong, Andrii Nakryiko
  Cc: Alexei Starovoitov, Paul Chaignon, Shmulik Ladkani, kernel test robot

Hi,

On 9/11/2022 8:23 PM, Shmulik Ladkani wrote:
> This allows kernel code dealing with dynptrs obtain dynptr's available
> size and current (w. proper offset) data pointer.
>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
SNIP
> +
> +static inline void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr)
> +{
> +	return ptr->data ? ptr->data + ptr->offset : NULL;
> +}
Have one dummy question here. Is ptr->data == NULL is possible ? According to
the function prototype of bpf_dynptr_from_mem(), data can not be NULL. And IMO
in order to simplify the usage of bpf_dynptr_kernel, we need to ensure ptr->data
should be not NULL, else will need to add a NULL checking for every access of
bpf_dynptr_kernel in kernel.
>  
>  #ifdef CONFIG_BPF_LSM
>  void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index fc08035f14ed..824864ac82d1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1408,7 +1408,7 @@ static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_typ
>  	ptr->size |= type << DYNPTR_TYPE_SHIFT;
>  }
>  
> -static u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
> +u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
>  {
>  	return ptr->size & DYNPTR_SIZE_MASK;
>  }


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 bpf-next 1/4] bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers
  2022-09-21  8:38   ` Hou Tao
@ 2022-09-21 21:27     ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 12+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-09-21 21:27 UTC (permalink / raw)
  To: Hou Tao
  Cc: Shmulik Ladkani, bpf, Daniel Borkmann, John Fastabend,
	Joanne Koong, Andrii Nakryiko, Alexei Starovoitov, Paul Chaignon,
	Shmulik Ladkani, kernel test robot

On Wed, 21 Sept 2022 at 10:46, Hou Tao <houtao1@huawei.com> wrote:
>
> Hi,
>
> On 9/11/2022 8:23 PM, Shmulik Ladkani wrote:
> > This allows kernel code dealing with dynptrs obtain dynptr's available
> > size and current (w. proper offset) data pointer.
> >
> > Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> SNIP
> > +
> > +static inline void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr)
> > +{
> > +     return ptr->data ? ptr->data + ptr->offset : NULL;
> > +}
> Have one dummy question here. Is ptr->data == NULL is possible ? According to
> the function prototype of bpf_dynptr_from_mem(), data can not be NULL. And IMO
> in order to simplify the usage of bpf_dynptr_kernel, we need to ensure ptr->data
> should be not NULL, else will need to add a NULL checking for every access of
> bpf_dynptr_kernel in kernel.

Yes, ptr->data can always be NULL. And I think at this point if you're
accepting dynptr from BPF programs, the ship has sailed on ensuring it
is always non-NULL (and honestly I don't know if there's a huge
advantage to it vs the amount of verifier work that would be needed to
enable this).

For an example, see how bpf_ringbuf_reserve_dynptr sets it to NULL on
error. Verifier still tracks it as valid initialized dynptr, but later
operations will simply fail or return without doing anything at
runtime.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 bpf-next 2/4] bpf: Support setting variable-length tunnel options
  2022-09-11 12:23 ` [PATCH v7 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
@ 2022-09-22  0:20   ` Andrii Nakryiko
  0 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2022-09-22  0:20 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: bpf, Daniel Borkmann, John Fastabend, Joanne Koong,
	Alexei Starovoitov, Paul Chaignon, Shmulik Ladkani

On Sun, Sep 11, 2022 at 5:23 AM Shmulik Ladkani
<shmulik@metanetworks.com> wrote:
>
> Existing 'bpf_skb_set_tunnel_opt' allows setting tunnel options given
> an option buffer (ARG_PTR_TO_MEM) and the compile-time fixed buffer
> size (ARG_CONST_SIZE).
>
> However, in certain cases we wish to set tunnel options of dynamic
> length.
>
> For example, we have an ebpf program that gets geneve options on
> incoming packets, stores them into a map (using a key representing
> the incoming flow), and later needs to assign *same* options to
> reply packets (belonging to same flow).
>
> This is currently imposssible without knowing sender's exact geneve
> options length, which unfortunately is dymamic.
>
> Introduce 'bpf_skb_set_tunnel_opt_dynptr'.
>
> This is a variant of 'bpf_skb_set_tunnel_opt' which gets a bpf dynamic
> pointer (ARG_PTR_TO_DYNPTR) parameter whose data points to the options
> buffer to set.
>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>
> ---
> v3: Avoid 'inline' for the __bpf_skb_set_tunopt helper function
> v4: change API to be based on bpf_dynptr, suggested by John Fastabend <john.fastabend@gmail.com>
> v6: Remove superfluous 'len' from bpf_skb_set_tunnel_opt_dynptr API
>     (rely on dynptr's internal size), suggested by Andrii Nakryiko <andrii.nakryiko@gmail.com>
> ---
>  include/uapi/linux/bpf.h       | 11 +++++++++++
>  net/core/filter.c              | 31 +++++++++++++++++++++++++++++--
>  tools/include/uapi/linux/bpf.h | 11 +++++++++++
>  3 files changed, 51 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 3df78c56c1bf..ba12f7e1ccb6 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5387,6 +5387,16 @@ union bpf_attr {
>   *     Return
>   *             Current *ktime*.
>   *
> + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt)
> + *     Description
> + *             Set tunnel options metadata for the packet associated to *skb*
> + *             to the option data pointed to by the *opt* dynptr.
> + *
> + *             See also the description of the **bpf_skb_get_tunnel_opt**\ ()
> + *             helper for additional information.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
> + *
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -5598,6 +5608,7 @@ union bpf_attr {
>         FN(tcp_raw_check_syncookie_ipv4),       \
>         FN(tcp_raw_check_syncookie_ipv6),       \
>         FN(ktime_get_tai_ns),           \
> +       FN(skb_set_tunnel_opt_dynptr),  \
>         /* */
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> diff --git a/net/core/filter.c b/net/core/filter.c
> index e872f45399b0..1c652936ef86 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4674,8 +4674,7 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_key_proto = {
>         .arg4_type      = ARG_ANYTHING,
>  };
>
> -BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
> -          const u8 *, from, u32, size)
> +static u64 __bpf_skb_set_tunopt(struct sk_buff *skb, const u8 *from, u32 size)
>  {
>         struct ip_tunnel_info *info = skb_tunnel_info(skb);
>         const struct metadata_dst *md = this_cpu_ptr(md_dst);
> @@ -4690,6 +4689,22 @@ BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
>         return 0;
>  }
>
> +BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
> +          const u8 *, from, u32, size)
> +{
> +       return __bpf_skb_set_tunopt(skb, from, size);
> +}
> +
> +BPF_CALL_2(bpf_skb_set_tunnel_opt_dynptr, struct sk_buff *, skb,
> +          struct bpf_dynptr_kern *, ptr)
> +{
> +       const u8 *from = bpf_dynptr_get_data(ptr);
> +
> +       if (unlikely(!from))
> +               return -EFAULT;
> +       return __bpf_skb_set_tunopt(skb, from, bpf_dynptr_get_size(ptr));
> +}
> +
>  static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
>         .func           = bpf_skb_set_tunnel_opt,
>         .gpl_only       = false,
> @@ -4699,6 +4714,14 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
>         .arg3_type      = ARG_CONST_SIZE,
>  };
>
> +static const struct bpf_func_proto bpf_skb_set_tunnel_opt_dynptr_proto = {
> +       .func           = bpf_skb_set_tunnel_opt_dynptr,
> +       .gpl_only       = false,
> +       .ret_type       = RET_INTEGER,
> +       .arg1_type      = ARG_PTR_TO_CTX,
> +       .arg2_type      = ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL,

being able to accept only DYNPTR_TYPE_LOCAL is quite limiting. We have
RINGBUF type, as well as soon we'll have MALLOC type. Even with SKB
and XDP types there is a linear area that kernel accept directly.

So if feels better to not specify exact type, accept any type, and
then have internal kernel helpers that will return you direct memory
pointer, if it is readable (i.e., for skb/xdp that would mean data
points to linear part).

So basically exactly the same behavior as bpf_dynptr_data() BPF helper.

Also note that dynptr is now CAP_BPF-only, so you might want to make
sure that your new helper is also CAP_BPF-guarded?

> +};
> +
>  static const struct bpf_func_proto *
>  bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
>  {
> @@ -4719,6 +4742,8 @@ bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
>                 return &bpf_skb_set_tunnel_key_proto;
>         case BPF_FUNC_skb_set_tunnel_opt:
>                 return &bpf_skb_set_tunnel_opt_proto;
> +       case BPF_FUNC_skb_set_tunnel_opt_dynptr:
> +               return &bpf_skb_set_tunnel_opt_dynptr_proto;
>         default:
>                 return NULL;
>         }
> @@ -7798,6 +7823,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>         case BPF_FUNC_skb_get_tunnel_opt:
>                 return &bpf_skb_get_tunnel_opt_proto;
>         case BPF_FUNC_skb_set_tunnel_opt:
> +       case BPF_FUNC_skb_set_tunnel_opt_dynptr:
>                 return bpf_get_skb_set_tunnel_proto(func_id);
>         case BPF_FUNC_redirect:
>                 return &bpf_redirect_proto;
> @@ -8145,6 +8171,7 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>         case BPF_FUNC_skb_get_tunnel_opt:
>                 return &bpf_skb_get_tunnel_opt_proto;
>         case BPF_FUNC_skb_set_tunnel_opt:
> +       case BPF_FUNC_skb_set_tunnel_opt_dynptr:
>                 return bpf_get_skb_set_tunnel_proto(func_id);
>         case BPF_FUNC_redirect:
>                 return &bpf_redirect_proto;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 3df78c56c1bf..ba12f7e1ccb6 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5387,6 +5387,16 @@ union bpf_attr {
>   *     Return
>   *             Current *ktime*.
>   *
> + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt)
> + *     Description
> + *             Set tunnel options metadata for the packet associated to *skb*
> + *             to the option data pointed to by the *opt* dynptr.
> + *
> + *             See also the description of the **bpf_skb_get_tunnel_opt**\ ()
> + *             helper for additional information.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
> + *
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -5598,6 +5608,7 @@ union bpf_attr {
>         FN(tcp_raw_check_syncookie_ipv4),       \
>         FN(tcp_raw_check_syncookie_ipv6),       \
>         FN(ktime_get_tai_ns),           \
> +       FN(skb_set_tunnel_opt_dynptr),  \
>         /* */
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> --
> 2.37.3
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-09-22  0:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-11 12:23 [PATCH v7 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
2022-09-11 12:23 ` [PATCH v7 bpf-next 1/4] bpf: Export 'bpf_dynptr_get_data, bpf_dynptr_get_size' helpers Shmulik Ladkani
2022-09-20  2:32   ` Yonghong Song
2022-09-21  8:38   ` Hou Tao
2022-09-21 21:27     ` Kumar Kartikeya Dwivedi
2022-09-11 12:23 ` [PATCH v7 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
2022-09-22  0:20   ` Andrii Nakryiko
2022-09-11 12:23 ` [PATCH v7 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic Shmulik Ladkani
2022-09-11 12:23 ` [PATCH v7 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
2022-09-20  2:58   ` Yonghong Song
2022-09-20  5:22     ` Shmulik Ladkani
2022-09-21  6:11       ` Yonghong Song

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.