bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 bpf-next 0/4] bpf: Support setting variable-length tunnel options
@ 2022-08-24  4:41 Shmulik Ladkani
  2022-08-24  4:41 ` [PATCH v5 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper Shmulik Ladkani
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Shmulik Ladkani @ 2022-08-24  4:41 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Joanne Koong
  Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

Introduce 'skb_set_tunnel_opt_dynptr' to allow setting tunnel options of
dynamic length.

v2:
- Place test_tunnel's local route in a custom table, to ensure the IP
  isn't considered assigned to a device.
v3:
- Avoid 'inline' for the __bpf_skb_set_tunopt helper function
v4:
- change API to be based on bpf_dynptr,
  suggested by John Fastabend <john.fastabend@gmail.com>
v5:
- fix bpf_dynptr_get_data's incorrect usage of bpf_dynptr_kern's size
  spotted by Joanne Koong <joannelkoong@gmail.com>

Shmulik Ladkani (4):
  bpf: Add 'bpf_dynptr_get_data' helper
  bpf: Support setting variable-length tunnel options
  selftests/bpf: Simplify test_tunnel setup for allowing non-local
    tunnel traffic
  selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case
    to test_progs

 include/linux/bpf.h                           |   1 +
 include/uapi/linux/bpf.h                      |  12 ++
 kernel/bpf/helpers.c                          |   8 +
 net/core/filter.c                             |  36 +++-
 tools/include/uapi/linux/bpf.h                |  12 ++
 .../selftests/bpf/prog_tests/test_tunnel.c    | 131 ++++++++++--
 .../selftests/bpf/progs/test_tunnel_kern.c    | 200 ++++++++++++------
 7 files changed, 320 insertions(+), 80 deletions(-)

-- 
2.37.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper
  2022-08-24  4:41 [PATCH v5 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
@ 2022-08-24  4:41 ` Shmulik Ladkani
  2022-08-25 16:14   ` John Fastabend
  2022-08-24  4:41 ` [PATCH v5 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Shmulik Ladkani @ 2022-08-24  4:41 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Joanne Koong
  Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

The task of calculating bpf_dynptr_kern's available size, and the
current (offset) data pointer is common for BPF functions working with
ARG_PTR_TO_DYNPTR parameters.

Introduce 'bpf_dynptr_get_data' which returns the current data
(with properer offset), and the number of usable bytes it has.

This will void callers from directly calculating bpf_dynptr_kern's
data, offset and size.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
v5:
- fix bpf_dynptr_get_data's incorrect usage of bpf_dynptr_kern's size
  spotted by Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/bpf.h  | 1 +
 kernel/bpf/helpers.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 99fc7a64564f..d84d37bda87f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2577,6 +2577,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
 		     enum bpf_dynptr_type type, u32 offset, u32 size);
 void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
 int bpf_dynptr_check_size(u32 size);
+void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr, u32 *avail_bytes);
 
 #ifdef CONFIG_BPF_LSM
 void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index fc08035f14ed..96ff93941cae 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1427,6 +1427,14 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
 	bpf_dynptr_set_type(ptr, type);
 }
 
+void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr, u32 *avail_bytes)
+{
+	if (!ptr->data)
+		return NULL;
+	*avail_bytes = bpf_dynptr_get_size(ptr);
+	return ptr->data + ptr->offset;
+}
+
 void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
 {
 	memset(ptr, 0, sizeof(*ptr));
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 bpf-next 2/4] bpf: Support setting variable-length tunnel options
  2022-08-24  4:41 [PATCH v5 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
  2022-08-24  4:41 ` [PATCH v5 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper Shmulik Ladkani
@ 2022-08-24  4:41 ` Shmulik Ladkani
  2022-08-25 18:20   ` Andrii Nakryiko
  2022-08-24  4:41 ` [PATCH v5 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic Shmulik Ladkani
  2022-08-24  4:41 ` [PATCH v5 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
  3 siblings, 1 reply; 12+ messages in thread
From: Shmulik Ladkani @ 2022-08-24  4:41 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Joanne Koong
  Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

Existing 'bpf_skb_set_tunnel_opt' allows setting tunnel options given
an option buffer (ARG_PTR_TO_MEM) and the compile-time fixed buffer
size (ARG_CONST_SIZE).

However, in certain cases we wish to set tunnel options of dynamic
length.

For example, we have an ebpf program that gets geneve options on
incoming packets, stores them into a map (using a key representing
the incoming flow), and later needs to assign *same* options to
reply packets (belonging to same flow).

This is currently imposssible without knowing sender's exact geneve
options length, which unfortunately is dymamic.

Introduce 'bpf_skb_set_tunnel_opt_dynptr'.

This is a variant of 'bpf_skb_set_tunnel_opt' which gets a bpf dynamic
pointer (ARG_PTR_TO_DYNPTR) parameter 'ptr' whose data points to the
options buffer, and 'len', the byte length of options data caller wishes
to copy into ip_tunnnel_info.
'len' must never exceed the dynptr's internal size, o/w EINVAL is
returned.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
v3: Avoid 'inline' for the __bpf_skb_set_tunopt helper function
v4: change API to be based on bpf_dynptr, suggested by John Fastabend <john.fastabend@gmail.com>
---
 include/uapi/linux/bpf.h       | 12 ++++++++++++
 net/core/filter.c              | 36 ++++++++++++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h | 12 ++++++++++++
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 644600dbb114..c7b313e30635 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5367,6 +5367,17 @@ union bpf_attr {
  *	Return
  *		Current *ktime*.
  *
+ * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)
+ *	Description
+ *		Set tunnel options metadata for the packet associated to *skb*
+ *		to the variable length *len* bytes of option data pointed to
+ *		by the *opt* dynptr.
+ *
+ *		See also the description of the **bpf_skb_get_tunnel_opt**\ ()
+ *		helper for additional information.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5578,6 +5589,7 @@ union bpf_attr {
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
 	FN(ktime_get_tai_ns),		\
+	FN(skb_set_tunnel_opt_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/core/filter.c b/net/core/filter.c
index 63e25d8ce501..1407c87ba6ac 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4669,8 +4669,7 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_key_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
-BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
-	   const u8 *, from, u32, size)
+static u64 __bpf_skb_set_tunopt(struct sk_buff *skb, const u8 *from, u32 size)
 {
 	struct ip_tunnel_info *info = skb_tunnel_info(skb);
 	const struct metadata_dst *md = this_cpu_ptr(md_dst);
@@ -4685,6 +4684,26 @@ BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
 	return 0;
 }
 
+BPF_CALL_3(bpf_skb_set_tunnel_opt, struct sk_buff *, skb,
+	   const u8 *, from, u32, size)
+{
+	return __bpf_skb_set_tunopt(skb, from, size);
+}
+
+BPF_CALL_3(bpf_skb_set_tunnel_opt_dynptr, struct sk_buff *, skb,
+	   struct bpf_dynptr_kern *, ptr, u32, len)
+{
+	const u8 *from;
+	u32 avail;
+
+	from = bpf_dynptr_get_data(ptr, &avail);
+	if (unlikely(!from))
+		return -EFAULT;
+	if (unlikely(len > avail))
+		return -EINVAL;
+	return __bpf_skb_set_tunopt(skb, from, len);
+}
+
 static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
 	.func		= bpf_skb_set_tunnel_opt,
 	.gpl_only	= false,
@@ -4694,6 +4713,15 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
 	.arg3_type	= ARG_CONST_SIZE,
 };
 
+static const struct bpf_func_proto bpf_skb_set_tunnel_opt_dynptr_proto = {
+	.func		= bpf_skb_set_tunnel_opt_dynptr,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_DYNPTR | DYNPTR_TYPE_LOCAL,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
 {
@@ -4714,6 +4742,8 @@ bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
 		return &bpf_skb_set_tunnel_key_proto;
 	case BPF_FUNC_skb_set_tunnel_opt:
 		return &bpf_skb_set_tunnel_opt_proto;
+	case BPF_FUNC_skb_set_tunnel_opt_dynptr:
+		return &bpf_skb_set_tunnel_opt_dynptr_proto;
 	default:
 		return NULL;
 	}
@@ -7808,6 +7838,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_get_tunnel_opt:
 		return &bpf_skb_get_tunnel_opt_proto;
 	case BPF_FUNC_skb_set_tunnel_opt:
+	case BPF_FUNC_skb_set_tunnel_opt_dynptr:
 		return bpf_get_skb_set_tunnel_proto(func_id);
 	case BPF_FUNC_redirect:
 		return &bpf_redirect_proto;
@@ -8155,6 +8186,7 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_get_tunnel_opt:
 		return &bpf_skb_get_tunnel_opt_proto;
 	case BPF_FUNC_skb_set_tunnel_opt:
+	case BPF_FUNC_skb_set_tunnel_opt_dynptr:
 		return bpf_get_skb_set_tunnel_proto(func_id);
 	case BPF_FUNC_redirect:
 		return &bpf_redirect_proto;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4fb685591035..6a032a8a5fef 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5367,6 +5367,17 @@ union bpf_attr {
  *	Return
  *		Current *ktime*.
  *
+ * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)
+ *	Description
+ *		Set tunnel options metadata for the packet associated to *skb*
+ *		to the variable length *len* bytes of option data pointed to
+ *		by the *opt* dynptr.
+ *
+ *		See also the description of the **bpf_skb_get_tunnel_opt**\ ()
+ *		helper for additional information.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5578,6 +5589,7 @@ union bpf_attr {
 	FN(tcp_raw_check_syncookie_ipv4),	\
 	FN(tcp_raw_check_syncookie_ipv6),	\
 	FN(ktime_get_tai_ns),		\
+	FN(skb_set_tunnel_opt_dynptr),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic
  2022-08-24  4:41 [PATCH v5 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
  2022-08-24  4:41 ` [PATCH v5 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper Shmulik Ladkani
  2022-08-24  4:41 ` [PATCH v5 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
@ 2022-08-24  4:41 ` Shmulik Ladkani
  2022-08-24  4:41 ` [PATCH v5 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
  3 siblings, 0 replies; 12+ messages in thread
From: Shmulik Ladkani @ 2022-08-24  4:41 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Joanne Koong
  Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

Commit 1115169f47ae ("selftests/bpf: Don't assign outer source IP to host")
removed the secondary IP (IP4_ADDR2_VETH1) assigned to veth1, in order
to test bpf_skb_set_tunnel_key's functionality when tunnel destination
isn't assigned to an interface.

The chosen setup for testing the "tunnel to unassigned outer IP"
scenario was rather complex: (1) static ARP entries in order to
bypass ARP (o/w requests will fail as the target address isn't assigned
locally), and (2) a BPF program running on veth1 ingress which
manipulates the IP header's daddr to the actual IP assigned to the
interface (o/w tunnel traffic won't be accepted locally).

This is complex, and adds a dependency on this hidden "dnat"-like eBPF
program, that needs to be replicated when new tunnel tests are added.

Instead, we can have a much simpler setup: Add the secondary IP as a
*local route* in a table pointed by a custom fib rule. No static arp
entries are needed, and the special eBPF program that "dnats" the outer
destination can be removed.

This commit is a revert of 1115169f47ae, with the addition of the local
route of IP4_ADDR2_VETH1 (instead of the original address assignment).

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
v2: Place the local route for the secondary IP in a custom table
    pointed by a custom fib rule; this ensures the IP is not considered
    assigned to a device.
---
 .../selftests/bpf/prog_tests/test_tunnel.c    | 23 ++----
 .../selftests/bpf/progs/test_tunnel_kern.c    | 80 +++----------------
 2 files changed, 17 insertions(+), 86 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
index eea274110267..852da04ff281 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
@@ -82,7 +82,6 @@
 
 #define MAC_TUNL_DEV0 "52:54:00:d9:01:00"
 #define MAC_TUNL_DEV1 "52:54:00:d9:02:00"
-#define MAC_VETH1 "52:54:00:d9:03:00"
 
 #define VXLAN_TUNL_DEV0 "vxlan00"
 #define VXLAN_TUNL_DEV1 "vxlan11"
@@ -109,9 +108,15 @@
 static int config_device(void)
 {
 	SYS("ip netns add at_ns0");
-	SYS("ip link add veth0 address " MAC_VETH1 " type veth peer name veth1");
+	SYS("ip link add veth0 type veth peer name veth1");
 	SYS("ip link set veth0 netns at_ns0");
 	SYS("ip addr add " IP4_ADDR1_VETH1 "/24 dev veth1");
+	/* Create a custom rule routing IP4_ADDR2_VETH1 as local.
+	 * Do not place it in "local" table, to avoid this IP being considered
+	 * assigned to a device.
+	 */
+	SYS("ip rule add to " IP4_ADDR2_VETH1 " table 20");
+	SYS("ip route add local " IP4_ADDR2_VETH1 "/32 dev veth1 table 20");
 	SYS("ip link set dev veth1 up mtu 1500");
 	SYS("ip netns exec at_ns0 ip addr add " IP4_ADDR_VETH0 "/24 dev veth0");
 	SYS("ip netns exec at_ns0 ip link set dev veth0 up mtu 1500");
@@ -125,6 +130,7 @@ static void cleanup(void)
 {
 	SYS_NOFAIL("test -f /var/run/netns/at_ns0 && ip netns delete at_ns0");
 	SYS_NOFAIL("ip link del veth1 2> /dev/null");
+	SYS_NOFAIL("ip rule del to %s table 20 2> /dev/null", IP4_ADDR2_VETH1);
 	SYS_NOFAIL("ip link del %s 2> /dev/null", VXLAN_TUNL_DEV1);
 	SYS_NOFAIL("ip link del %s 2> /dev/null", IP6VXLAN_TUNL_DEV1);
 }
@@ -140,8 +146,6 @@ static int add_vxlan_tunnel(void)
 	    VXLAN_TUNL_DEV0, IP4_ADDR_TUNL_DEV0);
 	SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev %s",
 	    IP4_ADDR_TUNL_DEV1, MAC_TUNL_DEV1, VXLAN_TUNL_DEV0);
-	SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev veth0",
-	    IP4_ADDR2_VETH1, MAC_VETH1);
 
 	/* root namespace */
 	SYS("ip link add dev %s type vxlan external gbp dstport 4789",
@@ -279,17 +283,6 @@ static void test_vxlan_tunnel(void)
 	if (attach_tc_prog(&tc_hook, get_src_prog_fd, set_src_prog_fd))
 		goto done;
 
-	/* load and attach bpf prog to veth dev tc hook point */
-	ifindex = if_nametoindex("veth1");
-	if (!ASSERT_NEQ(ifindex, 0, "veth1 ifindex"))
-		goto done;
-	tc_hook.ifindex = ifindex;
-	set_dst_prog_fd = bpf_program__fd(skel->progs.veth_set_outer_dst);
-	if (!ASSERT_GE(set_dst_prog_fd, 0, "bpf_program__fd"))
-		goto done;
-	if (attach_tc_prog(&tc_hook, set_dst_prog_fd, -1))
-		goto done;
-
 	/* load and attach prog set_md to tunnel dev tc hook point at_ns0 */
 	nstoken = open_netns("at_ns0");
 	if (!ASSERT_OK_PTR(nstoken, "setns src"))
diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index df0673c4ecbe..17f2f325b3f3 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -14,24 +14,15 @@
 #include <linux/if_packet.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
-#include <linux/icmp.h>
 #include <linux/types.h>
 #include <linux/socket.h>
 #include <linux/pkt_cls.h>
 #include <linux/erspan.h>
-#include <linux/udp.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
 
 #define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)
 
-#define VXLAN_UDP_PORT 4789
-
-/* Only IPv4 address assigned to veth1.
- * 172.16.1.200
- */
-#define ASSIGNED_ADDR_VETH1 0xac1001c8
-
 struct geneve_opt {
 	__be16	opt_class;
 	__u8	type;
@@ -42,11 +33,6 @@ struct geneve_opt {
 	__u8	opt_data[8]; /* hard-coded to 8 byte */
 };
 
-struct vxlanhdr {
-	__be32 vx_flags;
-	__be32 vx_vni;
-} __attribute__((packed));
-
 struct vxlan_metadata {
 	__u32     gbp;
 };
@@ -383,8 +369,14 @@ int vxlan_get_tunnel_src(struct __sk_buff *skb)
 	int ret;
 	struct bpf_tunnel_key key;
 	struct vxlan_metadata md;
-	__u32 orig_daddr;
 	__u32 index = 0;
+	__u32 *local_ip = NULL;
+
+	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+	if (!local_ip) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
 
 	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
 	if (ret < 0) {
@@ -398,10 +390,11 @@ int vxlan_get_tunnel_src(struct __sk_buff *skb)
 		return TC_ACT_SHOT;
 	}
 
-	if (key.local_ipv4 != ASSIGNED_ADDR_VETH1 || md.gbp != 0x800FF) {
+	if (key.local_ipv4 != *local_ip || md.gbp != 0x800FF) {
 		bpf_printk("vxlan key %d local ip 0x%x remote ip 0x%x gbp 0x%x\n",
 			   key.tunnel_id, key.local_ipv4,
 			   key.remote_ipv4, md.gbp);
+		bpf_printk("local_ip 0x%x\n", *local_ip);
 		log_err(ret);
 		return TC_ACT_SHOT;
 	}
@@ -409,61 +402,6 @@ int vxlan_get_tunnel_src(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }
 
-SEC("tc")
-int veth_set_outer_dst(struct __sk_buff *skb)
-{
-	struct ethhdr *eth = (struct ethhdr *)(long)skb->data;
-	__u32 assigned_ip = bpf_htonl(ASSIGNED_ADDR_VETH1);
-	void *data_end = (void *)(long)skb->data_end;
-	struct udphdr *udph;
-	struct iphdr *iph;
-	__u32 index = 0;
-	int ret = 0;
-	int shrink;
-	__s64 csum;
-
-	if ((void *)eth + sizeof(*eth) > data_end) {
-		log_err(ret);
-		return TC_ACT_SHOT;
-	}
-
-	if (eth->h_proto != bpf_htons(ETH_P_IP))
-		return TC_ACT_OK;
-
-	iph = (struct iphdr *)(eth + 1);
-	if ((void *)iph + sizeof(*iph) > data_end) {
-		log_err(ret);
-		return TC_ACT_SHOT;
-	}
-	if (iph->protocol != IPPROTO_UDP)
-		return TC_ACT_OK;
-
-	udph = (struct udphdr *)(iph + 1);
-	if ((void *)udph + sizeof(*udph) > data_end) {
-		log_err(ret);
-		return TC_ACT_SHOT;
-	}
-	if (udph->dest != bpf_htons(VXLAN_UDP_PORT))
-		return TC_ACT_OK;
-
-	if (iph->daddr != assigned_ip) {
-		csum = bpf_csum_diff(&iph->daddr, sizeof(__u32), &assigned_ip,
-				     sizeof(__u32), 0);
-		if (bpf_skb_store_bytes(skb, ETH_HLEN + offsetof(struct iphdr, daddr),
-					&assigned_ip, sizeof(__u32), 0) < 0) {
-			log_err(ret);
-			return TC_ACT_SHOT;
-		}
-		if (bpf_l3_csum_replace(skb, ETH_HLEN + offsetof(struct iphdr, check),
-					0, csum, 0) < 0) {
-			log_err(ret);
-			return TC_ACT_SHOT;
-		}
-		bpf_skb_change_type(skb, PACKET_HOST);
-	}
-	return TC_ACT_OK;
-}
-
 SEC("tc")
 int ip6vxlan_set_tunnel_dst(struct __sk_buff *skb)
 {
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs
  2022-08-24  4:41 [PATCH v5 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
                   ` (2 preceding siblings ...)
  2022-08-24  4:41 ` [PATCH v5 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic Shmulik Ladkani
@ 2022-08-24  4:41 ` Shmulik Ladkani
  2022-08-25 16:33   ` John Fastabend
  3 siblings, 1 reply; 12+ messages in thread
From: Shmulik Ladkani @ 2022-08-24  4:41 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend, Joanne Koong
  Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

Add geneve test to test_tunnel. The test setup and scheme resembles the
existing vxlan test.

The test also checks variable length tunnel option assignment using
bpf_skb_set_tunnel_opt_dynptr.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 .../selftests/bpf/prog_tests/test_tunnel.c    | 108 ++++++++++++++
 .../selftests/bpf/progs/test_tunnel_kern.c    | 136 ++++++++++++++++++
 2 files changed, 244 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
index 852da04ff281..9aae03c720e9 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
@@ -87,6 +87,8 @@
 #define VXLAN_TUNL_DEV1 "vxlan11"
 #define IP6VXLAN_TUNL_DEV0 "ip6vxlan00"
 #define IP6VXLAN_TUNL_DEV1 "ip6vxlan11"
+#define GENEVE_TUNL_DEV0 "geneve00"
+#define GENEVE_TUNL_DEV1 "geneve11"
 
 #define PING_ARGS "-i 0.01 -c 3 -w 10 -q"
 
@@ -133,6 +135,38 @@ static void cleanup(void)
 	SYS_NOFAIL("ip rule del to %s table 20 2> /dev/null", IP4_ADDR2_VETH1);
 	SYS_NOFAIL("ip link del %s 2> /dev/null", VXLAN_TUNL_DEV1);
 	SYS_NOFAIL("ip link del %s 2> /dev/null", IP6VXLAN_TUNL_DEV1);
+	SYS_NOFAIL("ip link del %s 2> /dev/null", GENEVE_TUNL_DEV1);
+}
+
+static int add_geneve_tunnel(void)
+{
+	/* at_ns0 namespace */
+	SYS("ip netns exec at_ns0 ip link add dev %s type geneve external",
+	    GENEVE_TUNL_DEV0);
+	SYS("ip netns exec at_ns0 ip link set dev %s address %s up",
+	    GENEVE_TUNL_DEV0, MAC_TUNL_DEV0);
+	SYS("ip netns exec at_ns0 ip addr add dev %s %s/24",
+	    GENEVE_TUNL_DEV0, IP4_ADDR_TUNL_DEV0);
+	SYS("ip netns exec at_ns0 ip neigh add %s lladdr %s dev %s",
+	    IP4_ADDR_TUNL_DEV1, MAC_TUNL_DEV1, GENEVE_TUNL_DEV0);
+
+	/* root namespace */
+	SYS("ip link add dev %s type geneve external", GENEVE_TUNL_DEV1);
+	SYS("ip link set dev %s address %s up", GENEVE_TUNL_DEV1, MAC_TUNL_DEV1);
+	SYS("ip addr add dev %s %s/24", GENEVE_TUNL_DEV1, IP4_ADDR_TUNL_DEV1);
+	SYS("ip neigh add %s lladdr %s dev %s",
+	    IP4_ADDR_TUNL_DEV0, MAC_TUNL_DEV0, GENEVE_TUNL_DEV1);
+
+	return 0;
+fail:
+	return -1;
+}
+
+static void delete_geneve_tunnel(void)
+{
+	SYS_NOFAIL("ip netns exec at_ns0 ip link delete dev %s",
+		   GENEVE_TUNL_DEV0);
+	SYS_NOFAIL("ip link delete dev %s", GENEVE_TUNL_DEV1);
 }
 
 static int add_vxlan_tunnel(void)
@@ -248,6 +282,79 @@ static int attach_tc_prog(struct bpf_tc_hook *hook, int igr_fd, int egr_fd)
 	return 0;
 }
 
+static void test_geneve_tunnel(void)
+{
+	struct test_tunnel_kern *skel = NULL;
+	struct nstoken *nstoken;
+	int local_ip_map_fd = -1;
+	int set_src_prog_fd, get_src_prog_fd;
+	int set_dst_prog_fd;
+	int key = 0, ifindex = -1;
+	uint local_ip;
+	int err;
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
+			    .attach_point = BPF_TC_INGRESS);
+
+	/* add genve tunnel */
+	err = add_geneve_tunnel();
+	if (!ASSERT_OK(err, "add geneve tunnel"))
+		goto done;
+
+	/* load and attach bpf prog to tunnel dev tc hook point */
+	skel = test_tunnel_kern__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_tunnel_kern__open_and_load"))
+		goto done;
+	ifindex = if_nametoindex(GENEVE_TUNL_DEV1);
+	if (!ASSERT_NEQ(ifindex, 0, "geneve11 ifindex"))
+		goto done;
+	tc_hook.ifindex = ifindex;
+	get_src_prog_fd = bpf_program__fd(skel->progs.geneve_get_tunnel_src);
+	set_src_prog_fd = bpf_program__fd(skel->progs.geneve_set_tunnel_src);
+	if (!ASSERT_GE(get_src_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (!ASSERT_GE(set_src_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (attach_tc_prog(&tc_hook, get_src_prog_fd, set_src_prog_fd))
+		goto done;
+
+	/* load and attach prog set_md to tunnel dev tc hook point at_ns0 */
+	nstoken = open_netns("at_ns0");
+	if (!ASSERT_OK_PTR(nstoken, "setns src"))
+		goto done;
+	ifindex = if_nametoindex(GENEVE_TUNL_DEV0);
+	if (!ASSERT_NEQ(ifindex, 0, "geneve00 ifindex"))
+		goto done;
+	tc_hook.ifindex = ifindex;
+	set_dst_prog_fd = bpf_program__fd(skel->progs.geneve_set_tunnel_dst);
+	if (!ASSERT_GE(set_dst_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (attach_tc_prog(&tc_hook, -1, set_dst_prog_fd))
+		goto done;
+	close_netns(nstoken);
+
+	/* use veth1 ip 1 as tunnel source ip */
+	local_ip_map_fd = bpf_map__fd(skel->maps.local_ip_map);
+	if (!ASSERT_GE(local_ip_map_fd, 0, "bpf_map__fd"))
+		goto done;
+	local_ip = IP4_ADDR1_HEX_VETH1;
+	err = bpf_map_update_elem(local_ip_map_fd, &key, &local_ip, BPF_ANY);
+	if (!ASSERT_OK(err, "update bpf local_ip_map"))
+		goto done;
+
+	/* ping test */
+	err = test_ping(AF_INET, IP4_ADDR_TUNL_DEV0);
+	if (!ASSERT_OK(err, "test_ping"))
+		goto done;
+
+done:
+	/* delete geneve tunnel */
+	delete_geneve_tunnel();
+	if (local_ip_map_fd >= 0)
+		close(local_ip_map_fd);
+	if (skel)
+		test_tunnel_kern__destroy(skel);
+}
+
 static void test_vxlan_tunnel(void)
 {
 	struct test_tunnel_kern *skel = NULL;
@@ -408,6 +515,7 @@ static void *test_tunnel_run_tests(void *arg)
 
 	RUN_TEST(vxlan_tunnel);
 	RUN_TEST(ip6vxlan_tunnel);
+	RUN_TEST(geneve_tunnel);
 
 	cleanup();
 
diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index 17f2f325b3f3..dfd274d2f65d 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -23,6 +23,17 @@
 
 #define log_err(__ret) bpf_printk("ERROR line:%d ret:%d\n", __LINE__, __ret)
 
+struct tun_opts_raw {
+	__u8 data[64];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, struct tun_opts_raw);
+} geneve_opts SEC(".maps");
+
 struct geneve_opt {
 	__be16	opt_class;
 	__u8	type;
@@ -285,6 +296,131 @@ int ip4ip6erspan_get_tunnel(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }
 
+SEC("tc")
+int geneve_set_tunnel_dst(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct tun_opts_raw *opts;
+	struct bpf_dynptr dptr;
+	__u32 index = 0;
+	__u32 *local_ip = NULL;
+	int opts_len;
+
+	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+	if (!local_ip) {
+		log_err(-1);
+		return TC_ACT_SHOT;
+	}
+
+	index = 0;
+	opts = bpf_map_lookup_elem(&geneve_opts, &index);
+	if (!opts) {
+		log_err(-1);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.local_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.remote_ipv4 = *local_ip;
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(opts, 0x0, sizeof(*opts));
+	bpf_dynptr_from_mem(opts, sizeof(*opts), 0, &dptr);
+	/* dynamic number of empty geneve options (4 bytes each).
+	 * total len capped at sizeof(*opts) and is multiple of 4
+	 */
+	opts_len = (skb->len % sizeof(*opts)) & ~(sizeof(__u32) - 1);
+	ret = bpf_skb_set_tunnel_opt_dynptr(skb, &dptr, opts_len);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("tc")
+int geneve_set_tunnel_src(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	__u32 index = 0;
+	__u32 *local_ip = NULL;
+
+	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+	if (!local_ip) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	__builtin_memset(&key, 0x0, sizeof(key));
+	key.local_ipv4 = *local_ip;
+	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+	key.tunnel_id = 2;
+	key.tunnel_tos = 0;
+	key.tunnel_ttl = 64;
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
+				     BPF_F_ZERO_CSUM_TX);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("tc")
+int geneve_get_tunnel_src(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key;
+	struct tun_opts_raw opts;
+	int expected_opts_len;
+	__u32 index = 0;
+	__u32 *local_ip = NULL;
+
+	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
+	if (!local_ip) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_tunnel_opt(skb, &opts, sizeof(opts));
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	expected_opts_len = (skb->len % sizeof(opts)) & ~(sizeof(__u32) - 1);
+	if (key.local_ipv4 != *local_ip || ret != expected_opts_len) {
+		bpf_printk("geneve key %d local ip 0x%x remote ip 0x%x opts_len %d\n",
+			   key.tunnel_id, key.local_ipv4,
+			   key.remote_ipv4, ret);
+		bpf_printk("local_ip 0x%x\n", *local_ip);
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
 SEC("tc")
 int vxlan_set_tunnel_dst(struct __sk_buff *skb)
 {
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH v5 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper
  2022-08-24  4:41 ` [PATCH v5 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper Shmulik Ladkani
@ 2022-08-25 16:14   ` John Fastabend
  2022-08-30 16:57     ` Shmulik Ladkani
  0 siblings, 1 reply; 12+ messages in thread
From: John Fastabend @ 2022-08-25 16:14 UTC (permalink / raw)
  To: Shmulik Ladkani, bpf, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Joanne Koong
  Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

Shmulik Ladkani wrote:
> The task of calculating bpf_dynptr_kern's available size, and the
> current (offset) data pointer is common for BPF functions working with
> ARG_PTR_TO_DYNPTR parameters.
> 
> Introduce 'bpf_dynptr_get_data' which returns the current data
> (with properer offset), and the number of usable bytes it has.
> 
> This will void callers from directly calculating bpf_dynptr_kern's
> data, offset and size.
> 
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> ---
> v5:
> - fix bpf_dynptr_get_data's incorrect usage of bpf_dynptr_kern's size
>   spotted by Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/bpf.h  | 1 +
>  kernel/bpf/helpers.c | 8 ++++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 99fc7a64564f..d84d37bda87f 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2577,6 +2577,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
>  		     enum bpf_dynptr_type type, u32 offset, u32 size);
>  void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
>  int bpf_dynptr_check_size(u32 size);
> +void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr, u32 *avail_bytes);
>  
>  #ifdef CONFIG_BPF_LSM
>  void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index fc08035f14ed..96ff93941cae 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1427,6 +1427,14 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
>  	bpf_dynptr_set_type(ptr, type);
>  }
>  
> +void *bpf_dynptr_get_data(struct bpf_dynptr_kern *ptr, u32 *avail_bytes)
> +{
> +	if (!ptr->data)
> +		return NULL;
> +	*avail_bytes = bpf_dynptr_get_size(ptr);
> +	return ptr->data + ptr->offset;
> +}
> +
>  void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
>  {
>  	memset(ptr, 0, sizeof(*ptr));
> -- 
> 2.37.2
> 

As a bit of an addmitedly nitpick I just wonder if having the avail_bytes
passed through like this is much use anymore? For example,

+BPF_CALL_3(bpf_skb_set_tunnel_opt_dynptr, struct sk_buff *, skb,
+	   struct bpf_dynptr_kern *, ptr, u32, len)
+{
+	const u8 *from;
+	u32 avail;
+
-       if (!ptr->data)
-		return -EFAULT;
-       avail = bpf_dynptr_get_size(ptr)
+	from = bpf_dynptr_get_data(ptr, &avail);
+	if (unlikely(len > avail))
+		return -EINVAL;
+	return __bpf_skb_set_tunopt(skb, from, len);
+}
+

seems just about as compact to me and then drop the null check from the
helper so we have a bpf_dynptr_get_data(*ptr) that just does the
data+offset arithmatic. Then it could also be used in a few other
spots where that calculation seems common.

I find it easier to read at least and think the helper would get
more use, also land it in one of the .h files. And avoids bouncing
avail around.

Bit of a gripe but what do you think?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH v5 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs
  2022-08-24  4:41 ` [PATCH v5 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
@ 2022-08-25 16:33   ` John Fastabend
  0 siblings, 0 replies; 12+ messages in thread
From: John Fastabend @ 2022-08-25 16:33 UTC (permalink / raw)
  To: Shmulik Ladkani, bpf, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Joanne Koong
  Cc: Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

Shmulik Ladkani wrote:
> Add geneve test to test_tunnel. The test setup and scheme resembles the
> existing vxlan test.
> 
> The test also checks variable length tunnel option assignment using
> bpf_skb_set_tunnel_opt_dynptr.
> 
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/test_tunnel.c    | 108 ++++++++++++++
>  .../selftests/bpf/progs/test_tunnel_kern.c    | 136 ++++++++++++++++++
>  2 files changed, 244 insertions(+)

Overall lgtm couple missing ret codes.

[...]

> +
> +SEC("tc")
> +int geneve_set_tunnel_src(struct __sk_buff *skb)
> +{
> +	int ret;
> +	struct bpf_tunnel_key key;
> +	__u32 index = 0;
> +	__u32 *local_ip = NULL;
> +
> +	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
> +	if (!local_ip) {
> +		log_err(ret);

log_err(-1)?

> +		return TC_ACT_SHOT;
> +	}
> +
> +	__builtin_memset(&key, 0x0, sizeof(key));
> +	key.local_ipv4 = *local_ip;
> +	key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
> +	key.tunnel_id = 2;
> +	key.tunnel_tos = 0;
> +	key.tunnel_ttl = 64;
> +
> +	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key),
> +				     BPF_F_ZERO_CSUM_TX);
> +	if (ret < 0) {
> +		log_err(ret);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	return TC_ACT_OK;
> +}
> +
> +SEC("tc")
> +int geneve_get_tunnel_src(struct __sk_buff *skb)
> +{
> +	int ret;
> +	struct bpf_tunnel_key key;
> +	struct tun_opts_raw opts;
> +	int expected_opts_len;
> +	__u32 index = 0;
> +	__u32 *local_ip = NULL;
> +
> +	local_ip = bpf_map_lookup_elem(&local_ip_map, &index);
> +	if (!local_ip) {
> +		log_err(ret);

log_err(-1)

> +		return TC_ACT_SHOT;
> +	}
> +
> +	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
> +	if (ret < 0) {
> +		log_err(ret);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	ret = bpf_skb_get_tunnel_opt(skb, &opts, sizeof(opts));
> +	if (ret < 0) {
> +		log_err(ret);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	expected_opts_len = (skb->len % sizeof(opts)) & ~(sizeof(__u32) - 1);
> +	if (key.local_ipv4 != *local_ip || ret != expected_opts_len) {
> +		bpf_printk("geneve key %d local ip 0x%x remote ip 0x%x opts_len %d\n",
> +			   key.tunnel_id, key.local_ipv4,
> +			   key.remote_ipv4, ret);

In general I don't really like the printks. But ok.

> +		bpf_printk("local_ip 0x%x\n", *local_ip);
> +		log_err(ret);
> +		return TC_ACT_SHOT;
> +	}
> +
> +	return TC_ACT_OK;
> +}
> +
>  SEC("tc")
>  int vxlan_set_tunnel_dst(struct __sk_buff *skb)
>  {
> -- 
> 2.37.2
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 bpf-next 2/4] bpf: Support setting variable-length tunnel options
  2022-08-24  4:41 ` [PATCH v5 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
@ 2022-08-25 18:20   ` Andrii Nakryiko
  2022-08-30 20:02     ` Shmulik Ladkani
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2022-08-25 18:20 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Joanne Koong, Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

On Tue, Aug 23, 2022 at 9:41 PM Shmulik Ladkani
<shmulik@metanetworks.com> wrote:
>
> Existing 'bpf_skb_set_tunnel_opt' allows setting tunnel options given
> an option buffer (ARG_PTR_TO_MEM) and the compile-time fixed buffer
> size (ARG_CONST_SIZE).
>
> However, in certain cases we wish to set tunnel options of dynamic
> length.
>
> For example, we have an ebpf program that gets geneve options on
> incoming packets, stores them into a map (using a key representing
> the incoming flow), and later needs to assign *same* options to
> reply packets (belonging to same flow).
>
> This is currently imposssible without knowing sender's exact geneve
> options length, which unfortunately is dymamic.
>
> Introduce 'bpf_skb_set_tunnel_opt_dynptr'.
>
> This is a variant of 'bpf_skb_set_tunnel_opt' which gets a bpf dynamic
> pointer (ARG_PTR_TO_DYNPTR) parameter 'ptr' whose data points to the
> options buffer, and 'len', the byte length of options data caller wishes
> to copy into ip_tunnnel_info.
> 'len' must never exceed the dynptr's internal size, o/w EINVAL is
> returned.
>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> ---
> v3: Avoid 'inline' for the __bpf_skb_set_tunopt helper function
> v4: change API to be based on bpf_dynptr, suggested by John Fastabend <john.fastabend@gmail.com>
> ---
>  include/uapi/linux/bpf.h       | 12 ++++++++++++
>  net/core/filter.c              | 36 ++++++++++++++++++++++++++++++++--
>  tools/include/uapi/linux/bpf.h | 12 ++++++++++++
>  3 files changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 644600dbb114..c7b313e30635 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5367,6 +5367,17 @@ union bpf_attr {
>   *     Return
>   *             Current *ktime*.
>   *
> + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)

why can't we rely on dynptr's len instead of specifying extra one
here? dynptr is a range of memory, so just specify that you take that
entire range?

And then we'll have (or partially already have) generic dynptr helpers
to adjust internal dynptr offset and len.

> + *     Description
> + *             Set tunnel options metadata for the packet associated to *skb*
> + *             to the variable length *len* bytes of option data pointed to
> + *             by the *opt* dynptr.
> + *
> + *             See also the description of the **bpf_skb_get_tunnel_opt**\ ()
> + *             helper for additional information.
> + *     Return
> + *             0 on success, or a negative error in case of failure.
> + *
>   */

[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper
  2022-08-25 16:14   ` John Fastabend
@ 2022-08-30 16:57     ` Shmulik Ladkani
  0 siblings, 0 replies; 12+ messages in thread
From: Shmulik Ladkani @ 2022-08-30 16:57 UTC (permalink / raw)
  To: John Fastabend
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Joanne Koong,
	Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

On Thu, 25 Aug 2022 09:14:50 -0700
John Fastabend <john.fastabend@gmail.com> wrote:

> As a bit of an addmitedly nitpick I just wonder if having the avail_bytes
> passed through like this is much use anymore? For example,
> 
> +BPF_CALL_3(bpf_skb_set_tunnel_opt_dynptr, struct sk_buff *, skb,
> +	   struct bpf_dynptr_kern *, ptr, u32, len)
> +{
> +	const u8 *from;
> +	u32 avail;
> +
> -       if (!ptr->data)
> -		return -EFAULT;
> -       avail = bpf_dynptr_get_size(ptr)
> +	from = bpf_dynptr_get_data(ptr, &avail);
> +	if (unlikely(len > avail))
> +		return -EINVAL;
> +	return __bpf_skb_set_tunopt(skb, from, len);
> +}
> +
> 
> seems just about as compact to me and then drop the null check from the
> helper so we have a bpf_dynptr_get_data(*ptr) that just does the
> data+offset arithmatic. Then it could also be used in a few other
> spots where that calculation seems common.
> 
> I find it easier to read at least and think the helper would get
> more use, also land it in one of the .h files. And avoids bouncing
> avail around.
> 
> Bit of a gripe but what do you think?

Sure, I don't mind. 

My rationale extracting 'avail' was due to the fact the combo of 
"if !ptr->data + bpf_dynptr_get_size + bpf_dynptr_get_data" will be
repeated in future locations that need to access the stored data.
Therefore encapsulating this looked beneficial.

BTW, note we'll need to make bpf_dynptr_get_size public (currently it's
static).

Let me know your preference, I'm fine either way.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 bpf-next 2/4] bpf: Support setting variable-length tunnel options
  2022-08-25 18:20   ` Andrii Nakryiko
@ 2022-08-30 20:02     ` Shmulik Ladkani
  2022-08-30 20:13       ` Shmulik Ladkani
  0 siblings, 1 reply; 12+ messages in thread
From: Shmulik Ladkani @ 2022-08-30 20:02 UTC (permalink / raw)
  To: Andrii Nakryiko, Joanne Koong
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

On Thu, 25 Aug 2022 11:20:31 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> > + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)  
> 
> why can't we rely on dynptr's len instead of specifying extra one
> here? dynptr is a range of memory, so just specify that you take that
> entire range?
> 
> And then we'll have (or partially already have) generic dynptr helpers
> to adjust internal dynptr offset and len.

Alright.

For my usecase I need to use *part* of the tunnel options that were
previously stored as a dynptr.

Therefore I need to introduce a new bpf helper that adjusts the dynptr.

How about this suggestion (sketch, not yet tried):

// adjusts the dynptr to point to *len* bytes starting from the
// specified *offset*

BPF_CALL_3(bpf_dynptr_slice, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
{
	int err;
        u32 size;

	if (!ptr->data)
		return -EINVAL;

	err = bpf_dynptr_check_off_len(ptr, offset, len);
	if (err)
		return err;

	ptr->offset += offset;
	size = bpf_dynptr_get_size(ptr) - len;
	ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | size;
	return 0;
}


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 bpf-next 2/4] bpf: Support setting variable-length tunnel options
  2022-08-30 20:02     ` Shmulik Ladkani
@ 2022-08-30 20:13       ` Shmulik Ladkani
  2022-08-30 23:35         ` Joanne Koong
  0 siblings, 1 reply; 12+ messages in thread
From: Shmulik Ladkani @ 2022-08-30 20:13 UTC (permalink / raw)
  To: Andrii Nakryiko, Joanne Koong
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

On Tue, 30 Aug 2022 23:02:57 +0300
Shmulik Ladkani <shmulik@metanetworks.com> wrote:

> On Thu, 25 Aug 2022 11:20:31 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> > > + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)    
> > 
> > why can't we rely on dynptr's len instead of specifying extra one
> > here? dynptr is a range of memory, so just specify that you take that
> > entire range?
> > 
> > And then we'll have (or partially already have) generic dynptr helpers
> > to adjust internal dynptr offset and len.  
> 
> Alright.
> 
> For my usecase I need to use *part* of the tunnel options that were
> previously stored as a dynptr.
> 
> Therefore I need to introduce a new bpf helper that adjusts the dynptr.
> 
> How about this suggestion (sketch, not yet tried):
> 
> // adjusts the dynptr to point to *len* bytes starting from the
> // specified *offset*
> 
> BPF_CALL_3(bpf_dynptr_slice, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
> {
> 	int err;
>         u32 size;
> 
> 	if (!ptr->data)
> 		return -EINVAL;
> 
> 	err = bpf_dynptr_check_off_len(ptr, offset, len);
> 	if (err)
> 		return err;
> 
> 	ptr->offset += offset;
> 	size = bpf_dynptr_get_size(ptr) - len;
> 	ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | size;
> 	return 0;
> }
> 

Correction, meant:
 	ptr->offset += offset;
 	ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | len;


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 bpf-next 2/4] bpf: Support setting variable-length tunnel options
  2022-08-30 20:13       ` Shmulik Ladkani
@ 2022-08-30 23:35         ` Joanne Koong
  0 siblings, 0 replies; 12+ messages in thread
From: Joanne Koong @ 2022-08-30 23:35 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, Andrii Nakryiko, Paul Chaignon, Shmulik Ladkani

On Tue, Aug 30, 2022 at 1:13 PM Shmulik Ladkani
<shmulik@metanetworks.com> wrote:
>
> On Tue, 30 Aug 2022 23:02:57 +0300
> Shmulik Ladkani <shmulik@metanetworks.com> wrote:
>
> > On Thu, 25 Aug 2022 11:20:31 -0700
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > > > + * long bpf_skb_set_tunnel_opt_dynptr(struct sk_buff *skb, struct bpf_dynptr *opt, u32 len)
> > >
> > > why can't we rely on dynptr's len instead of specifying extra one
> > > here? dynptr is a range of memory, so just specify that you take that
> > > entire range?
> > >
> > > And then we'll have (or partially already have) generic dynptr helpers
> > > to adjust internal dynptr offset and len.
> >
> > Alright.
> >
> > For my usecase I need to use *part* of the tunnel options that were
> > previously stored as a dynptr.
> >
> > Therefore I need to introduce a new bpf helper that adjusts the dynptr.
> >
> > How about this suggestion (sketch, not yet tried):
> >
> > // adjusts the dynptr to point to *len* bytes starting from the
> > // specified *offset*
> >
> > BPF_CALL_3(bpf_dynptr_slice, struct bpf_dynptr_kern *, ptr, u32, offset, u32, len)
> > {
> >       int err;
> >         u32 size;
> >
> >       if (!ptr->data)
> >               return -EINVAL;
> >
> >       err = bpf_dynptr_check_off_len(ptr, offset, len);
> >       if (err)
> >               return err;
> >
> >       ptr->offset += offset;
> >       size = bpf_dynptr_get_size(ptr) - len;
> >       ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | size;
> >       return 0;
> > }
> >
>
> Correction, meant:
>         ptr->offset += offset;
>         ptr->size = (ptr->size & ~(u32)DYNPTR_SIZE_MASK) | len;
>

I have a patchset for dynptr convenience helpers that I plan to tidy
up and push out later this week or mid-next week. it includes
"bpf_dynptr_advance()" and "bpf_dynptr_trim()", which will let you
adjust the dynptr ("_advance()" advances the offset, "_trim()" trims
the size)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-08-30 23:37 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-24  4:41 [PATCH v5 bpf-next 0/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
2022-08-24  4:41 ` [PATCH v5 bpf-next 1/4] bpf: Add 'bpf_dynptr_get_data' helper Shmulik Ladkani
2022-08-25 16:14   ` John Fastabend
2022-08-30 16:57     ` Shmulik Ladkani
2022-08-24  4:41 ` [PATCH v5 bpf-next 2/4] bpf: Support setting variable-length tunnel options Shmulik Ladkani
2022-08-25 18:20   ` Andrii Nakryiko
2022-08-30 20:02     ` Shmulik Ladkani
2022-08-30 20:13       ` Shmulik Ladkani
2022-08-30 23:35         ` Joanne Koong
2022-08-24  4:41 ` [PATCH v5 bpf-next 3/4] selftests/bpf: Simplify test_tunnel setup for allowing non-local tunnel traffic Shmulik Ladkani
2022-08-24  4:41 ` [PATCH v5 bpf-next 4/4] selftests/bpf: Add geneve with bpf_skb_set_tunnel_opt_dynptr test-case to test_progs Shmulik Ladkani
2022-08-25 16:33   ` John Fastabend

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).