All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
@ 2021-10-06 23:05 Joanne Koong
  2021-10-06 23:05 ` [PATCH bpf-next v2 1/3] bpf/xdp: Add bpf_load_hdr_opt support for xdp Joanne Koong
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Joanne Koong @ 2021-10-06 23:05 UTC (permalink / raw)
  To: bpf; +Cc: kafai, netdev, Kernel-team, Joanne Koong

Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to
parse the tcp header option. It will be useful to allow other bpf prog
types to have a similar way of handling tcp hdr options.

This series adds XDP support for bpf_load_hdr_opt(). At a high level,
these patches are:

1/3 patch - Add functionality for xdp bpf_load_hdr_opt in net/core.

2/3 patch - Rename existing test_tcp_hdr_options to test_sockops_tcp_hdr_options.

3/3 patch - Add tests for xdp bpf_load_hdr_opt (test_xdp_tcp_hdr_options).

v1 -> v2:
* 1/3 - Change "(flags & 0xffffffffffff)" to
(flags & ((1ULL << BPF_LOAD_HDR_OPT_TCP_OFFSET_SHIFT) - 1))
* 3/3 - Skip check_opt_out on failures
* 3/3 - Change CHECKs to ASSERTs

Joanne Koong (3):
  bpf/xdp: Add bpf_load_hdr_opt support for xdp
  bpf/selftests: Rename test_tcp_hdr_options to
    test_sockops_tcp_hdr_options
  bpf/selftests: Add xdp bpf_load_tcp_hdr_options tests

 include/uapi/linux/bpf.h                      |  26 ++-
 net/core/filter.c                             |  88 ++++++--
 tools/include/uapi/linux/bpf.h                |  26 ++-
 ...dr_options.c => sockops_tcp_hdr_options.c} |  18 +-
 .../bpf/prog_tests/xdp_tcp_hdr_options.c      | 144 +++++++++++++
 ....c => test_sockops_misc_tcp_hdr_options.c} |   0
 ...tions.c => test_sockops_tcp_hdr_options.c} |   0
 .../bpf/progs/test_xdp_tcp_hdr_options.c      | 198 ++++++++++++++++++
 8 files changed, 454 insertions(+), 46 deletions(-)
 rename tools/testing/selftests/bpf/prog_tests/{tcp_hdr_options.c => sockops_tcp_hdr_options.c} (96%)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_tcp_hdr_options.c
 rename tools/testing/selftests/bpf/progs/{test_misc_tcp_hdr_options.c => test_sockops_misc_tcp_hdr_options.c} (100%)
 rename tools/testing/selftests/bpf/progs/{test_tcp_hdr_options.c => test_sockops_tcp_hdr_options.c} (100%)
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_tcp_hdr_options.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v2 1/3] bpf/xdp: Add bpf_load_hdr_opt support for xdp
  2021-10-06 23:05 [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt Joanne Koong
@ 2021-10-06 23:05 ` Joanne Koong
  2021-10-06 23:50   ` Song Liu
  2021-10-06 23:05 ` [PATCH bpf-next v2 2/3] bpf/selftests: Rename test_tcp_hdr_options to test_sockops_tcp_hdr_options Joanne Koong
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Joanne Koong @ 2021-10-06 23:05 UTC (permalink / raw)
  To: bpf; +Cc: kafai, netdev, Kernel-team, Joanne Koong

This patch enables XDP programs to use the bpf_load_hdr_opt helper
function to load header options.

The upper 16 bits of "flags" is used to denote the offset to the tcp
header. No other flags are, at this time, used by XDP programs.
In the future, more flags can be included to support other types of
header options.

Much of the logic for loading header options can be shared between
sockops and xdp programs. In net/core/filter.c, this common shared
logic is refactored into a separate function both sockops and xdp
use.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 include/uapi/linux/bpf.h       | 26 ++++++----
 net/core/filter.c              | 88 ++++++++++++++++++++++++++--------
 tools/include/uapi/linux/bpf.h | 26 ++++++----
 3 files changed, 103 insertions(+), 37 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6fc59d61937a..2a90e9e5088f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4272,21 +4272,28 @@ union bpf_attr {
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
- * long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res, u32 len, u64 flags)
+ * long bpf_load_hdr_opt(void *ctx, void *searchby_res, u32 len, u64 flags)
  *	Description
- *		Load header option.  Support reading a particular TCP header
- *		option for bpf program (**BPF_PROG_TYPE_SOCK_OPS**).
+ *		Load header option. Support reading a particular TCP header
+ *		option for bpf programs (**BPF_PROG_TYPE_SOCK_OPS** and
+ *		**BPF_PROG_TYPE_XDP**).
  *
- *		If *flags* is 0, it will search the option from the
- *		*skops*\ **->skb_data**.  The comment in **struct bpf_sock_ops**
- *		has details on what skb_data contains under different
- *		*skops*\ **->op**.
+ *		*ctx* is either **struct bpf_sock_ops** for SOCK_OPS programs or
+ *		**struct xdp_md** for XDP programs.
+ *
+ *		For SOCK_OPS programs, if *flags* is 0, it will search the option
+ *		from the *skops*\ **->skb_data**.  The comment in
+ *		**struct bpf_sock_ops** has details on what skb_data contains
+ *		under different *skops*\ **->op**.
+ *
+ *		For XDP programs, the upper 16 bits of **flags** should contain
+ *		the offset to the tcp header.
  *
  *		The first byte of the *searchby_res* specifies the
  *		kind that it wants to search.
  *
  *		If the searching kind is an experimental kind
- *		(i.e. 253 or 254 according to RFC6994).  It also
+ *		(i.e. 253 or 254 according to RFC6994), it also
  *		needs to specify the "magic" which is either
  *		2 bytes or 4 bytes.  It then also needs to
  *		specify the size of the magic by using
@@ -4309,7 +4316,7 @@ union bpf_attr {
  *		*len* must be at least 2 bytes which is the minimal size
  *		of a header option.
  *
- *		Supported flags:
+ *		Supported flags for **SOCK_OPS** programs:
  *
  *		* **BPF_LOAD_HDR_OPT_TCP_SYN** to search from the
  *		  saved_syn packet or the just-received syn packet.
@@ -6018,6 +6025,7 @@ enum {
 
 enum {
 	BPF_LOAD_HDR_OPT_TCP_SYN = (1ULL << 0),
+	BPF_LOAD_HDR_OPT_TCP_OFFSET_SHIFT = 48,
 };
 
 /* args[0] value during BPF_SOCK_OPS_HDR_OPT_LEN_CB and
diff --git a/net/core/filter.c b/net/core/filter.c
index 4bace37a6a44..400ff2ec8ce5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6904,19 +6904,19 @@ static const u8 *bpf_search_tcp_opt(const u8 *op, const u8 *opend,
 	return ERR_PTR(-ENOMSG);
 }
 
-BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
-	   void *, search_res, u32, len, u64, flags)
+static int bpf_load_hdr_opt(const u8 *op, const u8 *opend, void *search_res,
+			    u32 len)
 {
-	bool eol, load_syn = flags & BPF_LOAD_HDR_OPT_TCP_SYN;
-	const u8 *op, *opend, *magic, *search = search_res;
 	u8 search_kind, search_len, copy_len, magic_len;
+	const u8 *magic, *search = search_res;
+	bool eol;
 	int ret;
 
 	/* 2 byte is the minimal option len except TCPOPT_NOP and
 	 * TCPOPT_EOL which are useless for the bpf prog to learn
 	 * and this helper disallow loading them also.
 	 */
-	if (len < 2 || flags & ~BPF_LOAD_HDR_OPT_TCP_SYN)
+	if (len < 2)
 		return -EINVAL;
 
 	search_kind = search[0];
@@ -6939,6 +6939,67 @@ BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
 		magic_len = 0;
 	}
 
+	op = bpf_search_tcp_opt(op, opend, search_kind, magic, magic_len,
+				&eol);
+
+	if (IS_ERR(op))
+		return PTR_ERR(op);
+
+	copy_len = op[1];
+	ret = copy_len;
+	if (copy_len > len) {
+		ret = -ENOSPC;
+		copy_len = len;
+	}
+
+	memcpy(search_res, op, copy_len);
+	return ret;
+}
+
+BPF_CALL_4(bpf_xdp_load_hdr_opt, struct xdp_buff *, xdp,
+	   void *, search_res, u32, len, u64, flags)
+{
+	const void *op, *opend;
+	struct tcphdr *th;
+
+	/* The upper 16 bits of flags contain the offset to the tcp header.
+	 * No other bits should be set.
+	 */
+	if (flags & ((1ULL << BPF_LOAD_HDR_OPT_TCP_OFFSET_SHIFT) - 1))
+		return -EINVAL;
+
+	th = xdp->data + (flags >> BPF_LOAD_HDR_OPT_TCP_OFFSET_SHIFT);
+	op = (void *)th + sizeof(struct tcphdr);
+	if (unlikely(op > xdp->data_end))
+		return -EINVAL;
+
+	opend = (void *)th + th->doff * 4;
+	if (unlikely(opend > xdp->data_end))
+		return -EINVAL;
+
+	return bpf_load_hdr_opt(op, opend, search_res, len);
+}
+
+static const struct bpf_func_proto bpf_xdp_load_hdr_opt_proto = {
+	.func		= bpf_xdp_load_hdr_opt,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
+	   void *, search_res, u32, len, u64, flags)
+{
+	bool load_syn = flags & BPF_LOAD_HDR_OPT_TCP_SYN;
+	const u8 *op, *opend;
+	int ret;
+
+	if (flags & ~BPF_LOAD_HDR_OPT_TCP_SYN)
+		return -EINVAL;
+
 	if (load_syn) {
 		ret = bpf_sock_ops_get_syn(bpf_sock, TCP_BPF_SYN, &op);
 		if (ret < 0)
@@ -6956,20 +7017,7 @@ BPF_CALL_4(bpf_sock_ops_load_hdr_opt, struct bpf_sock_ops_kern *, bpf_sock,
 		op = bpf_sock->skb->data + sizeof(struct tcphdr);
 	}
 
-	op = bpf_search_tcp_opt(op, opend, search_kind, magic, magic_len,
-				&eol);
-	if (IS_ERR(op))
-		return PTR_ERR(op);
-
-	copy_len = op[1];
-	ret = copy_len;
-	if (copy_len > len) {
-		ret = -ENOSPC;
-		copy_len = len;
-	}
-
-	memcpy(search_res, op, copy_len);
-	return ret;
+	return bpf_load_hdr_opt(op, opend, search_res, len);
 }
 
 static const struct bpf_func_proto bpf_sock_ops_load_hdr_opt_proto = {
@@ -7488,6 +7536,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_check_syncookie_proto;
 	case BPF_FUNC_tcp_gen_syncookie:
 		return &bpf_tcp_gen_syncookie_proto;
+	case BPF_FUNC_load_hdr_opt:
+		return &bpf_xdp_load_hdr_opt_proto;
 #endif
 	default:
 		return bpf_sk_base_func_proto(func_id);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6fc59d61937a..2a90e9e5088f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4272,21 +4272,28 @@ union bpf_attr {
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
- * long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res, u32 len, u64 flags)
+ * long bpf_load_hdr_opt(void *ctx, void *searchby_res, u32 len, u64 flags)
  *	Description
- *		Load header option.  Support reading a particular TCP header
- *		option for bpf program (**BPF_PROG_TYPE_SOCK_OPS**).
+ *		Load header option. Support reading a particular TCP header
+ *		option for bpf programs (**BPF_PROG_TYPE_SOCK_OPS** and
+ *		**BPF_PROG_TYPE_XDP**).
  *
- *		If *flags* is 0, it will search the option from the
- *		*skops*\ **->skb_data**.  The comment in **struct bpf_sock_ops**
- *		has details on what skb_data contains under different
- *		*skops*\ **->op**.
+ *		*ctx* is either **struct bpf_sock_ops** for SOCK_OPS programs or
+ *		**struct xdp_md** for XDP programs.
+ *
+ *		For SOCK_OPS programs, if *flags* is 0, it will search the option
+ *		from the *skops*\ **->skb_data**.  The comment in
+ *		**struct bpf_sock_ops** has details on what skb_data contains
+ *		under different *skops*\ **->op**.
+ *
+ *		For XDP programs, the upper 16 bits of **flags** should contain
+ *		the offset to the tcp header.
  *
  *		The first byte of the *searchby_res* specifies the
  *		kind that it wants to search.
  *
  *		If the searching kind is an experimental kind
- *		(i.e. 253 or 254 according to RFC6994).  It also
+ *		(i.e. 253 or 254 according to RFC6994), it also
  *		needs to specify the "magic" which is either
  *		2 bytes or 4 bytes.  It then also needs to
  *		specify the size of the magic by using
@@ -4309,7 +4316,7 @@ union bpf_attr {
  *		*len* must be at least 2 bytes which is the minimal size
  *		of a header option.
  *
- *		Supported flags:
+ *		Supported flags for **SOCK_OPS** programs:
  *
  *		* **BPF_LOAD_HDR_OPT_TCP_SYN** to search from the
  *		  saved_syn packet or the just-received syn packet.
@@ -6018,6 +6025,7 @@ enum {
 
 enum {
 	BPF_LOAD_HDR_OPT_TCP_SYN = (1ULL << 0),
+	BPF_LOAD_HDR_OPT_TCP_OFFSET_SHIFT = 48,
 };
 
 /* args[0] value during BPF_SOCK_OPS_HDR_OPT_LEN_CB and
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v2 2/3] bpf/selftests: Rename test_tcp_hdr_options to test_sockops_tcp_hdr_options
  2021-10-06 23:05 [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt Joanne Koong
  2021-10-06 23:05 ` [PATCH bpf-next v2 1/3] bpf/xdp: Add bpf_load_hdr_opt support for xdp Joanne Koong
@ 2021-10-06 23:05 ` Joanne Koong
  2021-10-06 23:47   ` Song Liu
  2021-10-06 23:05 ` [PATCH bpf-next v2 3/3] bpf/selftests: Add xdp bpf_load_tcp_hdr_options tests Joanne Koong
  2021-10-07 14:41 ` [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt Toke Høiland-Jørgensen
  3 siblings, 1 reply; 19+ messages in thread
From: Joanne Koong @ 2021-10-06 23:05 UTC (permalink / raw)
  To: bpf; +Cc: kafai, netdev, Kernel-team, Joanne Koong

Currently, tcp_hdr_options is only supported for sockops type programs.
This patchset adds xdp tcp_hdr_options support. To more easily
differentiate between these two tests, this patch does the following
renames (with  no functional changes):

test_tcp_hdr_options -> test_sockops_tcp_hdr_options
test_misc_tcp_hdr_options -> test_sockops_misc_tcp_hdr_options

The next patch will add xdp_test_tcp_hdr_options.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 ...hdr_options.c => sockops_tcp_hdr_options.c} | 18 +++++++++---------
 ...s.c => test_sockops_misc_tcp_hdr_options.c} |  0
 ...ptions.c => test_sockops_tcp_hdr_options.c} |  0
 3 files changed, 9 insertions(+), 9 deletions(-)
 rename tools/testing/selftests/bpf/prog_tests/{tcp_hdr_options.c => sockops_tcp_hdr_options.c} (96%)
 rename tools/testing/selftests/bpf/progs/{test_misc_tcp_hdr_options.c => test_sockops_misc_tcp_hdr_options.c} (100%)
 rename tools/testing/selftests/bpf/progs/{test_tcp_hdr_options.c => test_sockops_tcp_hdr_options.c} (100%)

diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/sockops_tcp_hdr_options.c
similarity index 96%
rename from tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
rename to tools/testing/selftests/bpf/prog_tests/sockops_tcp_hdr_options.c
index 1fa772079967..f8fb12f4c1ed 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockops_tcp_hdr_options.c
@@ -12,8 +12,8 @@
 #include "cgroup_helpers.h"
 #include "network_helpers.h"
 #include "test_tcp_hdr_options.h"
-#include "test_tcp_hdr_options.skel.h"
-#include "test_misc_tcp_hdr_options.skel.h"
+#include "test_sockops_tcp_hdr_options.skel.h"
+#include "test_sockops_misc_tcp_hdr_options.skel.h"
 
 #define LO_ADDR6 "::1"
 #define CG_NAME "/tcpbpf-hdr-opt-test"
@@ -25,8 +25,8 @@ static struct bpf_test_option exp_active_fin_in;
 static struct hdr_stg exp_passive_hdr_stg;
 static struct hdr_stg exp_active_hdr_stg = { .active = true, };
 
-static struct test_misc_tcp_hdr_options *misc_skel;
-static struct test_tcp_hdr_options *skel;
+static struct test_sockops_misc_tcp_hdr_options *misc_skel;
+static struct test_sockops_tcp_hdr_options *skel;
 static int lport_linum_map_fd;
 static int hdr_stg_map_fd;
 static __u32 duration;
@@ -570,15 +570,15 @@ static struct test tests[] = {
 	DEF_TEST(misc),
 };
 
-void test_tcp_hdr_options(void)
+void test_sockops_tcp_hdr_options(void)
 {
 	int i;
 
-	skel = test_tcp_hdr_options__open_and_load();
+	skel = test_sockops_tcp_hdr_options__open_and_load();
 	if (CHECK(!skel, "open and load skel", "failed"))
 		return;
 
-	misc_skel = test_misc_tcp_hdr_options__open_and_load();
+	misc_skel = test_sockops_misc_tcp_hdr_options__open_and_load();
 	if (CHECK(!misc_skel, "open and load misc test skel", "failed"))
 		goto skel_destroy;
 
@@ -600,6 +600,6 @@ void test_tcp_hdr_options(void)
 
 	close(cg_fd);
 skel_destroy:
-	test_misc_tcp_hdr_options__destroy(misc_skel);
-	test_tcp_hdr_options__destroy(skel);
+	test_sockops_misc_tcp_hdr_options__destroy(misc_skel);
+	test_sockops_tcp_hdr_options__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_sockops_misc_tcp_hdr_options.c
similarity index 100%
rename from tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
rename to tools/testing/selftests/bpf/progs/test_sockops_misc_tcp_hdr_options.c
diff --git a/tools/testing/selftests/bpf/progs/test_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_sockops_tcp_hdr_options.c
similarity index 100%
rename from tools/testing/selftests/bpf/progs/test_tcp_hdr_options.c
rename to tools/testing/selftests/bpf/progs/test_sockops_tcp_hdr_options.c
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH bpf-next v2 3/3] bpf/selftests: Add xdp bpf_load_tcp_hdr_options tests
  2021-10-06 23:05 [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt Joanne Koong
  2021-10-06 23:05 ` [PATCH bpf-next v2 1/3] bpf/xdp: Add bpf_load_hdr_opt support for xdp Joanne Koong
  2021-10-06 23:05 ` [PATCH bpf-next v2 2/3] bpf/selftests: Rename test_tcp_hdr_options to test_sockops_tcp_hdr_options Joanne Koong
@ 2021-10-06 23:05 ` Joanne Koong
  2021-10-06 23:52   ` Song Liu
  2021-10-07 14:41 ` [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt Toke Høiland-Jørgensen
  3 siblings, 1 reply; 19+ messages in thread
From: Joanne Koong @ 2021-10-06 23:05 UTC (permalink / raw)
  To: bpf; +Cc: kafai, netdev, Kernel-team, Joanne Koong

This patch adds tests for bpf_load_tcp_hdr_options used by xdp
programs.

test_xdp_tcp_hdr_options.c:
- Tests ipv4 and ipv6 packets with TCPOPT_EXP and non-TCPOPT_EXP
tcp options set. Verify that options can be parsed and loaded
successfully.
- Tests error paths: TCPOPT_EXP with invalid magic, option with
invalid kind_len, non-existent option, invalid flags, option size
smaller than kind_len, invalid packet

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 .../bpf/prog_tests/xdp_tcp_hdr_options.c      | 144 +++++++++++++
 .../bpf/progs/test_xdp_tcp_hdr_options.c      | 198 ++++++++++++++++++
 2 files changed, 342 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_tcp_hdr_options.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_tcp_hdr_options.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/xdp_tcp_hdr_options.c
new file mode 100644
index 000000000000..2148199f2fcc
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_tcp_hdr_options.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include "test_progs.h"
+#include "network_helpers.h"
+#include "test_tcp_hdr_options.h"
+#include "test_xdp_tcp_hdr_options.skel.h"
+
+struct xdp_exprm_opt {
+	__u8 kind;
+	__u8 len;
+	__u16 magic;
+	struct bpf_test_option data;
+} __packed;
+
+struct xdp_regular_opt {
+	__u8 kind;
+	__u8 len;
+	struct bpf_test_option data;
+} __packed;
+
+struct xdp_test_opt {
+	struct xdp_exprm_opt exprm_opt;
+	struct xdp_regular_opt regular_opt;
+} __packed;
+
+struct xdp_ipv4_packet {
+	struct ipv4_packet pkt_v4;
+	struct xdp_test_opt test_opt;
+} __packed;
+
+struct xdp_ipv6_packet {
+	struct ipv6_packet pkt_v6;
+	struct xdp_test_opt test_opt;
+} __packed;
+
+static __u8 opt_flags = OPTION_MAX_DELACK_MS | OPTION_RAND;
+static __u8 exprm_max_delack_ms = 12;
+static __u8 regular_max_delack_ms = 21;
+static __u8 exprm_rand = 0xfa;
+static __u8 regular_rand = 0xce;
+
+static void init_test_opt(struct xdp_test_opt *test_opt,
+			  struct test_xdp_tcp_hdr_options *skel)
+{
+	test_opt->exprm_opt.kind = TCPOPT_EXP;
+	/* +1 for kind, +1 for kind-len, +2 for magic, +1 for flags, +1 for
+	 * OPTION_MAX_DELACK_MAX, +1 FOR OPTION_RAND
+	 */
+	test_opt->exprm_opt.len = 3 + TCP_BPF_EXPOPT_BASE_LEN;
+	test_opt->exprm_opt.magic = __bpf_htons(skel->rodata->test_magic);
+	test_opt->exprm_opt.data.flags = opt_flags;
+	test_opt->exprm_opt.data.max_delack_ms = exprm_max_delack_ms;
+	test_opt->exprm_opt.data.rand = exprm_rand;
+
+	test_opt->regular_opt.kind = skel->rodata->test_kind;
+	/* +1 for kind, +1 for kind-len, +1 for flags, +1 FOR
+	 * OPTION_MAX_DELACK_MS, +1 FOR OPTION_RAND
+	 */
+	test_opt->regular_opt.len = 5;
+	test_opt->regular_opt.data.flags = opt_flags;
+	test_opt->regular_opt.data.max_delack_ms = regular_max_delack_ms;
+	test_opt->regular_opt.data.rand = regular_rand;
+}
+
+static void check_opt_out(struct test_xdp_tcp_hdr_options *skel)
+{
+	struct bpf_test_option *opt_out;
+
+	opt_out = &skel->bss->exprm_opt_out;
+	ASSERT_EQ(opt_out->flags, opt_flags, "check exprm flags");
+	ASSERT_EQ(opt_out->max_delack_ms, exprm_max_delack_ms,
+		  "check exprm max_delack_ms");
+	ASSERT_EQ(opt_out->rand, exprm_rand, "check exprm rand");
+
+	opt_out = &skel->bss->regular_opt_out;
+	ASSERT_EQ(opt_out->flags, opt_flags, "check regular flags");
+	ASSERT_EQ(opt_out->max_delack_ms, regular_max_delack_ms,
+		  "check regular max_delack_ms");
+	ASSERT_EQ(opt_out->rand, regular_rand, "check regular rand");
+}
+
+void test_xdp_tcp_hdr_options(void)
+{
+	int err, prog_fd, prog_err_path_fd, prog_invalid_pkt_fd;
+	struct xdp_ipv6_packet ipv6_pkt, invalid_pkt;
+	struct test_xdp_tcp_hdr_options *skel;
+	struct xdp_ipv4_packet ipv4_pkt;
+	struct xdp_test_opt test_opt;
+	__u32 duration, retval, size;
+	char buf[128];
+
+	/* Load XDP program to introspect */
+	skel = test_xdp_tcp_hdr_options__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel open and load"))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs._xdp_load_hdr_opt);
+
+	init_test_opt(&test_opt, skel);
+
+	/* Init the packets */
+	ipv4_pkt.pkt_v4 = pkt_v4;
+	ipv4_pkt.pkt_v4.tcp.doff += 3;
+	ipv4_pkt.test_opt = test_opt;
+
+	ipv6_pkt.pkt_v6 = pkt_v6;
+	ipv6_pkt.pkt_v6.tcp.doff += 3;
+	ipv6_pkt.test_opt = test_opt;
+
+	invalid_pkt.pkt_v6 = pkt_v6;
+	/* Set to an offset that will exceed the xdp data_end */
+	invalid_pkt.pkt_v6.tcp.doff += 4;
+	invalid_pkt.test_opt = test_opt;
+
+	/* Test on ipv4 packet */
+	err = bpf_prog_test_run(prog_fd, 1, &ipv4_pkt, sizeof(ipv4_pkt),
+				buf, &size, &retval, &duration);
+	if (ASSERT_TRUE(!err && retval == XDP_PASS, "xdp_tcp_hdr_options ipv4"))
+		check_opt_out(skel);
+
+	/* Test on ipv6 packet */
+	err = bpf_prog_test_run(prog_fd, 1, &ipv6_pkt, sizeof(ipv6_pkt),
+				buf, &size, &retval, &duration);
+	if (ASSERT_TRUE(!err && retval == XDP_PASS, "xdp_tcp_hdr_options ipv6"))
+		check_opt_out(skel);
+
+	/* Test error paths */
+	prog_err_path_fd =
+		bpf_program__fd(skel->progs._xdp_load_hdr_opt_err_paths);
+	err = bpf_prog_test_run(prog_err_path_fd, 1, &ipv6_pkt, sizeof(ipv6_pkt),
+				buf, &size, &retval, &duration);
+	ASSERT_TRUE(!err && retval == XDP_PASS, "xdp_tcp_hdr_options err_path");
+
+	/* Test invalid packet */
+	prog_invalid_pkt_fd =
+		bpf_program__fd(skel->progs._xdp_load_hdr_opt_invalid_pkt);
+	err = bpf_prog_test_run(prog_invalid_pkt_fd, 1, &invalid_pkt,
+				sizeof(invalid_pkt), buf, &size, &retval,
+				&duration);
+	ASSERT_TRUE(!err && retval == XDP_PASS, "xdp_tcp_hdr_options invalid_pkt");
+
+	test_xdp_tcp_hdr_options__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_xdp_tcp_hdr_options.c
new file mode 100644
index 000000000000..3fe6e1ebd78a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_tcp_hdr_options.c
@@ -0,0 +1,198 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <errno.h>
+#include <stdbool.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/tcp.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+#define BPF_PROG_TEST_TCP_HDR_OPTIONS
+#include "test_tcp_hdr_options.h"
+
+struct bpf_test_option regular_opt_out;
+struct bpf_test_option exprm_opt_out;
+
+const __u16 test_magic = 0xeB9F;
+const __u8 test_kind = 0xB9;
+
+int err_val = 0;
+
+static void copy_opt_to_out(struct bpf_test_option *test_option, __u8 *data)
+{
+	test_option->flags = data[0];
+	test_option->max_delack_ms = data[1];
+	test_option->rand = data[2];
+}
+
+static int parse_xdp(struct xdp_md *xdp, __u64 *out_flags)
+{
+	void *data_end = (void *)(long)xdp->data_end;
+	__u64 tcphdr_offset = 0, nh_off;
+	void *data = (void *)(long)xdp->data;
+	struct ethhdr *eth = data;
+	int ret;
+
+	nh_off = sizeof(*eth);
+	if (data + nh_off > data_end) {
+		err_val = 1;
+		return XDP_DROP;
+	}
+
+	/* Calculate the offset to the tcp hdr */
+	if (eth->h_proto == __bpf_constant_htons(ETH_P_IPV6)) {
+		tcphdr_offset = sizeof(struct ethhdr) +
+			sizeof(struct ipv6hdr);
+	} else if (eth->h_proto == bpf_htons(ETH_P_IP)) {
+		tcphdr_offset = sizeof(struct ethhdr) +
+			sizeof(struct iphdr);
+	} else {
+		err_val = 2;
+		return XDP_DROP;
+	}
+
+	*out_flags = tcphdr_offset << BPF_LOAD_HDR_OPT_TCP_OFFSET_SHIFT;
+
+	return XDP_PASS;
+}
+
+SEC("xdp")
+int _xdp_load_hdr_opt(struct xdp_md *xdp)
+{
+	struct tcp_exprm_opt exprm_opt = { 0 };
+	struct tcp_opt regular_opt = { 0 };
+	__u64 flags = 0;
+	int ret;
+
+	ret = parse_xdp(xdp, &flags);
+	if (ret != XDP_PASS)
+		return ret;
+
+	/* Test TCPOPT_EXP */
+	exprm_opt.kind = TCPOPT_EXP;
+	exprm_opt.len = 4;
+	exprm_opt.magic = __bpf_htons(test_magic);
+	ret = bpf_load_hdr_opt(xdp, &exprm_opt,
+			       sizeof(exprm_opt), flags);
+	if (ret < 0) {
+		err_val = 3;
+		return XDP_DROP;
+	}
+
+	copy_opt_to_out(&exprm_opt_out, exprm_opt.data);
+
+	/* Test non-TCP_OPT_EXP */
+	regular_opt.kind = test_kind;
+	ret = bpf_load_hdr_opt(xdp, &regular_opt,
+			       sizeof(regular_opt), flags);
+	if (ret < 0) {
+		err_val = 4;
+		return XDP_DROP;
+	}
+
+	copy_opt_to_out(&regular_opt_out, regular_opt.data);
+
+	return XDP_PASS;
+}
+
+SEC("xdp")
+int _xdp_load_hdr_opt_err_paths(struct xdp_md *xdp)
+{
+	struct tcp_exprm_opt exprm_opt = { 0 };
+	struct tcp_opt regular_opt = { 0 };
+	__u64 flags = 0;
+	int ret;
+
+	ret = parse_xdp(xdp, &flags);
+	if (ret != XDP_PASS)
+		return ret;
+
+	/* Test TCPOPT_EXP with invalid magic */
+	exprm_opt.kind = TCPOPT_EXP;
+	exprm_opt.len = 4;
+	exprm_opt.magic = __bpf_htons(test_magic + 1);
+	ret = bpf_load_hdr_opt(xdp, &exprm_opt,
+			       sizeof(exprm_opt), flags);
+	if (ret != -ENOMSG) {
+		err_val = 3;
+		return XDP_DROP;
+	}
+
+	/* Test TCPOPT_EXP with 0 magic */
+	exprm_opt.magic = 0;
+	ret = bpf_load_hdr_opt(xdp, &exprm_opt,
+			       sizeof(exprm_opt), flags);
+	if (ret != -ENOMSG) {
+		err_val = 4;
+		return XDP_DROP;
+	}
+
+	exprm_opt.magic = __bpf_htons(test_magic);
+
+	/* Test TCPOPT_EXP with invalid kind length */
+	exprm_opt.len = 5;
+	ret = bpf_load_hdr_opt(xdp, &exprm_opt,
+			       sizeof(exprm_opt), flags);
+	if (ret != -EINVAL) {
+		err_val = 5;
+		return XDP_DROP;
+	}
+
+	/* Test that non-existent option is not found */
+	regular_opt.kind = test_kind + 1;
+	ret = bpf_load_hdr_opt(xdp, &regular_opt,
+			       sizeof(regular_opt), flags);
+	if (ret != -ENOMSG) {
+		err_val = 6;
+		return XDP_DROP;
+	}
+
+	/* Test invalid flags */
+	regular_opt.kind = test_kind;
+	ret = bpf_load_hdr_opt(xdp, &regular_opt, sizeof(regular_opt),
+			       flags | BPF_LOAD_HDR_OPT_TCP_SYN);
+	if (ret != -EINVAL) {
+		err_val = 7;
+		return XDP_DROP;
+	}
+
+	/* Test non-TCP_OPT_EXP with option size smaller than kind len */
+	ret = bpf_load_hdr_opt(xdp, &regular_opt,
+			       sizeof(regular_opt) - 2, flags);
+	if (ret != -ENOSPC) {
+		err_val = 8;
+		return XDP_DROP;
+	}
+
+	return XDP_PASS;
+}
+
+SEC("xdp")
+int _xdp_load_hdr_opt_invalid_pkt(struct xdp_md *xdp)
+{
+	struct tcp_exprm_opt exprm_opt = { 0 };
+	__u64 flags = 0;
+	int ret;
+
+	ret = parse_xdp(xdp, &flags);
+	if (ret != XDP_PASS)
+		return ret;
+
+	exprm_opt.kind = TCPOPT_EXP;
+	exprm_opt.len = 4;
+	exprm_opt.magic = __bpf_htons(test_magic);
+	ret = bpf_load_hdr_opt(xdp, &exprm_opt,
+			       sizeof(exprm_opt), flags);
+	if (ret != -EINVAL) {
+		err_val = 3;
+		return XDP_DROP;
+	}
+
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 2/3] bpf/selftests: Rename test_tcp_hdr_options to test_sockops_tcp_hdr_options
  2021-10-06 23:05 ` [PATCH bpf-next v2 2/3] bpf/selftests: Rename test_tcp_hdr_options to test_sockops_tcp_hdr_options Joanne Koong
@ 2021-10-06 23:47   ` Song Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Song Liu @ 2021-10-06 23:47 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Martin KaFai Lau, Networking, Kernel Team

On Wed, Oct 6, 2021 at 4:14 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> Currently, tcp_hdr_options is only supported for sockops type programs.
> This patchset adds xdp tcp_hdr_options support. To more easily
> differentiate between these two tests, this patch does the following
> renames (with  no functional changes):
>
> test_tcp_hdr_options -> test_sockops_tcp_hdr_options
> test_misc_tcp_hdr_options -> test_sockops_misc_tcp_hdr_options
>
> The next patch will add xdp_test_tcp_hdr_options.
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

IIRC, I already acked 2/3 in v1, so you can include the Acked-by tag in v2.

Thanks,
Song

> ---
>  ...hdr_options.c => sockops_tcp_hdr_options.c} | 18 +++++++++---------
>  ...s.c => test_sockops_misc_tcp_hdr_options.c} |  0
>  ...ptions.c => test_sockops_tcp_hdr_options.c} |  0
>  3 files changed, 9 insertions(+), 9 deletions(-)
>  rename tools/testing/selftests/bpf/prog_tests/{tcp_hdr_options.c => sockops_tcp_hdr_options.c} (96%)
>  rename tools/testing/selftests/bpf/progs/{test_misc_tcp_hdr_options.c => test_sockops_misc_tcp_hdr_options.c} (100%)
>  rename tools/testing/selftests/bpf/progs/{test_tcp_hdr_options.c => test_sockops_tcp_hdr_options.c} (100%)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/sockops_tcp_hdr_options.c
> similarity index 96%
> rename from tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
> rename to tools/testing/selftests/bpf/prog_tests/sockops_tcp_hdr_options.c
> index 1fa772079967..f8fb12f4c1ed 100644
> --- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockops_tcp_hdr_options.c
> @@ -12,8 +12,8 @@
>  #include "cgroup_helpers.h"
>  #include "network_helpers.h"
>  #include "test_tcp_hdr_options.h"
> -#include "test_tcp_hdr_options.skel.h"
> -#include "test_misc_tcp_hdr_options.skel.h"
> +#include "test_sockops_tcp_hdr_options.skel.h"
> +#include "test_sockops_misc_tcp_hdr_options.skel.h"
>
>  #define LO_ADDR6 "::1"
>  #define CG_NAME "/tcpbpf-hdr-opt-test"
> @@ -25,8 +25,8 @@ static struct bpf_test_option exp_active_fin_in;
>  static struct hdr_stg exp_passive_hdr_stg;
>  static struct hdr_stg exp_active_hdr_stg = { .active = true, };
>
> -static struct test_misc_tcp_hdr_options *misc_skel;
> -static struct test_tcp_hdr_options *skel;
> +static struct test_sockops_misc_tcp_hdr_options *misc_skel;
> +static struct test_sockops_tcp_hdr_options *skel;
>  static int lport_linum_map_fd;
>  static int hdr_stg_map_fd;
>  static __u32 duration;
> @@ -570,15 +570,15 @@ static struct test tests[] = {
>         DEF_TEST(misc),
>  };
>
> -void test_tcp_hdr_options(void)
> +void test_sockops_tcp_hdr_options(void)
>  {
>         int i;
>
> -       skel = test_tcp_hdr_options__open_and_load();
> +       skel = test_sockops_tcp_hdr_options__open_and_load();
>         if (CHECK(!skel, "open and load skel", "failed"))
>                 return;
>
> -       misc_skel = test_misc_tcp_hdr_options__open_and_load();
> +       misc_skel = test_sockops_misc_tcp_hdr_options__open_and_load();
>         if (CHECK(!misc_skel, "open and load misc test skel", "failed"))
>                 goto skel_destroy;
>
> @@ -600,6 +600,6 @@ void test_tcp_hdr_options(void)
>
>         close(cg_fd);
>  skel_destroy:
> -       test_misc_tcp_hdr_options__destroy(misc_skel);
> -       test_tcp_hdr_options__destroy(skel);
> +       test_sockops_misc_tcp_hdr_options__destroy(misc_skel);
> +       test_sockops_tcp_hdr_options__destroy(skel);
>  }
> diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_sockops_misc_tcp_hdr_options.c
> similarity index 100%
> rename from tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> rename to tools/testing/selftests/bpf/progs/test_sockops_misc_tcp_hdr_options.c
> diff --git a/tools/testing/selftests/bpf/progs/test_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_sockops_tcp_hdr_options.c
> similarity index 100%
> rename from tools/testing/selftests/bpf/progs/test_tcp_hdr_options.c
> rename to tools/testing/selftests/bpf/progs/test_sockops_tcp_hdr_options.c
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 1/3] bpf/xdp: Add bpf_load_hdr_opt support for xdp
  2021-10-06 23:05 ` [PATCH bpf-next v2 1/3] bpf/xdp: Add bpf_load_hdr_opt support for xdp Joanne Koong
@ 2021-10-06 23:50   ` Song Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Song Liu @ 2021-10-06 23:50 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Martin KaFai Lau, Networking, Kernel Team

On Wed, Oct 6, 2021 at 4:09 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> This patch enables XDP programs to use the bpf_load_hdr_opt helper
> function to load header options.
>
> The upper 16 bits of "flags" is used to denote the offset to the tcp
> header. No other flags are, at this time, used by XDP programs.
> In the future, more flags can be included to support other types of
> header options.
>
> Much of the logic for loading header options can be shared between
> sockops and xdp programs. In net/core/filter.c, this common shared
> logic is refactored into a separate function both sockops and xdp
> use.
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 3/3] bpf/selftests: Add xdp bpf_load_tcp_hdr_options tests
  2021-10-06 23:05 ` [PATCH bpf-next v2 3/3] bpf/selftests: Add xdp bpf_load_tcp_hdr_options tests Joanne Koong
@ 2021-10-06 23:52   ` Song Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Song Liu @ 2021-10-06 23:52 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Martin KaFai Lau, Networking, Kernel Team

On Wed, Oct 6, 2021 at 4:09 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> This patch adds tests for bpf_load_tcp_hdr_options used by xdp
> programs.
>
> test_xdp_tcp_hdr_options.c:
> - Tests ipv4 and ipv6 packets with TCPOPT_EXP and non-TCPOPT_EXP
> tcp options set. Verify that options can be parsed and loaded
> successfully.
> - Tests error paths: TCPOPT_EXP with invalid magic, option with
> invalid kind_len, non-existent option, invalid flags, option size
> smaller than kind_len, invalid packet
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-06 23:05 [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt Joanne Koong
                   ` (2 preceding siblings ...)
  2021-10-06 23:05 ` [PATCH bpf-next v2 3/3] bpf/selftests: Add xdp bpf_load_tcp_hdr_options tests Joanne Koong
@ 2021-10-07 14:41 ` Toke Høiland-Jørgensen
  2021-10-07 20:57   ` Joanne Koong
  3 siblings, 1 reply; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-10-07 14:41 UTC (permalink / raw)
  To: Joanne Koong, bpf; +Cc: kafai, netdev, Kernel-team

Joanne Koong <joannekoong@fb.com> writes:

> Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to
> parse the tcp header option. It will be useful to allow other bpf prog
> types to have a similar way of handling tcp hdr options.
>
> This series adds XDP support for bpf_load_hdr_opt(). At a high level,
> these patches are:

Why is this needed? Why not just parse the header directly in XDP? Seems
a bit arbitrary to add a helper for this particular type of packet
payload parsing to this particular program type. I.e., what about other
headers (IP options?)? Are we going to have a whole bunch of
special-purpose parsing helpers to pick out protocol data from packets?

Also, why only enable this for XDP (and not, say the TC hook as well)?

-Toke


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-07 14:41 ` [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt Toke Høiland-Jørgensen
@ 2021-10-07 20:57   ` Joanne Koong
  2021-10-07 21:25     ` Daniel Borkmann
  0 siblings, 1 reply; 19+ messages in thread
From: Joanne Koong @ 2021-10-07 20:57 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, bpf; +Cc: kafai, netdev, Kernel-team

On 10/7/21 7:41 AM, Toke Høiland-Jørgensen wrote:

> Joanne Koong <joannekoong@fb.com> writes:
>
>> Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to
>> parse the tcp header option. It will be useful to allow other bpf prog
>> types to have a similar way of handling tcp hdr options.
>>
>> This series adds XDP support for bpf_load_hdr_opt(). At a high level,
>> these patches are:
> Why is this needed? Why not just parse the header directly in XDP?
Parsing a variable number of TCP options is challenging for the verifier.
Some programs are using #pragma unroll as a temporary workaround
(https://github.com/xdp-project/bpf-examples/blob/master/pping/pping_kern.c#L95)
I believe Christian Deacon also recently posted about this on the xdp 
mailing list
with a link to his bpf fail logs in 
https://github.com/gamemann/XDP-TCP-Header-Options
which showcases some of the difficulties involved

> Seems
> a bit arbitrary to add a helper for this particular type of packet
> payload parsing to this particular program type. I.e., what about other
> headers (IP options?)?
The current use case needs so far have been for parsing tcp headers, but
in the future, when there are needs for parsing other types, they
can be supported as well through bpf_load_hdr_opt.

> Are we going to have a whole bunch of
> special-purpose parsing helpers to pick out protocol data from packets?

I think bpf_load_hdr_opt is generic enough to support parsing
any kind of protocol data (as specified through flags) in the packets
> Also, why only enable this for XDP (and not, say the TC hook as well)?
The plan is to also support this in tc as well (this will be in a separate
patchset)
> -Toke
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-07 20:57   ` Joanne Koong
@ 2021-10-07 21:25     ` Daniel Borkmann
  2021-10-07 23:52       ` Martin KaFai Lau
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2021-10-07 21:25 UTC (permalink / raw)
  To: Joanne Koong, Toke Høiland-Jørgensen, bpf
  Cc: kafai, netdev, Kernel-team

On 10/7/21 10:57 PM, Joanne Koong wrote:
> On 10/7/21 7:41 AM, Toke Høiland-Jørgensen wrote:
>> Joanne Koong <joannekoong@fb.com> writes:
>>
>>> Currently, bpf_sockops programs have been using bpf_load_hdr_opt() to
>>> parse the tcp header option. It will be useful to allow other bpf prog
>>> types to have a similar way of handling tcp hdr options.
>>>
>>> This series adds XDP support for bpf_load_hdr_opt(). At a high level,
>>> these patches are:
>> Why is this needed? Why not just parse the header directly in XDP?
> Parsing a variable number of TCP options is challenging for the verifier.
> Some programs are using #pragma unroll as a temporary workaround
> (https://github.com/xdp-project/bpf-examples/blob/master/pping/pping_kern.c#L95)
> I believe Christian Deacon also recently posted about this on the xdp mailing list
> with a link to his bpf fail logs in https://github.com/gamemann/XDP-TCP-Header-Options
> which showcases some of the difficulties involved
> 
>> Seems
>> a bit arbitrary to add a helper for this particular type of packet
>> payload parsing to this particular program type. I.e., what about other
>> headers (IP options?)?
> The current use case needs so far have been for parsing tcp headers, but
> in the future, when there are needs for parsing other types, they
> can be supported as well through bpf_load_hdr_opt.
> 
>> Are we going to have a whole bunch of
>> special-purpose parsing helpers to pick out protocol data from packets?
> 
> I think bpf_load_hdr_opt is generic enough to support parsing
> any kind of protocol data (as specified through flags) in the packets

I tend to agree with Toke here that this is not generic. What has been tried
to improve the verifier instead before submitting the series? It would be much
more preferable to improve the developer experience with regards to a generic
solution, so that other/similar problems can be tackled in one go as well such
as IP options, extension headers, etc.

>> Also, why only enable this for XDP (and not, say the TC hook as well)?
> The plan is to also support this in tc as well (this will be in a separate
> patchset)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-07 21:25     ` Daniel Borkmann
@ 2021-10-07 23:52       ` Martin KaFai Lau
  2021-10-08 22:20         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 19+ messages in thread
From: Martin KaFai Lau @ 2021-10-07 23:52 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Joanne Koong, Toke Høiland-Jørgensen, bpf, netdev, Kernel-team

On Thu, Oct 07, 2021 at 11:25:29PM +0200, Daniel Borkmann wrote:
> I tend to agree with Toke here that this is not generic. What has been tried
> to improve the verifier instead before submitting the series? It would be much
> more preferable to improve the developer experience with regards to a generic
> solution, so that other/similar problems can be tackled in one go as well such
> as IP options, extension headers, etc.
It would be nice to improve verifier to recognize it more smoothly.  Would
love to hear idea how to do it.

When adding the tcp header options for bpf_sockops, a bpf_store_hdr_opt()
is needed to ensure the header option is sane.  When writing test to parse
variable length header option, I also pulled in tricks (e.g. "#pragma unroll"
is easier to get it work.  Tried bounded loop but then hits max insns and
then moved some cases into subprog...etc).  Most (if not all) TCP headers
has some options (e.g. tstamp), so it will be useful to have an easy way
to search a particular option and bpf_load_hdr_opt() was also added to
bpf_sockops.

When bpf-tcp-hdr-opt was added, the initial use case was only useful in the
TCP stack because it wants to change the behavior of a tcp_sock.  When
bpf allows to add tcp-hdr-opt easily at tcp stack, it opens up other use
cases and one of them is to load-balance by the tcp-hdr-opt "server_id" in
XDP [1].  The same user that writes the bpf_sockops prog to add/parse
tcp-opt needs to do extra tricks to get this work in xdp prog.  The user
had repeated a similar try-and-error exercise (e.g. while the logical thinking
is to somehow bounded by the max tcp header size but that does not work
in unroll.  With the current cases in the loop, 15 is the max
magic number that works and hoping it will continue to work).

[1]: https://linuxplumbersconf.org/event/11/contributions/950/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-07 23:52       ` Martin KaFai Lau
@ 2021-10-08 22:20         ` Toke Høiland-Jørgensen
  2021-10-11 18:43           ` Martin KaFai Lau
  2021-10-19  0:00           ` Alexei Starovoitov
  0 siblings, 2 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-10-08 22:20 UTC (permalink / raw)
  To: Martin KaFai Lau, Daniel Borkmann; +Cc: Joanne Koong, bpf, netdev, Kernel-team

Martin KaFai Lau <kafai@fb.com> writes:

> On Thu, Oct 07, 2021 at 11:25:29PM +0200, Daniel Borkmann wrote:
>> I tend to agree with Toke here that this is not generic. What has been tried
>> to improve the verifier instead before submitting the series? It would be much
>> more preferable to improve the developer experience with regards to a generic
>> solution, so that other/similar problems can be tackled in one go as well such
>> as IP options, extension headers, etc.
> It would be nice to improve verifier to recognize it more smoothly.  Would
> love to hear idea how to do it.

So as far as I could tell, the verifier blows up in part because when
there's multiple bounded loops in sequence the verifier gets into a
combinatorial explosion of exploring all paths through the first loop
combined with all paths through the second. So if we could teach the
verifier to recognise that each loop is a separate entity to avoid this,
I think looping through headers would be a lot easier.

As you can probably tell, though, there is quite a bit of handwaving in
the above, and I have no idea how to actually do this. Some kind of
invariant analysis, maybe? But is this possible in general?

> When adding the tcp header options for bpf_sockops, a bpf_store_hdr_opt()
> is needed to ensure the header option is sane.  When writing test to parse
> variable length header option, I also pulled in tricks (e.g. "#pragma unroll"
> is easier to get it work.  Tried bounded loop but then hits max insns and
> then moved some cases into subprog...etc).  Most (if not all) TCP headers
> has some options (e.g. tstamp), so it will be useful to have an easy way
> to search a particular option and bpf_load_hdr_opt() was also added to
> bpf_sockops.

So if we can't fix the verifier, maybe we could come up with a more
general helper for packet parsing? Something like:

bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
{
  ptr = ctx->data + offset;
  while (ptr < ctx->data_end) {
    offset = callback_fn(ptr, ctx->data_end, callback_arg);
    if (offset == 0)
      return 0;
    ptr += offset;
  }
  
  // out of bounds before callback was done
  return -EINVAL;
}
   
This would work for parsing any kind of packet header or TLV-style data
without having to teach the kernel about each header type. It'll have
quite a bit of overhead if all the callbacks happen via indirect calls,
but maybe the verifier can inline the calls (or at least turn them into
direct CALL instructions)?

-Toke


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-08 22:20         ` Toke Høiland-Jørgensen
@ 2021-10-11 18:43           ` Martin KaFai Lau
  2021-10-12 14:11             ` Toke Høiland-Jørgensen
  2021-10-19  0:00           ` Alexei Starovoitov
  1 sibling, 1 reply; 19+ messages in thread
From: Martin KaFai Lau @ 2021-10-11 18:43 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, Joanne Koong, bpf, netdev, Kernel-team

On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
> So if we can't fix the verifier, maybe we could come up with a more
> general helper for packet parsing? Something like:
> 
> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
> {
>   ptr = ctx->data + offset;
>   while (ptr < ctx->data_end) {
>     offset = callback_fn(ptr, ctx->data_end, callback_arg);
>     if (offset == 0)
>       return 0;
>     ptr += offset;
>   }
>   
>   // out of bounds before callback was done
>   return -EINVAL;
> }
>    
> This would work for parsing any kind of packet header or TLV-style data
> without having to teach the kernel about each header type. It'll have
> quite a bit of overhead if all the callbacks happen via indirect calls,
> but maybe the verifier can inline the calls (or at least turn them into
> direct CALL instructions)?
Direct call different callback_fn?  bpf_for_each_pkt_chunk() is a kernel
function.  It would be nice if the verifier could do that.

This for_each helper had been considered also.  Other than the need
to callback in a loop, the thought was to extend the existing
bpf_load_hdr_opt() because our initial feedback is the same
header handling logic cannot be used in xdp which is confusing.

I don't mind to go with the for_each helper.  However, with another
thought, if it needs to call a function in the loop anyway, I think
it could also be done in bpf by putting a global function in a loop.
Need to try and double check.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-11 18:43           ` Martin KaFai Lau
@ 2021-10-12 14:11             ` Toke Høiland-Jørgensen
  2021-10-12 20:51               ` Joanne Koong
  0 siblings, 1 reply; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-10-12 14:11 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: Daniel Borkmann, Joanne Koong, bpf, netdev, Kernel-team

Martin KaFai Lau <kafai@fb.com> writes:

> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>> So if we can't fix the verifier, maybe we could come up with a more
>> general helper for packet parsing? Something like:
>> 
>> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
>> {
>>   ptr = ctx->data + offset;
>>   while (ptr < ctx->data_end) {
>>     offset = callback_fn(ptr, ctx->data_end, callback_arg);
>>     if (offset == 0)
>>       return 0;
>>     ptr += offset;
>>   }
>>   
>>   // out of bounds before callback was done
>>   return -EINVAL;
>> }
>>    
>> This would work for parsing any kind of packet header or TLV-style data
>> without having to teach the kernel about each header type. It'll have
>> quite a bit of overhead if all the callbacks happen via indirect calls,
>> but maybe the verifier can inline the calls (or at least turn them into
>> direct CALL instructions)?
> Direct call different callback_fn?  bpf_for_each_pkt_chunk() is a kernel
> function.  It would be nice if the verifier could do that.

Ohh, right, think-o on my part. It could be done if the helper was
inlined in its entirety, though? Not sure if that's feasible?

> This for_each helper had been considered also. Other than the need to
> callback in a loop, the thought was to extend the existing
> bpf_load_hdr_opt() because our initial feedback is the same header
> handling logic cannot be used in xdp which is confusing.

TBH, I had not noticed this helper before. Now that I have, it does
seems like the kind of thing that belongs as a BPF library function
rather than a helper in the first place :)

> I don't mind to go with the for_each helper.  However, with another
> thought, if it needs to call a function in the loop anyway, I think
> it could also be done in bpf by putting a global function in a loop.
> Need to try and double check.

Hmm, that would be interesting if possible!

-Toke


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-12 14:11             ` Toke Høiland-Jørgensen
@ 2021-10-12 20:51               ` Joanne Koong
  2021-10-13 10:19                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 19+ messages in thread
From: Joanne Koong @ 2021-10-12 20:51 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Martin KaFai Lau
  Cc: Daniel Borkmann, bpf, netdev, Kernel-team

On 10/12/21 7:11 AM, Toke Høiland-Jørgensen wrote:

> Martin KaFai Lau <kafai@fb.com> writes:
>> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>> [...] 
>> I don't mind to go with the for_each helper.  However, with another
>> thought, if it needs to call a function in the loop anyway, I think
>> it could also be done in bpf by putting a global function in a loop.
>> Need to try and double check.
> Hmm, that would be interesting if possible!
>
> -Toke
>
Martin and I tried this out. When we moved out the logic for parsing the
options into a global non-inlined function, the verifier approved the
program.

As such, I will abandon this patchset and submit a new separate patch
that will add a test to ensure we're always able to parse header options
through a global function.

Thanks for the conversation on this, Toke, Martin, and Daniel!


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-12 20:51               ` Joanne Koong
@ 2021-10-13 10:19                 ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-10-13 10:19 UTC (permalink / raw)
  To: Joanne Koong, Martin KaFai Lau; +Cc: Daniel Borkmann, bpf, netdev, Kernel-team

Joanne Koong <joannekoong@fb.com> writes:

> On 10/12/21 7:11 AM, Toke Høiland-Jørgensen wrote:
>
>> Martin KaFai Lau <kafai@fb.com> writes:
>>> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>>> [...] 
>>> I don't mind to go with the for_each helper.  However, with another
>>> thought, if it needs to call a function in the loop anyway, I think
>>> it could also be done in bpf by putting a global function in a loop.
>>> Need to try and double check.
>> Hmm, that would be interesting if possible!
>>
>> -Toke
>>
> Martin and I tried this out. When we moved out the logic for parsing the
> options into a global non-inlined function, the verifier approved the
> program.

Excellent!

> As such, I will abandon this patchset and submit a new separate patch
> that will add a test to ensure we're always able to parse header options
> through a global function.
>
> Thanks for the conversation on this, Toke, Martin, and Daniel!

Sounds good - you're welcome :)

-Toke


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-08 22:20         ` Toke Høiland-Jørgensen
  2021-10-11 18:43           ` Martin KaFai Lau
@ 2021-10-19  0:00           ` Alexei Starovoitov
  2021-10-19 16:02             ` Yonghong Song
  2021-10-19 16:10             ` Toke Høiland-Jørgensen
  1 sibling, 2 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2021-10-19  0:00 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Martin KaFai Lau, Daniel Borkmann, Joanne Koong, bpf, netdev,
	Kernel-team

On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
> 
> So if we can't fix the verifier, maybe we could come up with a more
> general helper for packet parsing? Something like:
> 
> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
> {
>   ptr = ctx->data + offset;
>   while (ptr < ctx->data_end) {
>     offset = callback_fn(ptr, ctx->data_end, callback_arg);
>     if (offset == 0)
>       return 0;
>     ptr += offset;
>   }
>   
>   // out of bounds before callback was done
>   return -EINVAL;
> }

We're starting to work on this since it will be useful not only for
packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration.

> This would work for parsing any kind of packet header or TLV-style data
> without having to teach the kernel about each header type. It'll have
> quite a bit of overhead if all the callbacks happen via indirect calls,
> but maybe the verifier can inline the calls (or at least turn them into
> direct CALL instructions)?

Right. That's the main downside.
If the bpf_for_each*() helper is simple enough the verifier can inline it
similar to map_gen_lookup. In such case the indirect call will be a direct call,
so the overhead won't be that bad, but it's still a function call and
static function will have full prologue+epilogue.
Converting static function into direct jump would be really challenging
for the verifier and won't provide much benefit, since r6-r9 save/restore
would need to happen anyway even for such 'inlined' static func, since
llvm will be freely using r6-r9 for insns inside function body
assuming that it's a normal function.

May be there is a way to avoid call overhead with with clang extensions.
If we want to do:
int mem_eq(char *p1, char *p2, int size)
{
  int i;
  for (i = 0; i < size; i++)
    if (p1[i] != p2[i])
      return 0;
  return 1;
}

With clang extension we might write it as:
int bpf_mem_eq(char *p1, char *p2, int size)
{
  int i = 0;
  int iter;

  iter = __builtin_for_until(i, size, ({
      if (p1[i] != p2[i])
        goto out;
  }));
  out:
  if (iter != size)
    return 0;
  return 1;
}

The llvm will generate pseudo insns for this __builtin_for.
The verifier will analyze the loop body for the range [0, size)
and replace pseudo insns with normal branches after the verification.
We might even keep the normal C syntax for loops and use
llvm HardwareLoops pass to add pseudo insns.
It's more or less the same ideas for loops we discussed before
bounded loops were introduced.
The main problem with bounded loops is that the loop body will
typically be verified the number of times equal to the number of iterations.
So for any non-trivial loop such iteration count is not much more
than 100. The verifier can do scalar evolution analysis, but
it's likely won't work for many cases and user experience
will suffer. With __builtin_for the scalar evolution is not necessary,
since induction variable is one and explicit and its range is explicit too.
That enables single pass over loop body.
One might argue that for (i = 0; i < 10000; i += 10) loops are
necessary too, but instead of complicating the verifier with sparse
ranges it's better to put that on users that can do:
  iter = __builtin_for_until(i, 10000 / 10, ({
      j = i * 10;
      use j;
  }));
Single explicit induction variable makes the verification practical.
The loop body won't be as heavily optimized as normal loop,
but it's a good thing.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-19  0:00           ` Alexei Starovoitov
@ 2021-10-19 16:02             ` Yonghong Song
  2021-10-19 16:10             ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 19+ messages in thread
From: Yonghong Song @ 2021-10-19 16:02 UTC (permalink / raw)
  To: Alexei Starovoitov, Toke Høiland-Jørgensen
  Cc: Martin KaFai Lau, Daniel Borkmann, Joanne Koong, bpf, netdev,
	Kernel-team



On 10/18/21 5:00 PM, Alexei Starovoitov wrote:
> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>>
>> So if we can't fix the verifier, maybe we could come up with a more
>> general helper for packet parsing? Something like:
>>
>> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
>> {
>>    ptr = ctx->data + offset;
>>    while (ptr < ctx->data_end) {
>>      offset = callback_fn(ptr, ctx->data_end, callback_arg);
>>      if (offset == 0)
>>        return 0;
>>      ptr += offset;
>>    }
>>    
>>    // out of bounds before callback was done
>>    return -EINVAL;
>> }
> 
> We're starting to work on this since it will be useful not only for
> packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration.
> 
>> This would work for parsing any kind of packet header or TLV-style data
>> without having to teach the kernel about each header type. It'll have
>> quite a bit of overhead if all the callbacks happen via indirect calls,
>> but maybe the verifier can inline the calls (or at least turn them into
>> direct CALL instructions)?
> 
> Right. That's the main downside.
> If the bpf_for_each*() helper is simple enough the verifier can inline it
> similar to map_gen_lookup. In such case the indirect call will be a direct call,
> so the overhead won't be that bad, but it's still a function call and
> static function will have full prologue+epilogue.
> Converting static function into direct jump would be really challenging
> for the verifier and won't provide much benefit, since r6-r9 save/restore
> would need to happen anyway even for such 'inlined' static func, since
> llvm will be freely using r6-r9 for insns inside function body
> assuming that it's a normal function.
> 
> May be there is a way to avoid call overhead with with clang extensions.
> If we want to do:
> int mem_eq(char *p1, char *p2, int size)
> {
>    int i;
>    for (i = 0; i < size; i++)
>      if (p1[i] != p2[i])
>        return 0;
>    return 1;
> }
> 
> With clang extension we might write it as:
> int bpf_mem_eq(char *p1, char *p2, int size)
> {
>    int i = 0;
>    int iter;
> 
>    iter = __builtin_for_until(i, size, ({
>        if (p1[i] != p2[i])
>          goto out;
>    }));
>    out:
>    if (iter != size)
>      return 0;
>    return 1;
> }
> 
> The llvm will generate pseudo insns for this __builtin_for.
> The verifier will analyze the loop body for the range [0, size)
> and replace pseudo insns with normal branches after the verification.
> We might even keep the normal C syntax for loops and use
> llvm HardwareLoops pass to add pseudo insns.
> It's more or less the same ideas for loops we discussed before
> bounded loops were introduced.
> The main problem with bounded loops is that the loop body will
> typically be verified the number of times equal to the number of iterations.
> So for any non-trivial loop such iteration count is not much more
> than 100. The verifier can do scalar evolution analysis, but
> it's likely won't work for many cases and user experience
> will suffer. With __builtin_for the scalar evolution is not necessary,
> since induction variable is one and explicit and its range is explicit too.
> That enables single pass over loop body.
> One might argue that for (i = 0; i < 10000; i += 10) loops are
> necessary too, but instead of complicating the verifier with sparse
> ranges it's better to put that on users that can do:
>    iter = __builtin_for_until(i, 10000 / 10, ({
>        j = i * 10;
>        use j;
>    }));
> Single explicit induction variable makes the verification practical.
> The loop body won't be as heavily optimized as normal loop,
> but it's a good thing.

We have discussed how to verify *well-formed* loops back in 2018.
(BPF control flow, supporting loops and other patterns:
  https://www.linuxplumbersconf.org/event/2/contributions/116/)
Now probably the time to revisit it again!

I think Alexei's proposal is the right direction to have
compiler preserving the loop structure with some pseudo instructions
and verifier is able to range-based verification
instead of iterating all loop iterations.

LLVM already has some IR loop intrinsics like below:
   def int_set_loop_iterations :
   def int_start_loop_iterations :
   def int_test_set_loop_iterations :
   def int_test_start_loop_iterations :
   def int_loop_decrement :
   def int_loop_decrement_reg :
to facilitate hardware loop. BPF target can define its own
intrinsics to help define a well structured loop.

I will look into this.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt
  2021-10-19  0:00           ` Alexei Starovoitov
  2021-10-19 16:02             ` Yonghong Song
@ 2021-10-19 16:10             ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-10-19 16:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Daniel Borkmann, Joanne Koong, bpf, netdev,
	Kernel-team

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Sat, Oct 09, 2021 at 12:20:27AM +0200, Toke Høiland-Jørgensen wrote:
>> 
>> So if we can't fix the verifier, maybe we could come up with a more
>> general helper for packet parsing? Something like:
>> 
>> bpf_for_each_pkt_chunk(ctx, offset, callback_fn, callback_arg)
>> {
>>   ptr = ctx->data + offset;
>>   while (ptr < ctx->data_end) {
>>     offset = callback_fn(ptr, ctx->data_end, callback_arg);
>>     if (offset == 0)
>>       return 0;
>>     ptr += offset;
>>   }
>>   
>>   // out of bounds before callback was done
>>   return -EINVAL;
>> }
>
> We're starting to work on this since it will be useful not only for
> packet parsing, TLV parsing, but potentially any kind of 'for' loop iteration.

Awesome! :)

>> This would work for parsing any kind of packet header or TLV-style data
>> without having to teach the kernel about each header type. It'll have
>> quite a bit of overhead if all the callbacks happen via indirect calls,
>> but maybe the verifier can inline the calls (or at least turn them into
>> direct CALL instructions)?
>
> Right. That's the main downside.
> If the bpf_for_each*() helper is simple enough the verifier can inline it
> similar to map_gen_lookup. In such case the indirect call will be a direct call,
> so the overhead won't be that bad, but it's still a function call and
> static function will have full prologue+epilogue.
> Converting static function into direct jump would be really challenging
> for the verifier and won't provide much benefit, since r6-r9 save/restore
> would need to happen anyway even for such 'inlined' static func, since
> llvm will be freely using r6-r9 for insns inside function body
> assuming that it's a normal function.

I reckon it could be acceptable to have the overhead of a regular
function call per iteration, but obviously it would be better to avoid
it.

> May be there is a way to avoid call overhead with with clang extensions.
> If we want to do:
> int mem_eq(char *p1, char *p2, int size)
> {
>   int i;
>   for (i = 0; i < size; i++)
>     if (p1[i] != p2[i])
>       return 0;
>   return 1;
> }
>
> With clang extension we might write it as:
> int bpf_mem_eq(char *p1, char *p2, int size)
> {
>   int i = 0;
>   int iter;
>
>   iter = __builtin_for_until(i, size, ({
>       if (p1[i] != p2[i])
>         goto out;
>   }));
>   out:
>   if (iter != size)
>     return 0;
>   return 1;
> }
>
> The llvm will generate pseudo insns for this __builtin_for.
> The verifier will analyze the loop body for the range [0, size)
> and replace pseudo insns with normal branches after the verification.

That seems like an interesting approach! The __builtin_for thing is a
little awkward, but not more than turning the loop into a separate
function + callback.

What about backwards compatibility? Would you have to ensure your kernel
understands the loop instructions before you put them into your code, or
could libbpf be taught to rewrite them if the kernel doesn't understand
them (say, to a separate function that is called in a regular bounded
loop)?

> We might even keep the normal C syntax for loops and use
> llvm HardwareLoops pass to add pseudo insns.

Now *this* would be cool!

> It's more or less the same ideas for loops we discussed before
> bounded loops were introduced.

Why was it rejected at the time?

> The main problem with bounded loops is that the loop body will
> typically be verified the number of times equal to the number of iterations.
> So for any non-trivial loop such iteration count is not much more
> than 100. The verifier can do scalar evolution analysis, but
> it's likely won't work for many cases and user experience
> will suffer. With __builtin_for the scalar evolution is not necessary,
> since induction variable is one and explicit and its range is explicit too.
> That enables single pass over loop body.
> One might argue that for (i = 0; i < 10000; i += 10) loops are
> necessary too, but instead of complicating the verifier with sparse
> ranges it's better to put that on users that can do:
>   iter = __builtin_for_until(i, 10000 / 10, ({
>       j = i * 10;
>       use j;
>   }));
> Single explicit induction variable makes the verification practical.
> The loop body won't be as heavily optimized as normal loop,
> but it's a good thing.

Agreed, limiting things to single-step loops would be acceptable.

-Toke


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-10-19 16:10 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-06 23:05 [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt Joanne Koong
2021-10-06 23:05 ` [PATCH bpf-next v2 1/3] bpf/xdp: Add bpf_load_hdr_opt support for xdp Joanne Koong
2021-10-06 23:50   ` Song Liu
2021-10-06 23:05 ` [PATCH bpf-next v2 2/3] bpf/selftests: Rename test_tcp_hdr_options to test_sockops_tcp_hdr_options Joanne Koong
2021-10-06 23:47   ` Song Liu
2021-10-06 23:05 ` [PATCH bpf-next v2 3/3] bpf/selftests: Add xdp bpf_load_tcp_hdr_options tests Joanne Koong
2021-10-06 23:52   ` Song Liu
2021-10-07 14:41 ` [PATCH bpf-next v2 0/3] Add XDP support for bpf_load_hdr_opt Toke Høiland-Jørgensen
2021-10-07 20:57   ` Joanne Koong
2021-10-07 21:25     ` Daniel Borkmann
2021-10-07 23:52       ` Martin KaFai Lau
2021-10-08 22:20         ` Toke Høiland-Jørgensen
2021-10-11 18:43           ` Martin KaFai Lau
2021-10-12 14:11             ` Toke Høiland-Jørgensen
2021-10-12 20:51               ` Joanne Koong
2021-10-13 10:19                 ` Toke Høiland-Jørgensen
2021-10-19  0:00           ` Alexei Starovoitov
2021-10-19 16:02             ` Yonghong Song
2021-10-19 16:10             ` Toke Høiland-Jørgensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.