netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/7] add BPF_F_PERMANENT flag for sockmap skmsg redirect
@ 2023-08-24 14:39 Liu Jian
  2023-08-24 14:39 ` [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for " Liu Jian
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Liu Jian @ 2023-08-24 14:39 UTC (permalink / raw)
  To: john.fastabend, jakub, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern
  Cc: netdev, bpf, liujian56

v2->v3: Change BPF_F_PERMANENTLY to BPF_F_PERMANENT.
	Modified the relationship with apply/cork_bytes.
	And change the two helpers's description

Liu Jian (7):
  bpf, sockmap: add BPF_F_PERMANENT flag for skmsg redirect
  selftests/bpf: Add txmsg permanently test for sockmap
  selftests/bpf: Add txmsg redir permanently test for sockmap
  selftests/bpf: add skmsg verdict tests
  selftests/bpf: add two skmsg verdict tests for BPF_F_PERMANENT flag
  selftests/bpf: add tests for verdict skmsg to itself
  selftests/bpf: add tests for verdict skmsg to closed socket

 include/linux/skmsg.h                         |   1 +
 include/uapi/linux/bpf.h                      |  15 ++-
 net/core/skmsg.c                              |   5 +
 net/core/sock_map.c                           |   4 +-
 net/ipv4/tcp_bpf.c                            |  18 ++-
 tools/include/uapi/linux/bpf.h                |  15 ++-
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 122 ++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_kern.h   |   3 +-
 .../bpf/progs/test_sockmap_msg_verdict.c      |  25 ++++
 tools/testing/selftests/bpf/test_sockmap.c    |  41 +++++-
 10 files changed, 234 insertions(+), 15 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_msg_verdict.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for skmsg redirect
  2023-08-24 14:39 [PATCH bpf-next v3 0/7] add BPF_F_PERMANENT flag for sockmap skmsg redirect Liu Jian
@ 2023-08-24 14:39 ` Liu Jian
  2023-08-25 13:04   ` Jakub Sitnicki
  2023-08-24 14:39 ` [PATCH bpf-next v3 2/7] selftests/bpf: Add txmsg permanently test for sockmap Liu Jian
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Liu Jian @ 2023-08-24 14:39 UTC (permalink / raw)
  To: john.fastabend, jakub, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern
  Cc: netdev, bpf, liujian56

If the sockmap msg redirection function is used only to forward packets
and no other operation, the execution result of the BPF_SK_MSG_VERDICT
program is the same each time. In this case, the BPF program only needs to
be run once. Add BPF_F_PERMANENT flag to bpf_msg_redirect_map() and
bpf_msg_redirect_hash() to implement this ability.

Then we can enable this function in the bpf program as follows:
bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENT);

Test results using netperf  TCP_STREAM mode:
for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
done

before:
3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
after:
4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 include/linux/skmsg.h          |  1 +
 include/uapi/linux/bpf.h       | 15 +++++++++++++--
 net/core/skmsg.c               |  5 +++++
 net/core/sock_map.c            |  4 ++--
 net/ipv4/tcp_bpf.c             | 18 +++++++++++++-----
 tools/include/uapi/linux/bpf.h | 15 +++++++++++++--
 6 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 054d7911bfc9..2f4e9811ff85 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -82,6 +82,7 @@ struct sk_psock {
 	u32				cork_bytes;
 	u32				eval;
 	bool				redir_ingress; /* undefined if sk_redir is null */
+	bool				redir_permanent;
 	struct sk_msg			*cork;
 	struct sk_psock_progs		progs;
 #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 70da85200695..f4de1ba390b4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3004,7 +3004,12 @@ union bpf_attr {
  * 		egress interfaces can be used for redirection. The
  * 		**BPF_F_INGRESS** value in *flags* is used to make the
  * 		distinction (ingress path is selected if the flag is present,
- * 		egress path otherwise). This is the only flag supported for now.
+ * 		egress path otherwise). The **BPF_F_PERMANENT** value in
+ *		*flags* is used to indicates whether the eBPF result is
+ *		permanently (please note that, BPF_F_PERMANENT does not work with
+ *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
+ *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
+ *		automatically disabled).
  * 	Return
  * 		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -3276,7 +3281,12 @@ union bpf_attr {
  *		egress interfaces can be used for redirection. The
  *		**BPF_F_INGRESS** value in *flags* is used to make the
  *		distinction (ingress path is selected if the flag is present,
- *		egress path otherwise). This is the only flag supported for now.
+ *		egress path otherwise). The **BPF_F_PERMANENT** value in
+ *		*flags* is used to indicates whether the eBPF result is
+ *		permanently (please note that, BPF_F_PERMANENT does not work with
+ *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
+ *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
+ *		automatically disabled).
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -5872,6 +5882,7 @@ enum {
 /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
 enum {
 	BPF_F_INGRESS			= (1ULL << 0),
+	BPF_F_PERMANENT			= (1ULL << 1),
 };
 
 /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index a29508e1ff35..df1443cf5fbd 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -885,6 +885,11 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 			goto out;
 		}
 		psock->redir_ingress = sk_msg_to_ingress(msg);
+		if (!msg->apply_bytes && !msg->cork_bytes)
+			psock->redir_permanent =
+				msg->flags & BPF_F_PERMANENT;
+		else
+			psock->redir_permanent = false;
 		psock->sk_redir = msg->sk_redir;
 		sock_hold(psock->sk_redir);
 	}
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 08ab108206bf..35a361614f5e 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -662,7 +662,7 @@ BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg *, msg,
 {
 	struct sock *sk;
 
-	if (unlikely(flags & ~(BPF_F_INGRESS)))
+	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENT)))
 		return SK_DROP;
 
 	sk = __sock_map_lookup_elem(map, key);
@@ -1261,7 +1261,7 @@ BPF_CALL_4(bpf_msg_redirect_hash, struct sk_msg *, msg,
 {
 	struct sock *sk;
 
-	if (unlikely(flags & ~(BPF_F_INGRESS)))
+	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENT)))
 		return SK_DROP;
 
 	sk = __sock_hash_lookup_elem(map, key);
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 81f0dff69e0b..b53e356562a6 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -419,8 +419,10 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 		if (!psock->apply_bytes) {
 			/* Clean up before releasing the sock lock. */
 			eval = psock->eval;
-			psock->eval = __SK_NONE;
-			psock->sk_redir = NULL;
+			if (!psock->redir_permanent) {
+				psock->eval = __SK_NONE;
+				psock->sk_redir = NULL;
+			}
 		}
 		if (psock->cork) {
 			cork = true;
@@ -433,9 +435,15 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 		ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
 					    msg, tosend, flags);
 		sent = origsize - msg->sg.size;
+		/* disable the ability when something wrong */
+		if (unlikely(ret < 0))
+			psock->redir_permanent = 0;
 
-		if (eval == __SK_REDIRECT)
+		if (!psock->redir_permanent && eval == __SK_REDIRECT) {
 			sock_put(sk_redir);
+			psock->sk_redir = NULL;
+			psock->eval = __SK_NONE;
+		}
 
 		lock_sock(sk);
 		if (unlikely(ret < 0)) {
@@ -460,8 +468,8 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
 	}
 
 	if (likely(!ret)) {
-		if (!psock->apply_bytes) {
-			psock->eval =  __SK_NONE;
+		if (!psock->apply_bytes && !psock->redir_permanent) {
+			psock->eval = __SK_NONE;
 			if (psock->sk_redir) {
 				sock_put(psock->sk_redir);
 				psock->sk_redir = NULL;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 70da85200695..f4de1ba390b4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3004,7 +3004,12 @@ union bpf_attr {
  * 		egress interfaces can be used for redirection. The
  * 		**BPF_F_INGRESS** value in *flags* is used to make the
  * 		distinction (ingress path is selected if the flag is present,
- * 		egress path otherwise). This is the only flag supported for now.
+ * 		egress path otherwise). The **BPF_F_PERMANENT** value in
+ *		*flags* is used to indicates whether the eBPF result is
+ *		permanently (please note that, BPF_F_PERMANENT does not work with
+ *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
+ *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
+ *		automatically disabled).
  * 	Return
  * 		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -3276,7 +3281,12 @@ union bpf_attr {
  *		egress interfaces can be used for redirection. The
  *		**BPF_F_INGRESS** value in *flags* is used to make the
  *		distinction (ingress path is selected if the flag is present,
- *		egress path otherwise). This is the only flag supported for now.
+ *		egress path otherwise). The **BPF_F_PERMANENT** value in
+ *		*flags* is used to indicates whether the eBPF result is
+ *		permanently (please note that, BPF_F_PERMANENT does not work with
+ *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
+ *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
+ *		automatically disabled).
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
  *
@@ -5872,6 +5882,7 @@ enum {
 /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
 enum {
 	BPF_F_INGRESS			= (1ULL << 0),
+	BPF_F_PERMANENT			= (1ULL << 1),
 };
 
 /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v3 2/7] selftests/bpf: Add txmsg permanently test for sockmap
  2023-08-24 14:39 [PATCH bpf-next v3 0/7] add BPF_F_PERMANENT flag for sockmap skmsg redirect Liu Jian
  2023-08-24 14:39 ` [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for " Liu Jian
@ 2023-08-24 14:39 ` Liu Jian
  2023-08-24 14:39 ` [PATCH bpf-next v3 3/7] selftests/bpf: Add txmsg redir " Liu Jian
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Liu Jian @ 2023-08-24 14:39 UTC (permalink / raw)
  To: john.fastabend, jakub, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern
  Cc: netdev, bpf, liujian56

Add one test for txmsg ingress permanently test for sockmap.

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 024a0faafb3b..845e1580e86e 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -77,6 +77,7 @@ int txmsg_end_push;
 int txmsg_start_pop;
 int txmsg_pop;
 int txmsg_ingress;
+int txmsg_permanent;
 int txmsg_redir_skb;
 int txmsg_ktls_skb;
 int txmsg_ktls_skb_drop;
@@ -107,6 +108,7 @@ static const struct option long_options[] = {
 	{"txmsg_start_pop",  required_argument,	NULL, 'w'},
 	{"txmsg_pop",	     required_argument,	NULL, 'x'},
 	{"txmsg_ingress", no_argument,		&txmsg_ingress, 1 },
+	{"txmsg_permanent", no_argument,	&txmsg_permanent, 1 },
 	{"txmsg_redir_skb", no_argument,	&txmsg_redir_skb, 1 },
 	{"ktls", no_argument,			&ktls, 1 },
 	{"peek", no_argument,			&peek_flag, 1 },
@@ -175,7 +177,7 @@ static void test_reset(void)
 	txmsg_start_push = txmsg_end_push = 0;
 	txmsg_pass = txmsg_drop = txmsg_redir = 0;
 	txmsg_apply = txmsg_cork = 0;
-	txmsg_ingress = txmsg_redir_skb = 0;
+	txmsg_ingress = txmsg_permanent = txmsg_redir_skb = 0;
 	txmsg_ktls_skb = txmsg_ktls_skb_drop = txmsg_ktls_skb_redir = 0;
 	txmsg_omit_skb_parser = 0;
 	skb_use_parser = 0;
@@ -1167,6 +1169,9 @@ static int run_options(struct sockmap_options *options, int cg_fd,  int test)
 		if (txmsg_ingress) {
 			int in = BPF_F_INGRESS;
 
+			if (txmsg_permanent)
+				in |= BPF_F_PERMANENT;
+
 			i = 0;
 			err = bpf_map_update_elem(map_fd[6], &i, &in, BPF_ANY);
 			if (err) {
@@ -1506,6 +1511,14 @@ static void test_txmsg_ingress_redir(int cgrp, struct sockmap_options *opt)
 	test_send(opt, cgrp);
 }
 
+static void test_txmsg_ingress_redir_permanent(int cgrp, struct sockmap_options *opt)
+{
+	txmsg_pass = txmsg_drop = 0;
+	txmsg_ingress = txmsg_redir = 1;
+	txmsg_permanent = 1;
+	test_send(opt, cgrp);
+}
+
 static void test_txmsg_skb(int cgrp, struct sockmap_options *opt)
 {
 	bool data = opt->data_test;
@@ -1862,6 +1875,7 @@ struct _test test[] = {
 	{"txmsg test redirect wait send mem", test_txmsg_redir_wait_sndmem},
 	{"txmsg test drop", test_txmsg_drop},
 	{"txmsg test ingress redirect", test_txmsg_ingress_redir},
+	{"txmsg test ingress redirect permanent", test_txmsg_ingress_redir_permanent},
 	{"txmsg test skb", test_txmsg_skb},
 	{"txmsg test apply", test_txmsg_apply},
 	{"txmsg test cork", test_txmsg_cork},
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v3 3/7] selftests/bpf: Add txmsg redir permanently test for sockmap
  2023-08-24 14:39 [PATCH bpf-next v3 0/7] add BPF_F_PERMANENT flag for sockmap skmsg redirect Liu Jian
  2023-08-24 14:39 ` [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for " Liu Jian
  2023-08-24 14:39 ` [PATCH bpf-next v3 2/7] selftests/bpf: Add txmsg permanently test for sockmap Liu Jian
@ 2023-08-24 14:39 ` Liu Jian
  2023-08-24 14:39 ` [PATCH bpf-next v3 4/7] selftests/bpf: add skmsg verdict tests Liu Jian
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Liu Jian @ 2023-08-24 14:39 UTC (permalink / raw)
  To: john.fastabend, jakub, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern
  Cc: netdev, bpf, liujian56

Add one test for txmsg redir permanently test for sockmap.

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 .../selftests/bpf/progs/test_sockmap_kern.h   |  3 ++-
 tools/testing/selftests/bpf/test_sockmap.c    | 27 ++++++++++++++++---
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_kern.h b/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
index 99d2ea9fb658..b0a2ddd55b83 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_kern.h
@@ -298,8 +298,9 @@ int bpf_prog6(struct sk_msg_md *msg)
 
 	f = bpf_map_lookup_elem(&sock_redir_flags, &zero);
 	if (f && *f) {
-		key = 2;
 		flags = *f;
+		if (flags & BPF_F_INGRESS)
+			key = 2;
 	}
 #ifdef SOCKMAP
 	return bpf_msg_redirect_map(msg, &sock_map_redir, key, flags);
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 845e1580e86e..c602ac8780a8 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1166,14 +1166,27 @@ static int run_options(struct sockmap_options *options, int cg_fd,  int test)
 
 		}
 
+		if (txmsg_permanent) {
+			int txmsg_flag = BPF_F_PERMANENT;
+
+			i = 0;
+			err = bpf_map_update_elem(map_fd[6], &i, &txmsg_flag, BPF_ANY);
+			if (err) {
+				fprintf(stderr,
+					"ERROR: bpf_map_update_elem (txmsg_permanent): %d (%s)\n",
+					err, strerror(errno));
+				goto out;
+			}
+		}
+
 		if (txmsg_ingress) {
-			int in = BPF_F_INGRESS;
+			int txmsg_flag = BPF_F_INGRESS;
 
 			if (txmsg_permanent)
-				in |= BPF_F_PERMANENT;
+				txmsg_flag |= BPF_F_PERMANENT;
 
 			i = 0;
-			err = bpf_map_update_elem(map_fd[6], &i, &in, BPF_ANY);
+			err = bpf_map_update_elem(map_fd[6], &i, &txmsg_flag, BPF_ANY);
 			if (err) {
 				fprintf(stderr,
 					"ERROR: bpf_map_update_elem (txmsg_ingress): %d (%s)\n",
@@ -1490,6 +1503,13 @@ static void test_txmsg_redir(int cgrp, struct sockmap_options *opt)
 	test_send(opt, cgrp);
 }
 
+static void test_txmsg_redir_permanent(int cgrp, struct sockmap_options *opt)
+{
+	txmsg_redir = 1;
+	txmsg_permanent = 1;
+	test_send(opt, cgrp);
+}
+
 static void test_txmsg_redir_wait_sndmem(int cgrp, struct sockmap_options *opt)
 {
 	txmsg_redir = 1;
@@ -1872,6 +1892,7 @@ static int populate_progs(char *bpf_file)
 struct _test test[] = {
 	{"txmsg test passthrough", test_txmsg_pass},
 	{"txmsg test redirect", test_txmsg_redir},
+	{"txmsg test redirect permanent", test_txmsg_redir_permanent},
 	{"txmsg test redirect wait send mem", test_txmsg_redir_wait_sndmem},
 	{"txmsg test drop", test_txmsg_drop},
 	{"txmsg test ingress redirect", test_txmsg_ingress_redir},
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v3 4/7] selftests/bpf: add skmsg verdict tests
  2023-08-24 14:39 [PATCH bpf-next v3 0/7] add BPF_F_PERMANENT flag for sockmap skmsg redirect Liu Jian
                   ` (2 preceding siblings ...)
  2023-08-24 14:39 ` [PATCH bpf-next v3 3/7] selftests/bpf: Add txmsg redir " Liu Jian
@ 2023-08-24 14:39 ` Liu Jian
  2023-08-24 14:39 ` [PATCH bpf-next v3 5/7] selftests/bpf: add two skmsg verdict tests for BPF_F_PERMANENT flag Liu Jian
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Liu Jian @ 2023-08-24 14:39 UTC (permalink / raw)
  To: john.fastabend, jakub, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern
  Cc: netdev, bpf, liujian56

Add two normal skmsg verdict tests in sockmap_basic.c

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 71 +++++++++++++++++++
 .../bpf/progs/test_sockmap_msg_verdict.c      | 25 +++++++
 2 files changed, 96 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_msg_verdict.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 064cc5e8d9ad..93bf1907abd9 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -12,6 +12,7 @@
 #include "test_sockmap_progs_query.skel.h"
 #include "test_sockmap_pass_prog.skel.h"
 #include "test_sockmap_drop_prog.skel.h"
+#include "test_sockmap_msg_verdict.skel.h"
 #include "bpf_iter_sockmap.skel.h"
 
 #include "sockmap_helpers.h"
@@ -475,6 +476,72 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog)
 		test_sockmap_drop_prog__destroy(drop);
 }
 
+static void test_sockmap_msg_verdict(bool is_ingress)
+{
+	int key, sent, recvd, recv_fd;
+	int err, map, verdict, s, c0, c1, p0, p1;
+	struct test_sockmap_msg_verdict *skel;
+	char buf[256] = "0123456789";
+
+	skel = test_sockmap_msg_verdict__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+	verdict = bpf_program__fd(skel->progs.prog_skmsg_verdict);
+	map = bpf_map__fd(skel->maps.sock_map);
+
+
+	err = bpf_prog_attach(verdict, map, BPF_SK_MSG_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach"))
+		goto out;
+
+	s = socket_loopback(AF_INET, SOCK_STREAM);
+	if (!ASSERT_GT(s, -1, "socket_loopback(s)"))
+		goto out;
+	err = create_socket_pairs(s, AF_INET, SOCK_STREAM, &c0, &c1, &p0, &p1);
+	if (!ASSERT_OK(err, "create_socket_pairs(s)"))
+		goto out;
+
+	key = 0;
+	err = bpf_map_update_elem(map, &key, &p1, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(key0)"))
+		goto out_close;
+	key = 1;
+	err = bpf_map_update_elem(map, &key, &c1, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(key1)"))
+		goto out_close;
+	key = 2;
+	err = bpf_map_update_elem(map, &key, &p0, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(key2)"))
+		goto out_close;
+	key = 3;
+	err = bpf_map_update_elem(map, &key, &c0, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(key3)"))
+		goto out_close;
+
+	if (is_ingress) {
+		recv_fd = c1;
+		skel->bss->skmsg_redir_flags = BPF_F_INGRESS;
+		skel->bss->skmsg_redir_key = 1;
+	} else {
+		recv_fd = c0;
+		skel->bss->skmsg_redir_flags = 0;
+		skel->bss->skmsg_redir_key = 2;
+	}
+
+	sent = xsend(p1, &buf, sizeof(buf), 0);
+	ASSERT_EQ(sent, sizeof(buf), "xsend(p1)");
+	recvd = recv_timeout(recv_fd, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC);
+	ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(recv_fd)");
+
+out_close:
+	close(c0);
+	close(p0);
+	close(c1);
+	close(p1);
+out:
+	test_sockmap_msg_verdict__destroy(skel);
+}
+
 void test_sockmap_basic(void)
 {
 	if (test__start_subtest("sockmap create_update_free"))
@@ -515,4 +582,8 @@ void test_sockmap_basic(void)
 		test_sockmap_skb_verdict_fionread(true);
 	if (test__start_subtest("sockmap skb_verdict fionread on drop"))
 		test_sockmap_skb_verdict_fionread(false);
+	if (test__start_subtest("sockmap msg_verdict"))
+		test_sockmap_msg_verdict(false);
+	if (test__start_subtest("sockmap msg_verdict ingress"))
+		test_sockmap_msg_verdict(true);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_msg_verdict.c b/tools/testing/selftests/bpf/progs/test_sockmap_msg_verdict.c
new file mode 100644
index 000000000000..002b76a1ae35
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_msg_verdict.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 4);
+	__type(key, int);
+	__type(value, int);
+} sock_map SEC(".maps");
+
+u64 skmsg_redir_flags = 0;
+u32 skmsg_redir_key = 0;
+
+SEC("sk_msg")
+int prog_skmsg_verdict(struct sk_msg_md *msg)
+{
+	u64 flags = skmsg_redir_flags;
+	int key = skmsg_redir_key;
+
+	bpf_msg_redirect_map(msg, &sock_map, key, flags);
+	return SK_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v3 5/7] selftests/bpf: add two skmsg verdict tests for BPF_F_PERMANENT flag
  2023-08-24 14:39 [PATCH bpf-next v3 0/7] add BPF_F_PERMANENT flag for sockmap skmsg redirect Liu Jian
                   ` (3 preceding siblings ...)
  2023-08-24 14:39 ` [PATCH bpf-next v3 4/7] selftests/bpf: add skmsg verdict tests Liu Jian
@ 2023-08-24 14:39 ` Liu Jian
  2023-08-24 14:39 ` [PATCH bpf-next v3 6/7] selftests/bpf: add tests for verdict skmsg to itself Liu Jian
  2023-08-24 14:39 ` [PATCH bpf-next v3 7/7] selftests/bpf: add tests for verdict skmsg to closed socket Liu Jian
  6 siblings, 0 replies; 13+ messages in thread
From: Liu Jian @ 2023-08-24 14:39 UTC (permalink / raw)
  To: john.fastabend, jakub, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern
  Cc: netdev, bpf, liujian56

Add two tests for BPF_F_PERMANENT flag in sockmap_basic.c.

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c    | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 93bf1907abd9..1a29e76fe29f 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -476,7 +476,7 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog)
 		test_sockmap_drop_prog__destroy(drop);
 }
 
-static void test_sockmap_msg_verdict(bool is_ingress)
+static void test_sockmap_msg_verdict(bool is_ingress, bool is_permanent)
 {
 	int key, sent, recvd, recv_fd;
 	int err, map, verdict, s, c0, c1, p0, p1;
@@ -528,11 +528,18 @@ static void test_sockmap_msg_verdict(bool is_ingress)
 		skel->bss->skmsg_redir_key = 2;
 	}
 
+	if (is_permanent)
+		skel->bss->skmsg_redir_flags |= BPF_F_PERMANENT;
+
 	sent = xsend(p1, &buf, sizeof(buf), 0);
 	ASSERT_EQ(sent, sizeof(buf), "xsend(p1)");
 	recvd = recv_timeout(recv_fd, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC);
 	ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(recv_fd)");
 
+	sent = xsend(p1, &buf, sizeof(buf), 0);
+	ASSERT_EQ(sent, sizeof(buf), "xsend(p1)");
+	recvd = recv_timeout(recv_fd, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC);
+	ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(recv_fd)");
 out_close:
 	close(c0);
 	close(p0);
@@ -583,7 +590,11 @@ void test_sockmap_basic(void)
 	if (test__start_subtest("sockmap skb_verdict fionread on drop"))
 		test_sockmap_skb_verdict_fionread(false);
 	if (test__start_subtest("sockmap msg_verdict"))
-		test_sockmap_msg_verdict(false);
+		test_sockmap_msg_verdict(false, false);
 	if (test__start_subtest("sockmap msg_verdict ingress"))
-		test_sockmap_msg_verdict(true);
+		test_sockmap_msg_verdict(true, false);
+	if (test__start_subtest("sockmap msg_verdict permanent"))
+		test_sockmap_msg_verdict(false, true);
+	if (test__start_subtest("sockmap msg_verdict ingress permanent"))
+		test_sockmap_msg_verdict(true, true);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v3 6/7] selftests/bpf: add tests for verdict skmsg to itself
  2023-08-24 14:39 [PATCH bpf-next v3 0/7] add BPF_F_PERMANENT flag for sockmap skmsg redirect Liu Jian
                   ` (4 preceding siblings ...)
  2023-08-24 14:39 ` [PATCH bpf-next v3 5/7] selftests/bpf: add two skmsg verdict tests for BPF_F_PERMANENT flag Liu Jian
@ 2023-08-24 14:39 ` Liu Jian
  2023-08-24 14:39 ` [PATCH bpf-next v3 7/7] selftests/bpf: add tests for verdict skmsg to closed socket Liu Jian
  6 siblings, 0 replies; 13+ messages in thread
From: Liu Jian @ 2023-08-24 14:39 UTC (permalink / raw)
  To: john.fastabend, jakub, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern
  Cc: netdev, bpf, liujian56

Add tests for verdict skmsg to itself in sockmap_basic.c

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 32 +++++++++++++------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 1a29e76fe29f..1fcfa30720c6 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -476,7 +476,7 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog)
 		test_sockmap_drop_prog__destroy(drop);
 }
 
-static void test_sockmap_msg_verdict(bool is_ingress, bool is_permanent)
+static void test_sockmap_msg_verdict(bool is_ingress, bool is_permanent, bool is_self)
 {
 	int key, sent, recvd, recv_fd;
 	int err, map, verdict, s, c0, c1, p0, p1;
@@ -519,13 +519,23 @@ static void test_sockmap_msg_verdict(bool is_ingress, bool is_permanent)
 		goto out_close;
 
 	if (is_ingress) {
-		recv_fd = c1;
 		skel->bss->skmsg_redir_flags = BPF_F_INGRESS;
-		skel->bss->skmsg_redir_key = 1;
+		if (is_self) {
+			skel->bss->skmsg_redir_key = 0;
+			recv_fd = p1;
+		} else {
+			skel->bss->skmsg_redir_key = 1;
+			recv_fd = c1;
+		}
 	} else {
-		recv_fd = c0;
 		skel->bss->skmsg_redir_flags = 0;
-		skel->bss->skmsg_redir_key = 2;
+		if (is_self) {
+			skel->bss->skmsg_redir_key = 0;
+			recv_fd = c1;
+		} else {
+			skel->bss->skmsg_redir_key = 2;
+			recv_fd = c0;
+		}
 	}
 
 	if (is_permanent)
@@ -590,11 +600,15 @@ void test_sockmap_basic(void)
 	if (test__start_subtest("sockmap skb_verdict fionread on drop"))
 		test_sockmap_skb_verdict_fionread(false);
 	if (test__start_subtest("sockmap msg_verdict"))
-		test_sockmap_msg_verdict(false, false);
+		test_sockmap_msg_verdict(false, false, false);
 	if (test__start_subtest("sockmap msg_verdict ingress"))
-		test_sockmap_msg_verdict(true, false);
+		test_sockmap_msg_verdict(true, false, false);
 	if (test__start_subtest("sockmap msg_verdict permanent"))
-		test_sockmap_msg_verdict(false, true);
+		test_sockmap_msg_verdict(false, true, false);
 	if (test__start_subtest("sockmap msg_verdict ingress permanent"))
-		test_sockmap_msg_verdict(true, true);
+		test_sockmap_msg_verdict(true, true, false);
+	if (test__start_subtest("sockmap msg_verdict permanent self"))
+		test_sockmap_msg_verdict(false, true, true);
+	if (test__start_subtest("sockmap msg_verdict ingress permanent self"))
+		test_sockmap_msg_verdict(true, true, true);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH bpf-next v3 7/7] selftests/bpf: add tests for verdict skmsg to closed socket
  2023-08-24 14:39 [PATCH bpf-next v3 0/7] add BPF_F_PERMANENT flag for sockmap skmsg redirect Liu Jian
                   ` (5 preceding siblings ...)
  2023-08-24 14:39 ` [PATCH bpf-next v3 6/7] selftests/bpf: add tests for verdict skmsg to itself Liu Jian
@ 2023-08-24 14:39 ` Liu Jian
  6 siblings, 0 replies; 13+ messages in thread
From: Liu Jian @ 2023-08-24 14:39 UTC (permalink / raw)
  To: john.fastabend, jakub, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern
  Cc: netdev, bpf, liujian56

Add four tests for verdict skmsg to closed socket in sockmap_basic.c.

Signed-off-by: Liu Jian <liujian56@huawei.com>
---
 .../selftests/bpf/prog_tests/sockmap_basic.c  | 42 +++++++++++++++----
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 1fcfa30720c6..dabea0997982 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -476,9 +476,10 @@ static void test_sockmap_skb_verdict_fionread(bool pass_prog)
 		test_sockmap_drop_prog__destroy(drop);
 }
 
-static void test_sockmap_msg_verdict(bool is_ingress, bool is_permanent, bool is_self)
+static void test_sockmap_msg_verdict(bool is_ingress, bool is_permanent, bool is_self,
+				     bool target_shutdown)
 {
-	int key, sent, recvd, recv_fd;
+	int key, sent, recvd, recv_fd, target_fd;
 	int err, map, verdict, s, c0, c1, p0, p1;
 	struct test_sockmap_msg_verdict *skel;
 	char buf[256] = "0123456789";
@@ -522,18 +523,22 @@ static void test_sockmap_msg_verdict(bool is_ingress, bool is_permanent, bool is
 		skel->bss->skmsg_redir_flags = BPF_F_INGRESS;
 		if (is_self) {
 			skel->bss->skmsg_redir_key = 0;
+			target_fd = p1;
 			recv_fd = p1;
 		} else {
 			skel->bss->skmsg_redir_key = 1;
+			target_fd = c1;
 			recv_fd = c1;
 		}
 	} else {
 		skel->bss->skmsg_redir_flags = 0;
 		if (is_self) {
 			skel->bss->skmsg_redir_key = 0;
+			target_fd = p1;
 			recv_fd = c1;
 		} else {
 			skel->bss->skmsg_redir_key = 2;
+			target_fd = p0;
 			recv_fd = c0;
 		}
 	}
@@ -546,6 +551,19 @@ static void test_sockmap_msg_verdict(bool is_ingress, bool is_permanent, bool is
 	recvd = recv_timeout(recv_fd, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC);
 	ASSERT_EQ(recvd, sizeof(buf), "recv_timeout(recv_fd)");
 
+	if (target_shutdown) {
+		signal(SIGPIPE, SIG_IGN);
+		close(target_fd);
+		sent = send(p1, &buf, sizeof(buf), 0);
+		if (is_permanent) {
+			ASSERT_EQ(sent, -1, "xsend(p1)");
+			ASSERT_EQ(errno, EPIPE, "xsend(p1)");
+		} else {
+			ASSERT_EQ(sent, sizeof(buf), "xsend(p1)");
+		}
+		goto out_close;
+	}
+
 	sent = xsend(p1, &buf, sizeof(buf), 0);
 	ASSERT_EQ(sent, sizeof(buf), "xsend(p1)");
 	recvd = recv_timeout(recv_fd, &buf, sizeof(buf), SOCK_NONBLOCK, IO_TIMEOUT_SEC);
@@ -600,15 +618,23 @@ void test_sockmap_basic(void)
 	if (test__start_subtest("sockmap skb_verdict fionread on drop"))
 		test_sockmap_skb_verdict_fionread(false);
 	if (test__start_subtest("sockmap msg_verdict"))
-		test_sockmap_msg_verdict(false, false, false);
+		test_sockmap_msg_verdict(false, false, false, false);
 	if (test__start_subtest("sockmap msg_verdict ingress"))
-		test_sockmap_msg_verdict(true, false, false);
+		test_sockmap_msg_verdict(true, false, false, false);
 	if (test__start_subtest("sockmap msg_verdict permanent"))
-		test_sockmap_msg_verdict(false, true, false);
+		test_sockmap_msg_verdict(false, true, false, false);
 	if (test__start_subtest("sockmap msg_verdict ingress permanent"))
-		test_sockmap_msg_verdict(true, true, false);
+		test_sockmap_msg_verdict(true, true, false, false);
 	if (test__start_subtest("sockmap msg_verdict permanent self"))
-		test_sockmap_msg_verdict(false, true, true);
+		test_sockmap_msg_verdict(false, true, true, false);
 	if (test__start_subtest("sockmap msg_verdict ingress permanent self"))
-		test_sockmap_msg_verdict(true, true, true);
+		test_sockmap_msg_verdict(true, true, true, false);
+	if (test__start_subtest("sockmap msg_verdict permanent shutdown"))
+		test_sockmap_msg_verdict(false, true, false, true);
+	if (test__start_subtest("sockmap msg_verdict ingress permanent shutdown"))
+		test_sockmap_msg_verdict(true, true, false, true);
+	if (test__start_subtest("sockmap msg_verdict shutdown"))
+		test_sockmap_msg_verdict(false, false, false, true);
+	if (test__start_subtest("sockmap msg_verdict ingress shutdown"))
+		test_sockmap_msg_verdict(true, false, false, true);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for skmsg redirect
  2023-08-24 14:39 ` [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for " Liu Jian
@ 2023-08-25 13:04   ` Jakub Sitnicki
  2023-08-26  1:32     ` John Fastabend
  2023-08-26 11:53     ` liujian (CE)
  0 siblings, 2 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2023-08-25 13:04 UTC (permalink / raw)
  To: Liu Jian
  Cc: john.fastabend, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern, netdev, bpf

On Thu, Aug 24, 2023 at 10:39 PM +08, Liu Jian wrote:
> If the sockmap msg redirection function is used only to forward packets
> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
> program is the same each time. In this case, the BPF program only needs to
> be run once. Add BPF_F_PERMANENT flag to bpf_msg_redirect_map() and
> bpf_msg_redirect_hash() to implement this ability.
>
> Then we can enable this function in the bpf program as follows:
> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENT);
>
> Test results using netperf  TCP_STREAM mode:
> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
> done
>
> before:
> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
> after:
> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
>
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
>  include/linux/skmsg.h          |  1 +
>  include/uapi/linux/bpf.h       | 15 +++++++++++++--
>  net/core/skmsg.c               |  5 +++++
>  net/core/sock_map.c            |  4 ++--
>  net/ipv4/tcp_bpf.c             | 18 +++++++++++++-----
>  tools/include/uapi/linux/bpf.h | 15 +++++++++++++--
>  6 files changed, 47 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index 054d7911bfc9..2f4e9811ff85 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -82,6 +82,7 @@ struct sk_psock {
>  	u32				cork_bytes;
>  	u32				eval;
>  	bool				redir_ingress; /* undefined if sk_redir is null */
> +	bool				redir_permanent;
>  	struct sk_msg			*cork;
>  	struct sk_psock_progs		progs;
>  #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 70da85200695..f4de1ba390b4 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3004,7 +3004,12 @@ union bpf_attr {
>   * 		egress interfaces can be used for redirection. The
>   * 		**BPF_F_INGRESS** value in *flags* is used to make the
>   * 		distinction (ingress path is selected if the flag is present,
> - * 		egress path otherwise). This is the only flag supported for now.
> + * 		egress path otherwise). The **BPF_F_PERMANENT** value in
> + *		*flags* is used to indicates whether the eBPF result is
> + *		permanently (please note that, BPF_F_PERMANENT does not work with
> + *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
> + *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
> + *		automatically disabled).

There are some grammar mistakes here we need to fix. Hint - I find it
helpful to run the text through an online grammar checker or an AI
chatbot when unsure.

Either way, let's reword so the flags are clearly listed out, since we
now have two of them:

The following *flags* are supported:

**BPF_F_INGRESS**
        Both ingress and egress interfaces can be used for redirection.
        The **BPF_F_INGRESS** value in *flags* is used to make the
        distinction. Ingress path is selected if the flag is present,
        egress path otherwise.
**BPF_F_PERMANENT**
        Indicates that redirect verdict and the target socket should be
        remembered. The verdict program will not be run for subsequent
        packets, unless an error occurs when forwarding packets.

        **BPF_F_PERMANENT** cannot be use together with
        **bpf_msg_apply_bytes**\ () and **bpf_msg_cork_bytes**\ (). If
        either has been called, the **BPF_F_PERMANENT** flag is ignored.


Please check the formatting is correct with:

./scripts/bpf_doc.py --filename include/uapi/linux/bpf.h \
  | rst2man /dev/stdin | man /dev/stdin

That said, I'm not sure I like these semantics - flag being ignored
under some circumstances. It leads to a silent failure.

If I've asked for a permanent redirect, but it is cannot work in my BPF
program, I'd rather get an error from bpf_msg_redirect_map/hash(), than
be surprised that it is getting executed for every packet and have to
troubleshoot why.

>   * 	Return
>   * 		**SK_PASS** on success, or **SK_DROP** on error.
>   *
> @@ -3276,7 +3281,12 @@ union bpf_attr {
>   *		egress interfaces can be used for redirection. The
>   *		**BPF_F_INGRESS** value in *flags* is used to make the
>   *		distinction (ingress path is selected if the flag is present,
> - *		egress path otherwise). This is the only flag supported for now.
> + *		egress path otherwise). The **BPF_F_PERMANENT** value in
> + *		*flags* is used to indicates whether the eBPF result is
> + *		permanently (please note that, BPF_F_PERMANENT does not work with
> + *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
> + *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
> + *		automatically disabled).
>   *	Return
>   *		**SK_PASS** on success, or **SK_DROP** on error.
>   *
> @@ -5872,6 +5882,7 @@ enum {
>  /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
>  enum {
>  	BPF_F_INGRESS			= (1ULL << 0),
> +	BPF_F_PERMANENT			= (1ULL << 1),
>  };
>  
>  /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index a29508e1ff35..df1443cf5fbd 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -885,6 +885,11 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
>  			goto out;
>  		}
>  		psock->redir_ingress = sk_msg_to_ingress(msg);
> +		if (!msg->apply_bytes && !msg->cork_bytes)
> +			psock->redir_permanent =
> +				msg->flags & BPF_F_PERMANENT;
> +		else
> +			psock->redir_permanent = false;

Above can be rewritten as:

		psock->redir_permanent = !msg->apply_bytes &&
					 !msg->cork_bytes &&
					 (msg->flags & BPF_F_PERMANENT);

But as I wrote earlier, I don't think it's a good idea to ignore the
flag. We can detect this conflict at the time the bpf_msg_sk_redirect_*
helper is called and return an error.

Naturally that means that that bpf_msg_{cork,apply}_bytes helpers need
to be adjusted to return an error if BPF_F_PERMANENT has been set.

>  		psock->sk_redir = msg->sk_redir;
>  		sock_hold(psock->sk_redir);
>  	}
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 08ab108206bf..35a361614f5e 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -662,7 +662,7 @@ BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg *, msg,
>  {
>  	struct sock *sk;
>  
> -	if (unlikely(flags & ~(BPF_F_INGRESS)))
> +	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENT)))
>  		return SK_DROP;
>  
>  	sk = __sock_map_lookup_elem(map, key);
> @@ -1261,7 +1261,7 @@ BPF_CALL_4(bpf_msg_redirect_hash, struct sk_msg *, msg,
>  {
>  	struct sock *sk;
>  
> -	if (unlikely(flags & ~(BPF_F_INGRESS)))
> +	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENT)))
>  		return SK_DROP;
>  
>  	sk = __sock_hash_lookup_elem(map, key);
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 81f0dff69e0b..b53e356562a6 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -419,8 +419,10 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		if (!psock->apply_bytes) {
>  			/* Clean up before releasing the sock lock. */
>  			eval = psock->eval;
> -			psock->eval = __SK_NONE;
> -			psock->sk_redir = NULL;
> +			if (!psock->redir_permanent) {
> +				psock->eval = __SK_NONE;
> +				psock->sk_redir = NULL;
> +			}
>  		}
>  		if (psock->cork) {
>  			cork = true;
> @@ -433,9 +435,15 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  		ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
>  					    msg, tosend, flags);
>  		sent = origsize - msg->sg.size;
> +		/* disable the ability when something wrong */
> +		if (unlikely(ret < 0))
> +			psock->redir_permanent = 0;
>  
> -		if (eval == __SK_REDIRECT)
> +		if (!psock->redir_permanent && eval == __SK_REDIRECT) {

I believe eval == __SK_REDIRECT is always true here, and the eval local
variable is redundant. We will be in this switch branch only if
psock->eval == __SK_REDIRECT. It's something we missed during the review
of cd9733f5d75c ("tcp_bpf: Fix one concurrency problem in the
tcp_bpf_send_verdict function").

>  			sock_put(sk_redir);
> +			psock->sk_redir = NULL;
> +			psock->eval = __SK_NONE;
> +		}
>  
>  		lock_sock(sk);
>  		if (unlikely(ret < 0)) {
> @@ -460,8 +468,8 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  	}
>  
>  	if (likely(!ret)) {
> -		if (!psock->apply_bytes) {
> -			psock->eval =  __SK_NONE;
> +		if (!psock->apply_bytes && !psock->redir_permanent) {
> +			psock->eval = __SK_NONE;
>  			if (psock->sk_redir) {
>  				sock_put(psock->sk_redir);
>  				psock->sk_redir = NULL;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 70da85200695..f4de1ba390b4 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -3004,7 +3004,12 @@ union bpf_attr {
>   * 		egress interfaces can be used for redirection. The
>   * 		**BPF_F_INGRESS** value in *flags* is used to make the
>   * 		distinction (ingress path is selected if the flag is present,
> - * 		egress path otherwise). This is the only flag supported for now.
> + * 		egress path otherwise). The **BPF_F_PERMANENT** value in
> + *		*flags* is used to indicates whether the eBPF result is
> + *		permanently (please note that, BPF_F_PERMANENT does not work with
> + *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
> + *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
> + *		automatically disabled).
>   * 	Return
>   * 		**SK_PASS** on success, or **SK_DROP** on error.
>   *
> @@ -3276,7 +3281,12 @@ union bpf_attr {
>   *		egress interfaces can be used for redirection. The
>   *		**BPF_F_INGRESS** value in *flags* is used to make the
>   *		distinction (ingress path is selected if the flag is present,
> - *		egress path otherwise). This is the only flag supported for now.
> + *		egress path otherwise). The **BPF_F_PERMANENT** value in
> + *		*flags* is used to indicates whether the eBPF result is
> + *		permanently (please note that, BPF_F_PERMANENT does not work with
> + *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
> + *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
> + *		automatically disabled).
>   *	Return
>   *		**SK_PASS** on success, or **SK_DROP** on error.
>   *
> @@ -5872,6 +5882,7 @@ enum {
>  /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
>  enum {
>  	BPF_F_INGRESS			= (1ULL << 0),
> +	BPF_F_PERMANENT			= (1ULL << 1),
>  };
>  
>  /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for skmsg redirect
  2023-08-25 13:04   ` Jakub Sitnicki
@ 2023-08-26  1:32     ` John Fastabend
  2023-08-26 11:54       ` liujian (CE)
  2023-08-28  8:13       ` Jakub Sitnicki
  2023-08-26 11:53     ` liujian (CE)
  1 sibling, 2 replies; 13+ messages in thread
From: John Fastabend @ 2023-08-26  1:32 UTC (permalink / raw)
  To: Jakub Sitnicki, Liu Jian
  Cc: john.fastabend, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern, netdev, bpf

Jakub Sitnicki wrote:
> On Thu, Aug 24, 2023 at 10:39 PM +08, Liu Jian wrote:
> > If the sockmap msg redirection function is used only to forward packets
> > and no other operation, the execution result of the BPF_SK_MSG_VERDICT
> > program is the same each time. In this case, the BPF program only needs to
> > be run once. Add BPF_F_PERMANENT flag to bpf_msg_redirect_map() and
> > bpf_msg_redirect_hash() to implement this ability.
> >
> > Then we can enable this function in the bpf program as follows:
> > bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENT);
> >
> > Test results using netperf  TCP_STREAM mode:
> > for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
> > netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
> > done
> >
> > before:
> > 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
> > after:
> > 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
> >
> > Signed-off-by: Liu Jian <liujian56@huawei.com>

[...]

> >  /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index a29508e1ff35..df1443cf5fbd 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -885,6 +885,11 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
> >  			goto out;
> >  		}
> >  		psock->redir_ingress = sk_msg_to_ingress(msg);
> > +		if (!msg->apply_bytes && !msg->cork_bytes)
> > +			psock->redir_permanent =
> > +				msg->flags & BPF_F_PERMANENT;
> > +		else
> > +			psock->redir_permanent = false;
> 
> Above can be rewritten as:
> 
> 		psock->redir_permanent = !msg->apply_bytes &&
> 					 !msg->cork_bytes &&
> 					 (msg->flags & BPF_F_PERMANENT);
> 
> But as I wrote earlier, I don't think it's a good idea to ignore the
> flag. We can detect this conflict at the time the bpf_msg_sk_redirect_*
> helper is called and return an error.
> 
> Naturally that means that that bpf_msg_{cork,apply}_bytes helpers need
> to be adjusted to return an error if BPF_F_PERMANENT has been set.

So far we've not really done much to protect a user from doing
rather silly things. The following will all do something without
errors,

  bpf_msg_apply_bytes()
  bpf_msg_apply_bytes() <- reset apply bytes

  bpf_msg_cork_bytes()
  bpf_msg_cork_bytes() <- resets cork byte

also,

  bpf_msg_redirect(..., BPF_F_INGRESS);
  bpf_msg_redirect(..., 0); <- resets sk_redir and flags

maybe there is some valid reason to even do above if further parsing
identifies some reason to redirect to a alert socket or something.

My original thinking was in the interest of not having a bunch of
extra checks for performance reasons we shouldn't add guard rails
unless something really unexpected might happen like a kernel
panic or what not.

This does feel a bit different though because before we
didn't have calls that could impact other calls. My best idea
is to just create a precedence and follow it. I would propose,

'If BPF_F_PERMANENT is set apply_bytes and cork_bytes are
 ignored.'

The other direction (what is above?) has a bit of an inconsistency
where these two flows are different?

  bpf_apply_bytes()
  bpf_msg_redirect(..., BPF_F_PERMANENT)

and

  bpf_msg_redirect(..., BPF_F_PERMANENT)
  bpf_apply_bytes()

It would be best if order of operations doesn't change the
outcome because that starts to get really hard to reason about.

This avoids having to add checks all over the place and then
if users want we could give some mechanisms to read apply
and cork bytes so people could write macros over those if
they really want the hard error.

WDYT?

[...]

Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for skmsg redirect
  2023-08-25 13:04   ` Jakub Sitnicki
  2023-08-26  1:32     ` John Fastabend
@ 2023-08-26 11:53     ` liujian (CE)
  1 sibling, 0 replies; 13+ messages in thread
From: liujian (CE) @ 2023-08-26 11:53 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: john.fastabend, ast, daniel, andrii, martin.lau, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet,
	kuba, pabeni, dsahern, netdev, bpf



On 2023/8/25 21:04, Jakub Sitnicki wrote:
> On Thu, Aug 24, 2023 at 10:39 PM +08, Liu Jian wrote:
>> If the sockmap msg redirection function is used only to forward packets
>> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
>> program is the same each time. In this case, the BPF program only needs to
>> be run once. Add BPF_F_PERMANENT flag to bpf_msg_redirect_map() and
>> bpf_msg_redirect_hash() to implement this ability.
>>
>> Then we can enable this function in the bpf program as follows:
>> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENT);
>>
>> Test results using netperf  TCP_STREAM mode:
>> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
>> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
>> done
>>
>> before:
>> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
>> after:
>> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
>>
>> Signed-off-by: Liu Jian <liujian56@huawei.com>
>> ---
>>   include/linux/skmsg.h          |  1 +
>>   include/uapi/linux/bpf.h       | 15 +++++++++++++--
>>   net/core/skmsg.c               |  5 +++++
>>   net/core/sock_map.c            |  4 ++--
>>   net/ipv4/tcp_bpf.c             | 18 +++++++++++++-----
>>   tools/include/uapi/linux/bpf.h | 15 +++++++++++++--
>>   6 files changed, 47 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
>> index 054d7911bfc9..2f4e9811ff85 100644
>> --- a/include/linux/skmsg.h
>> +++ b/include/linux/skmsg.h
>> @@ -82,6 +82,7 @@ struct sk_psock {
>>   	u32				cork_bytes;
>>   	u32				eval;
>>   	bool				redir_ingress; /* undefined if sk_redir is null */
>> +	bool				redir_permanent;
>>   	struct sk_msg			*cork;
>>   	struct sk_psock_progs		progs;
>>   #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 70da85200695..f4de1ba390b4 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3004,7 +3004,12 @@ union bpf_attr {
>>    * 		egress interfaces can be used for redirection. The
>>    * 		**BPF_F_INGRESS** value in *flags* is used to make the
>>    * 		distinction (ingress path is selected if the flag is present,
>> - * 		egress path otherwise). This is the only flag supported for now.
>> + * 		egress path otherwise). The **BPF_F_PERMANENT** value in
>> + *		*flags* is used to indicates whether the eBPF result is
>> + *		permanently (please note that, BPF_F_PERMANENT does not work with
>> + *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
>> + *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
>> + *		automatically disabled).
> 
> There are some grammar mistakes here we need to fix. Hint - I find it
> helpful to run the text through an online grammar checker or an AI
> chatbot when unsure.
> 
> Either way, let's reword so the flags are clearly listed out, since we
> now have two of them:
> 
> The following *flags* are supported:
> 
> **BPF_F_INGRESS**
>          Both ingress and egress interfaces can be used for redirection.
>          The **BPF_F_INGRESS** value in *flags* is used to make the
>          distinction. Ingress path is selected if the flag is present,
>          egress path otherwise.
> **BPF_F_PERMANENT**
>          Indicates that redirect verdict and the target socket should be
>          remembered. The verdict program will not be run for subsequent
>          packets, unless an error occurs when forwarding packets.
> 
>          **BPF_F_PERMANENT** cannot be use together with
>          **bpf_msg_apply_bytes**\ () and **bpf_msg_cork_bytes**\ (). If
>          either has been called, the **BPF_F_PERMANENT** flag is ignored.
> 
> 
Thanks a lot.
> Please check the formatting is correct with:
> 
> ./scripts/bpf_doc.py --filename include/uapi/linux/bpf.h \
>    | rst2man /dev/stdin | man /dev/stdin
> 
> That said, I'm not sure I like these semantics - flag being ignored
> under some circumstances. It leads to a silent failure.
> 
> If I've asked for a permanent redirect, but it is cannot work in my BPF
> program, I'd rather get an error from bpf_msg_redirect_map/hash(), than
> be surprised that it is getting executed for every packet and have to
> troubleshoot why.
> 
>>    * 	Return
>>    * 		**SK_PASS** on success, or **SK_DROP** on error.
>>    *
>> @@ -3276,7 +3281,12 @@ union bpf_attr {
>>    *		egress interfaces can be used for redirection. The
>>    *		**BPF_F_INGRESS** value in *flags* is used to make the
>>    *		distinction (ingress path is selected if the flag is present,
>> - *		egress path otherwise). This is the only flag supported for now.
>> + *		egress path otherwise). The **BPF_F_PERMANENT** value in
>> + *		*flags* is used to indicates whether the eBPF result is
>> + *		permanently (please note that, BPF_F_PERMANENT does not work with
>> + *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
>> + *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
>> + *		automatically disabled).
>>    *	Return
>>    *		**SK_PASS** on success, or **SK_DROP** on error.
>>    *
>> @@ -5872,6 +5882,7 @@ enum {
>>   /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
>>   enum {
>>   	BPF_F_INGRESS			= (1ULL << 0),
>> +	BPF_F_PERMANENT			= (1ULL << 1),
>>   };
>>   
>>   /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
>> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>> index a29508e1ff35..df1443cf5fbd 100644
>> --- a/net/core/skmsg.c
>> +++ b/net/core/skmsg.c
>> @@ -885,6 +885,11 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
>>   			goto out;
>>   		}
>>   		psock->redir_ingress = sk_msg_to_ingress(msg);
>> +		if (!msg->apply_bytes && !msg->cork_bytes)
>> +			psock->redir_permanent =
>> +				msg->flags & BPF_F_PERMANENT;
>> +		else
>> +			psock->redir_permanent = false;
> 
> Above can be rewritten as:
> 
> 		psock->redir_permanent = !msg->apply_bytes &&
> 					 !msg->cork_bytes &&
> 					 (msg->flags & BPF_F_PERMANENT);
> 
Will change it in v4. Thanks.

> But as I wrote earlier, I don't think it's a good idea to ignore the
> flag. We can detect this conflict at the time the bpf_msg_sk_redirect_*
> helper is called and return an error.
>
> Naturally that means that that bpf_msg_{cork,apply}_bytes helpers need
> to be adjusted to return an error if BPF_F_PERMANENT has been set.
OK. If so, is the configuration that is set first and has a higher priority?

> 
>>   		psock->sk_redir = msg->sk_redir;
>>   		sock_hold(psock->sk_redir);
>>   	}
>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>> index 08ab108206bf..35a361614f5e 100644
>> --- a/net/core/sock_map.c
>> +++ b/net/core/sock_map.c
>> @@ -662,7 +662,7 @@ BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg *, msg,
>>   {
>>   	struct sock *sk;
>>   
>> -	if (unlikely(flags & ~(BPF_F_INGRESS)))
>> +	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENT)))
>>   		return SK_DROP;
>>   
>>   	sk = __sock_map_lookup_elem(map, key);
>> @@ -1261,7 +1261,7 @@ BPF_CALL_4(bpf_msg_redirect_hash, struct sk_msg *, msg,
>>   {
>>   	struct sock *sk;
>>   
>> -	if (unlikely(flags & ~(BPF_F_INGRESS)))
>> +	if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENT)))
>>   		return SK_DROP;
>>   
>>   	sk = __sock_hash_lookup_elem(map, key);
>> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
>> index 81f0dff69e0b..b53e356562a6 100644
>> --- a/net/ipv4/tcp_bpf.c
>> +++ b/net/ipv4/tcp_bpf.c
>> @@ -419,8 +419,10 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>>   		if (!psock->apply_bytes) {
>>   			/* Clean up before releasing the sock lock. */
>>   			eval = psock->eval;
>> -			psock->eval = __SK_NONE;
>> -			psock->sk_redir = NULL;
>> +			if (!psock->redir_permanent) {
>> +				psock->eval = __SK_NONE;
>> +				psock->sk_redir = NULL;
>> +			}
>>   		}
>>   		if (psock->cork) {
>>   			cork = true;
>> @@ -433,9 +435,15 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>>   		ret = tcp_bpf_sendmsg_redir(sk_redir, redir_ingress,
>>   					    msg, tosend, flags);
>>   		sent = origsize - msg->sg.size;
>> +		/* disable the ability when something wrong */
>> +		if (unlikely(ret < 0))
>> +			psock->redir_permanent = 0;
>>   
>> -		if (eval == __SK_REDIRECT)
>> +		if (!psock->redir_permanent && eval == __SK_REDIRECT) {
> 
> I believe eval == __SK_REDIRECT is always true here, and the eval local
> variable is redundant. We will be in this switch branch only if
> psock->eval == __SK_REDIRECT. It's something we missed during the review
> of cd9733f5d75c ("tcp_bpf: Fix one concurrency problem in the
> tcp_bpf_send_verdict function").
> 
When apply_bytes is set, eval == __SK_REDIRECT is not true.
>>   			sock_put(sk_redir);
>> +			psock->sk_redir = NULL;
>> +			psock->eval = __SK_NONE;
>> +		}
>>   
>>   		lock_sock(sk);
>>   		if (unlikely(ret < 0)) {
>> @@ -460,8 +468,8 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>>   	}
>>   
>>   	if (likely(!ret)) {
>> -		if (!psock->apply_bytes) {
>> -			psock->eval =  __SK_NONE;
>> +		if (!psock->apply_bytes && !psock->redir_permanent) {
>> +			psock->eval = __SK_NONE;
>>   			if (psock->sk_redir) {
>>   				sock_put(psock->sk_redir);
>>   				psock->sk_redir = NULL;
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 70da85200695..f4de1ba390b4 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -3004,7 +3004,12 @@ union bpf_attr {
>>    * 		egress interfaces can be used for redirection. The
>>    * 		**BPF_F_INGRESS** value in *flags* is used to make the
>>    * 		distinction (ingress path is selected if the flag is present,
>> - * 		egress path otherwise). This is the only flag supported for now.
>> + * 		egress path otherwise). The **BPF_F_PERMANENT** value in
>> + *		*flags* is used to indicates whether the eBPF result is
>> + *		permanently (please note that, BPF_F_PERMANENT does not work with
>> + *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
>> + *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
>> + *		automatically disabled).
>>    * 	Return
>>    * 		**SK_PASS** on success, or **SK_DROP** on error.
>>    *
>> @@ -3276,7 +3281,12 @@ union bpf_attr {
>>    *		egress interfaces can be used for redirection. The
>>    *		**BPF_F_INGRESS** value in *flags* is used to make the
>>    *		distinction (ingress path is selected if the flag is present,
>> - *		egress path otherwise). This is the only flag supported for now.
>> + *		egress path otherwise). The **BPF_F_PERMANENT** value in
>> + *		*flags* is used to indicates whether the eBPF result is
>> + *		permanently (please note that, BPF_F_PERMANENT does not work with
>> + *		msg_apply_bytes() and msg_cork_bytes(), if msg_apply_bytes() or
>> + *		msg_cork_bytes() is configured, the BPF_F_PERMANENT function is
>> + *		automatically disabled).
>>    *	Return
>>    *		**SK_PASS** on success, or **SK_DROP** on error.
>>    *
>> @@ -5872,6 +5882,7 @@ enum {
>>   /* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
>>   enum {
>>   	BPF_F_INGRESS			= (1ULL << 0),
>> +	BPF_F_PERMANENT			= (1ULL << 1),
>>   };
>>   
>>   /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for skmsg redirect
  2023-08-26  1:32     ` John Fastabend
@ 2023-08-26 11:54       ` liujian (CE)
  2023-08-28  8:13       ` Jakub Sitnicki
  1 sibling, 0 replies; 13+ messages in thread
From: liujian (CE) @ 2023-08-26 11:54 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: ast, daniel, andrii, martin.lau, song, yonghong.song, kpsingh,
	sdf, haoluo, jolsa, davem, edumazet, kuba, pabeni, dsahern,
	netdev, bpf



On 2023/8/26 9:32, John Fastabend wrote:
> Jakub Sitnicki wrote:
>> On Thu, Aug 24, 2023 at 10:39 PM +08, Liu Jian wrote:
>>> If the sockmap msg redirection function is used only to forward packets
>>> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
>>> program is the same each time. In this case, the BPF program only needs to
>>> be run once. Add BPF_F_PERMANENT flag to bpf_msg_redirect_map() and
>>> bpf_msg_redirect_hash() to implement this ability.
>>>
>>> Then we can enable this function in the bpf program as follows:
>>> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENT);
>>>
>>> Test results using netperf  TCP_STREAM mode:
>>> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
>>> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
>>> done
>>>
>>> before:
>>> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
>>> after:
>>> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
>>>
>>> Signed-off-by: Liu Jian <liujian56@huawei.com>
> 
> [...]
> 
>>>   /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
>>> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>>> index a29508e1ff35..df1443cf5fbd 100644
>>> --- a/net/core/skmsg.c
>>> +++ b/net/core/skmsg.c
>>> @@ -885,6 +885,11 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
>>>   			goto out;
>>>   		}
>>>   		psock->redir_ingress = sk_msg_to_ingress(msg);
>>> +		if (!msg->apply_bytes && !msg->cork_bytes)
>>> +			psock->redir_permanent =
>>> +				msg->flags & BPF_F_PERMANENT;
>>> +		else
>>> +			psock->redir_permanent = false;
>>
>> Above can be rewritten as:
>>
>> 		psock->redir_permanent = !msg->apply_bytes &&
>> 					 !msg->cork_bytes &&
>> 					 (msg->flags & BPF_F_PERMANENT);
>>
>> But as I wrote earlier, I don't think it's a good idea to ignore the
>> flag. We can detect this conflict at the time the bpf_msg_sk_redirect_*
>> helper is called and return an error.
>>
>> Naturally that means that that bpf_msg_{cork,apply}_bytes helpers need
>> to be adjusted to return an error if BPF_F_PERMANENT has been set.
> 
> So far we've not really done much to protect a user from doing
> rather silly things. The following will all do something without
> errors,
> 
>    bpf_msg_apply_bytes()
>    bpf_msg_apply_bytes() <- reset apply bytes
> 
>    bpf_msg_cork_bytes()
>    bpf_msg_cork_bytes() <- resets cork byte
> 
> also,
> 
>    bpf_msg_redirect(..., BPF_F_INGRESS);
>    bpf_msg_redirect(..., 0); <- resets sk_redir and flags
> 
> maybe there is some valid reason to even do above if further parsing
> identifies some reason to redirect to a alert socket or something.
> 
> My original thinking was in the interest of not having a bunch of
> extra checks for performance reasons we shouldn't add guard rails
> unless something really unexpected might happen like a kernel
> panic or what not.
> 
> This does feel a bit different though because before we
> didn't have calls that could impact other calls. My best idea
> is to just create a precedence and follow it. I would propose,
> 
> 'If BPF_F_PERMANENT is set apply_bytes and cork_bytes are
>   ignored.'
> 
I think it's better.
Both low-priority or high-priority are ok for me. But I think it's 
better that BPF_F_PERMANENT has a low priority. Because BPF_F_PERMANEN 
is only for performance, and apply_bytes or cork_bytes may be used to a 
user logic function.

> The other direction (what is above?) has a bit of an inconsistency
> where these two flows are different?
> 
>    bpf_apply_bytes()
>    bpf_msg_redirect(..., BPF_F_PERMANENT)
> 
> and
> 
>    bpf_msg_redirect(..., BPF_F_PERMANENT)
>    bpf_apply_bytes()
> 
> It would be best if order of operations doesn't change the
> outcome because that starts to get really hard to reason about.
> 
> This avoids having to add checks all over the place and then
> if users want we could give some mechanisms to read apply
> and cork bytes so people could write macros over those if
> they really want the hard error.
> 
> WDYT?
> 
> [...]
> 
> Thanks!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for skmsg redirect
  2023-08-26  1:32     ` John Fastabend
  2023-08-26 11:54       ` liujian (CE)
@ 2023-08-28  8:13       ` Jakub Sitnicki
  1 sibling, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2023-08-28  8:13 UTC (permalink / raw)
  To: John Fastabend
  Cc: Liu Jian, ast, daniel, andrii, martin.lau, song, yonghong.song,
	kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba, pabeni,
	dsahern, netdev, bpf

On Fri, Aug 25, 2023 at 06:32 PM -07, John Fastabend wrote:
> Jakub Sitnicki wrote:

[...]

>> But as I wrote earlier, I don't think it's a good idea to ignore the
>> flag. We can detect this conflict at the time the bpf_msg_sk_redirect_*
>> helper is called and return an error.
>> 
>> Naturally that means that that bpf_msg_{cork,apply}_bytes helpers need
>> to be adjusted to return an error if BPF_F_PERMANENT has been set.
>
> So far we've not really done much to protect a user from doing
> rather silly things. The following will all do something without
> errors,
>
>   bpf_msg_apply_bytes()
>   bpf_msg_apply_bytes() <- reset apply bytes
>
>   bpf_msg_cork_bytes()
>   bpf_msg_cork_bytes() <- resets cork byte
>
> also,
>
>   bpf_msg_redirect(..., BPF_F_INGRESS);
>   bpf_msg_redirect(..., 0); <- resets sk_redir and flags
>
> maybe there is some valid reason to even do above if further parsing
> identifies some reason to redirect to a alert socket or something.
>
> My original thinking was in the interest of not having a bunch of
> extra checks for performance reasons we shouldn't add guard rails
> unless something really unexpected might happen like a kernel
> panic or what not.
>
> This does feel a bit different though because before we
> didn't have calls that could impact other calls. My best idea
> is to just create a precedence and follow it. I would propose,
>
> 'If BPF_F_PERMANENT is set apply_bytes and cork_bytes are
>  ignored.'
>
> The other direction (what is above?) has a bit of an inconsistency
> where these two flows are different?
>
>   bpf_apply_bytes()
>   bpf_msg_redirect(..., BPF_F_PERMANENT)
>
> and
>
>   bpf_msg_redirect(..., BPF_F_PERMANENT)
>   bpf_apply_bytes()
>
> It would be best if order of operations doesn't change the
> outcome because that starts to get really hard to reason about.
>
> This avoids having to add checks all over the place and then
> if users want we could give some mechanisms to read apply
> and cork bytes so people could write macros over those if
> they really want the hard error.
>
> WDYT?

These semantics sound sane to me. Easy to explain:

BPF_F_PERMANENT takes precedence over apply/cork_bytes.

Good point about order of operations.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-08-28  8:33 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-24 14:39 [PATCH bpf-next v3 0/7] add BPF_F_PERMANENT flag for sockmap skmsg redirect Liu Jian
2023-08-24 14:39 ` [PATCH bpf-next v3 1/7] bpf, sockmap: add BPF_F_PERMANENT flag for " Liu Jian
2023-08-25 13:04   ` Jakub Sitnicki
2023-08-26  1:32     ` John Fastabend
2023-08-26 11:54       ` liujian (CE)
2023-08-28  8:13       ` Jakub Sitnicki
2023-08-26 11:53     ` liujian (CE)
2023-08-24 14:39 ` [PATCH bpf-next v3 2/7] selftests/bpf: Add txmsg permanently test for sockmap Liu Jian
2023-08-24 14:39 ` [PATCH bpf-next v3 3/7] selftests/bpf: Add txmsg redir " Liu Jian
2023-08-24 14:39 ` [PATCH bpf-next v3 4/7] selftests/bpf: add skmsg verdict tests Liu Jian
2023-08-24 14:39 ` [PATCH bpf-next v3 5/7] selftests/bpf: add two skmsg verdict tests for BPF_F_PERMANENT flag Liu Jian
2023-08-24 14:39 ` [PATCH bpf-next v3 6/7] selftests/bpf: add tests for verdict skmsg to itself Liu Jian
2023-08-24 14:39 ` [PATCH bpf-next v3 7/7] selftests/bpf: add tests for verdict skmsg to closed socket Liu Jian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).