All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 bpf-next 0/4] bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
@ 2021-08-24 17:30 Martin KaFai Lau
  2021-08-24 17:30 ` [PATCH v2 bpf-next 1/4] " Martin KaFai Lau
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Martin KaFai Lau @ 2021-08-24 17:30 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, netdev

This set allows the bpf-tcp-cc to call bpf_setsockopt.  One use
case is to allow a bpf-tcp-cc switching to another cc during init().
For example, when the tcp flow is not ecn ready, the bpf_dctcp
can switch to another cc by calling setsockopt(TCP_CONGESTION).

bpf_getsockopt() is also added to have a symmetrical API, so
less usage surprise.
    
v2:
- Not allow switching to kernel's tcp_cdg because it is the only
  kernel tcp-cc that stores a pointer to icsk_ca_priv.
  Please see the commit log in patch 1 for details.
  Test is added in patch 4 to check switching to tcp_cdg.
- Refactor the logic finding the offset of a func ptr
  in the "struct tcp_congestion_ops" to prog_ops_moff()
  in patch 1.
- bpf_setsockopt() has been disabled in release() since v1 (please
  see commit log in patch 1 for reason).  bpf_getsockopt() is
  also disabled together in release() in v2 to avoid usage surprise
  because both of them are usually expected to be available together.
  bpf-tcp-cc can already use PTR_TO_BTF_ID to read from tcp_sock.

Martin KaFai Lau (4):
  bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
  bpf: selftests: Add sk_state to bpf_tcp_helpers.h
  bpf: selftests: Add connect_to_fd_opts to network_helpers
  bpf: selftests: Add dctcp fallback test

 kernel/bpf/bpf_struct_ops.c                   |  22 +++-
 net/core/filter.c                             |   6 +
 net/ipv4/bpf_tcp_ca.c                         |  41 ++++++-
 tools/testing/selftests/bpf/bpf_tcp_helpers.h |   1 +
 tools/testing/selftests/bpf/network_helpers.c |  23 +++-
 tools/testing/selftests/bpf/network_helpers.h |   6 +
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     | 106 ++++++++++++++----
 .../selftests/bpf/prog_tests/kfunc_call.c     |   2 +-
 tools/testing/selftests/bpf/progs/bpf_dctcp.c |  25 +++++
 .../selftests/bpf/progs/bpf_dctcp_release.c   |  26 +++++
 .../bpf/progs/kfunc_call_test_subprog.c       |   4 +-
 11 files changed, 230 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_dctcp_release.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 bpf-next 1/4] bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt
  2021-08-24 17:30 [PATCH v2 bpf-next 0/4] bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt Martin KaFai Lau
@ 2021-08-24 17:30 ` Martin KaFai Lau
  2021-08-24 17:30 ` [PATCH v2 bpf-next 2/4] bpf: selftests: Add sk_state to bpf_tcp_helpers.h Martin KaFai Lau
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Martin KaFai Lau @ 2021-08-24 17:30 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, netdev

This patch allows the bpf-tcp-cc to call bpf_setsockopt.  One use
case is to allow a bpf-tcp-cc switching to another cc during init().
For example, when the tcp flow is not ecn ready, the bpf_dctcp
can switch to another cc by calling setsockopt(TCP_CONGESTION).

During setsockopt(TCP_CONGESTION), the new tcp-cc's init() will be
called and this could cause a recursion but it is stopped by the
current trampoline's logic (in the prog->active counter).

While retiring a bpf-tcp-cc (e.g. in tcp_v[46]_destroy_sock()),
the tcp stack calls bpf-tcp-cc's release().  To avoid the retiring
bpf-tcp-cc making further changes to the sk, bpf_setsockopt is not
available to the bpf-tcp-cc's release().  This will avoid release()
making setsockopt() call that will potentially allocate new resources.

Although the bpf-tcp-cc already has a more powerful way to read tcp_sock
from the PTR_TO_BTF_ID, it is usually expected that bpf_getsockopt and
bpf_setsockopt are available together.  Thus, bpf_getsockopt() is also
added to all tcp_congestion_ops except release().

When the old bpf-tcp-cc is calling setsockopt(TCP_CONGESTION)
to switch to a new cc, the old bpf-tcp-cc will be released by
bpf_struct_ops_put().  Thus, this patch also puts the bpf_struct_ops_map
after a rcu grace period because the trampoline's image cannot be freed
while the old bpf-tcp-cc is still running.

bpf-tcp-cc can only access icsk_ca_priv as SCALAR.  All kernel's
tcp-cc is also accessing the icsk_ca_priv as SCALAR.   The size
of icsk_ca_priv has already been raised a few times to avoid
extra kmalloc and memory referencing.  The only exception is the
kernel's tcp_cdg.c that stores a kmalloc()-ed pointer in icsk_ca_priv.
To avoid the old bpf-tcp-cc accidentally overriding this tcp_cdg's pointer
value stored in icsk_ca_priv after switching and without over-complicating
the bpf's verifier for this one exception in tcp_cdg, this patch does not
allow switching to tcp_cdg.  If there is a need, bpf_tcp_cdg can be
implemented and then use the bpf_sk_storage as the extended storage.

bpf_sk_setsockopt proto has only been recently added and used
in bpf-sockopt and bpf-iter-tcp, so impose the tcp_cdg limitation in the
same proto instead of adding a new proto specifically for bpf-tcp-cc.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 kernel/bpf/bpf_struct_ops.c | 22 +++++++++++++++++++-
 net/core/filter.c           |  6 ++++++
 net/ipv4/bpf_tcp_ca.c       | 41 ++++++++++++++++++++++++++++++++++---
 3 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 70f6fd4fa305..d6731c32864e 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -28,6 +28,7 @@ struct bpf_struct_ops_value {
 
 struct bpf_struct_ops_map {
 	struct bpf_map map;
+	struct rcu_head rcu;
 	const struct bpf_struct_ops *st_ops;
 	/* protect map_update */
 	struct mutex lock;
@@ -622,6 +623,14 @@ bool bpf_struct_ops_get(const void *kdata)
 	return refcount_inc_not_zero(&kvalue->refcnt);
 }
 
+static void bpf_struct_ops_put_rcu(struct rcu_head *head)
+{
+	struct bpf_struct_ops_map *st_map;
+
+	st_map = container_of(head, struct bpf_struct_ops_map, rcu);
+	bpf_map_put(&st_map->map);
+}
+
 void bpf_struct_ops_put(const void *kdata)
 {
 	struct bpf_struct_ops_value *kvalue;
@@ -632,6 +641,17 @@ void bpf_struct_ops_put(const void *kdata)
 
 		st_map = container_of(kvalue, struct bpf_struct_ops_map,
 				      kvalue);
-		bpf_map_put(&st_map->map);
+		/* The struct_ops's function may switch to another struct_ops.
+		 *
+		 * For example, bpf_tcp_cc_x->init() may switch to
+		 * another tcp_cc_y by calling
+		 * setsockopt(TCP_CONGESTION, "tcp_cc_y").
+		 * During the switch,  bpf_struct_ops_put(tcp_cc_x) is called
+		 * and its map->refcnt may reach 0 which then free its
+		 * trampoline image while tcp_cc_x is still running.
+		 *
+		 * Thus, a rcu grace period is needed here.
+		 */
+		call_rcu(&st_map->rcu, bpf_struct_ops_put_rcu);
 	}
 }
diff --git a/net/core/filter.c b/net/core/filter.c
index 59b8f5050180..a863fd6d9b9b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5039,6 +5039,12 @@ static int _bpf_getsockopt(struct sock *sk, int level, int optname,
 BPF_CALL_5(bpf_sk_setsockopt, struct sock *, sk, int, level,
 	   int, optname, char *, optval, int, optlen)
 {
+	if (level == SOL_TCP && optname == TCP_CONGESTION) {
+		if (optlen >= sizeof("cdg") - 1 &&
+		    !strncmp("cdg", optval, optlen))
+			return -ENOTSUPP;
+	}
+
 	return _bpf_setsockopt(sk, level, optname, optval, optlen);
 }
 
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 9e41eff4a685..0dcee9df1326 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -10,6 +10,9 @@
 #include <net/tcp.h>
 #include <net/bpf_sk_storage.h>
 
+/* "extern" is to avoid sparse warning.  It is only used in bpf_struct_ops.c. */
+extern struct bpf_struct_ops bpf_tcp_congestion_ops;
+
 static u32 optional_ops[] = {
 	offsetof(struct tcp_congestion_ops, init),
 	offsetof(struct tcp_congestion_ops, release),
@@ -163,6 +166,19 @@ static const struct bpf_func_proto bpf_tcp_send_ack_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+static u32 prog_ops_moff(const struct bpf_prog *prog)
+{
+	const struct btf_member *m;
+	const struct btf_type *t;
+	u32 midx;
+
+	midx = prog->expected_attach_type;
+	t = bpf_tcp_congestion_ops.type;
+	m = &btf_type_member(t)[midx];
+
+	return btf_member_bit_offset(t, m) / 8;
+}
+
 static const struct bpf_func_proto *
 bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 			  const struct bpf_prog *prog)
@@ -174,6 +190,28 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 		return &bpf_sk_storage_get_proto;
 	case BPF_FUNC_sk_storage_delete:
 		return &bpf_sk_storage_delete_proto;
+	case BPF_FUNC_setsockopt:
+		/* Does not allow release() to call setsockopt.
+		 * release() is called when the current bpf-tcp-cc
+		 * is retiring.  It is not allowed to call
+		 * setsockopt() to make further changes which
+		 * may potentially allocate new resources.
+		 */
+		if (prog_ops_moff(prog) !=
+		    offsetof(struct tcp_congestion_ops, release))
+			return &bpf_sk_setsockopt_proto;
+		return NULL;
+	case BPF_FUNC_getsockopt:
+		/* Since get/setsockopt is usually expected to
+		 * be available together, disable getsockopt for
+		 * release also to avoid usage surprise.
+		 * The bpf-tcp-cc already has a more powerful way
+		 * to read tcp_sock from the PTR_TO_BTF_ID.
+		 */
+		if (prog_ops_moff(prog) !=
+		    offsetof(struct tcp_congestion_ops, release))
+			return &bpf_sk_getsockopt_proto;
+		return NULL;
 	default:
 		return bpf_base_func_proto(func_id);
 	}
@@ -286,9 +324,6 @@ static void bpf_tcp_ca_unreg(void *kdata)
 	tcp_unregister_congestion_control(kdata);
 }
 
-/* Avoid sparse warning.  It is only used in bpf_struct_ops.c. */
-extern struct bpf_struct_ops bpf_tcp_congestion_ops;
-
 struct bpf_struct_ops bpf_tcp_congestion_ops = {
 	.verifier_ops = &bpf_tcp_ca_verifier_ops,
 	.reg = bpf_tcp_ca_reg,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 bpf-next 2/4] bpf: selftests: Add sk_state to bpf_tcp_helpers.h
  2021-08-24 17:30 [PATCH v2 bpf-next 0/4] bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt Martin KaFai Lau
  2021-08-24 17:30 ` [PATCH v2 bpf-next 1/4] " Martin KaFai Lau
@ 2021-08-24 17:30 ` Martin KaFai Lau
  2021-08-24 17:30 ` [PATCH v2 bpf-next 3/4] bpf: selftests: Add connect_to_fd_opts to network_helpers Martin KaFai Lau
  2021-08-24 17:30 ` [PATCH v2 bpf-next 4/4] bpf: selftests: Add dctcp fallback test Martin KaFai Lau
  3 siblings, 0 replies; 5+ messages in thread
From: Martin KaFai Lau @ 2021-08-24 17:30 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, netdev

Add sk_state define to bpf_tcp_helpers.h.  Rename the existing
global variable "sk_state" in the kfunc_call test to "sk_state_res".

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/testing/selftests/bpf/bpf_tcp_helpers.h               | 1 +
 tools/testing/selftests/bpf/prog_tests/kfunc_call.c         | 2 +-
 tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c | 4 ++--
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_tcp_helpers.h b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
index c9f9bdad60c7..b1ede6f0b821 100644
--- a/tools/testing/selftests/bpf/bpf_tcp_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_tcp_helpers.h
@@ -31,6 +31,7 @@ enum sk_pacing {
 
 struct sock {
 	struct sock_common	__sk_common;
+#define sk_state		__sk_common.skc_state
 	unsigned long		sk_pacing_rate;
 	__u32			sk_pacing_status; /* see enum sk_pacing */
 } __attribute__((preserve_access_index));
diff --git a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
index 30a7b9b837bf..9611f2bc50df 100644
--- a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
+++ b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c
@@ -44,7 +44,7 @@ static void test_subprog(void)
 	ASSERT_OK(err, "bpf_prog_test_run(test1)");
 	ASSERT_EQ(retval, 10, "test1-retval");
 	ASSERT_NEQ(skel->data->active_res, -1, "active_res");
-	ASSERT_EQ(skel->data->sk_state, BPF_TCP_CLOSE, "sk_state");
+	ASSERT_EQ(skel->data->sk_state_res, BPF_TCP_CLOSE, "sk_state_res");
 
 	kfunc_call_test_subprog__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c b/tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c
index b2dcb7d9cb03..5fbd9e232d44 100644
--- a/tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c
+++ b/tools/testing/selftests/bpf/progs/kfunc_call_test_subprog.c
@@ -9,7 +9,7 @@ extern __u64 bpf_kfunc_call_test1(struct sock *sk, __u32 a, __u64 b,
 				  __u32 c, __u64 d) __ksym;
 extern struct sock *bpf_kfunc_call_test3(struct sock *sk) __ksym;
 int active_res = -1;
-int sk_state = -1;
+int sk_state_res = -1;
 
 int __noinline f1(struct __sk_buff *skb)
 {
@@ -28,7 +28,7 @@ int __noinline f1(struct __sk_buff *skb)
 	if (active)
 		active_res = *active;
 
-	sk_state = bpf_kfunc_call_test3((struct sock *)sk)->__sk_common.skc_state;
+	sk_state_res = bpf_kfunc_call_test3((struct sock *)sk)->sk_state;
 
 	return (__u32)bpf_kfunc_call_test1((struct sock *)sk, 1, 2, 3, 4);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 bpf-next 3/4] bpf: selftests: Add connect_to_fd_opts to network_helpers
  2021-08-24 17:30 [PATCH v2 bpf-next 0/4] bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt Martin KaFai Lau
  2021-08-24 17:30 ` [PATCH v2 bpf-next 1/4] " Martin KaFai Lau
  2021-08-24 17:30 ` [PATCH v2 bpf-next 2/4] bpf: selftests: Add sk_state to bpf_tcp_helpers.h Martin KaFai Lau
@ 2021-08-24 17:30 ` Martin KaFai Lau
  2021-08-24 17:30 ` [PATCH v2 bpf-next 4/4] bpf: selftests: Add dctcp fallback test Martin KaFai Lau
  3 siblings, 0 replies; 5+ messages in thread
From: Martin KaFai Lau @ 2021-08-24 17:30 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, netdev

The next test requires to setsockopt(TCP_CONGESTION) before
connect(), so a new arg is needed for the connect_to_fd() to specify
the cc's name.

This patch adds a new "struct network_helper_opts" for the future
option needs.  It starts with the "cc" and "timeout_ms" option.
A new helper connect_to_fd_opts() is added to take the new
"const struct network_helper_opts *opts" as an arg.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 tools/testing/selftests/bpf/network_helpers.c | 23 +++++++++++++++++--
 tools/testing/selftests/bpf/network_helpers.h |  6 +++++
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index d6857683397f..7e9f6375757a 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -218,13 +218,18 @@ static int connect_fd_to_addr(int fd,
 	return 0;
 }
 
-int connect_to_fd(int server_fd, int timeout_ms)
+static const struct network_helper_opts default_opts;
+
+int connect_to_fd_opts(int server_fd, const struct network_helper_opts *opts)
 {
 	struct sockaddr_storage addr;
 	struct sockaddr_in *addr_in;
 	socklen_t addrlen, optlen;
 	int fd, type;
 
+	if (!opts)
+		opts = &default_opts;
+
 	optlen = sizeof(type);
 	if (getsockopt(server_fd, SOL_SOCKET, SO_TYPE, &type, &optlen)) {
 		log_err("getsockopt(SOL_TYPE)");
@@ -244,7 +249,12 @@ int connect_to_fd(int server_fd, int timeout_ms)
 		return -1;
 	}
 
-	if (settimeo(fd, timeout_ms))
+	if (settimeo(fd, opts->timeout_ms))
+		goto error_close;
+
+	if (opts->cc && opts->cc[0] &&
+	    setsockopt(fd, SOL_TCP, TCP_CONGESTION, opts->cc,
+		       strlen(opts->cc) + 1))
 		goto error_close;
 
 	if (connect_fd_to_addr(fd, &addr, addrlen))
@@ -257,6 +267,15 @@ int connect_to_fd(int server_fd, int timeout_ms)
 	return -1;
 }
 
+int connect_to_fd(int server_fd, int timeout_ms)
+{
+	struct network_helper_opts opts = {
+		.timeout_ms = timeout_ms,
+	};
+
+	return connect_to_fd_opts(server_fd, &opts);
+}
+
 int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms)
 {
 	struct sockaddr_storage addr;
diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
index c59a8f6d770b..da7e132657d5 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -17,6 +17,11 @@ typedef __u16 __sum16;
 #define VIP_NUM 5
 #define MAGIC_BYTES 123
 
+struct network_helper_opts {
+	const char *cc;
+	int timeout_ms;
+};
+
 /* ipv4 test vector */
 struct ipv4_packet {
 	struct ethhdr eth;
@@ -41,6 +46,7 @@ int *start_reuseport_server(int family, int type, const char *addr_str,
 			    unsigned int nr_listens);
 void free_fds(int *fds, unsigned int nr_close_fds);
 int connect_to_fd(int server_fd, int timeout_ms);
+int connect_to_fd_opts(int server_fd, const struct network_helper_opts *opts);
 int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms);
 int fastopen_connect(int server_fd, const char *data, unsigned int data_len,
 		     int timeout_ms);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 bpf-next 4/4] bpf: selftests: Add dctcp fallback test
  2021-08-24 17:30 [PATCH v2 bpf-next 0/4] bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt Martin KaFai Lau
                   ` (2 preceding siblings ...)
  2021-08-24 17:30 ` [PATCH v2 bpf-next 3/4] bpf: selftests: Add connect_to_fd_opts to network_helpers Martin KaFai Lau
@ 2021-08-24 17:30 ` Martin KaFai Lau
  3 siblings, 0 replies; 5+ messages in thread
From: Martin KaFai Lau @ 2021-08-24 17:30 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, netdev

This patch makes the bpf_dctcp test to fallback to cubic by
using setsockopt(TCP_CONGESTION) when the tcp flow is not
ecn ready.

It also checks setsockopt() is not available to release().

The settimeo() from the network_helpers.h is used, so the local
one is removed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     | 106 ++++++++++++++----
 tools/testing/selftests/bpf/progs/bpf_dctcp.c |  25 +++++
 .../selftests/bpf/progs/bpf_dctcp_release.c   |  26 +++++
 3 files changed, 134 insertions(+), 23 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_dctcp_release.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
index efe1e979affb..94e03df69d71 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
@@ -4,37 +4,22 @@
 #include <linux/err.h>
 #include <netinet/tcp.h>
 #include <test_progs.h>
+#include "network_helpers.h"
 #include "bpf_dctcp.skel.h"
 #include "bpf_cubic.skel.h"
 #include "bpf_tcp_nogpl.skel.h"
+#include "bpf_dctcp_release.skel.h"
 
 #define min(a, b) ((a) < (b) ? (a) : (b))
 
+#ifndef ENOTSUPP
+#define ENOTSUPP 524
+#endif
+
 static const unsigned int total_bytes = 10 * 1024 * 1024;
-static const struct timeval timeo_sec = { .tv_sec = 10 };
-static const size_t timeo_optlen = sizeof(timeo_sec);
 static int expected_stg = 0xeB9F;
 static int stop, duration;
 
-static int settimeo(int fd)
-{
-	int err;
-
-	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeo_sec,
-			 timeo_optlen);
-	if (CHECK(err == -1, "setsockopt(fd, SO_RCVTIMEO)", "errno:%d\n",
-		  errno))
-		return -1;
-
-	err = setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &timeo_sec,
-			 timeo_optlen);
-	if (CHECK(err == -1, "setsockopt(fd, SO_SNDTIMEO)", "errno:%d\n",
-		  errno))
-		return -1;
-
-	return 0;
-}
-
 static int settcpca(int fd, const char *tcp_ca)
 {
 	int err;
@@ -61,7 +46,7 @@ static void *server(void *arg)
 		goto done;
 	}
 
-	if (settimeo(fd)) {
+	if (settimeo(fd, 0)) {
 		err = -errno;
 		goto done;
 	}
@@ -114,7 +99,7 @@ static void do_test(const char *tcp_ca, const struct bpf_map *sk_stg_map)
 	}
 
 	if (settcpca(lfd, tcp_ca) || settcpca(fd, tcp_ca) ||
-	    settimeo(lfd) || settimeo(fd))
+	    settimeo(lfd, 0) || settimeo(fd, 0))
 		goto done;
 
 	/* bind, listen and start server thread to accept */
@@ -267,6 +252,77 @@ static void test_invalid_license(void)
 	libbpf_set_print(old_print_fn);
 }
 
+static void test_dctcp_fallback(void)
+{
+	int err, lfd = -1, cli_fd = -1, srv_fd = -1;
+	struct network_helper_opts opts = {
+		.cc = "cubic",
+	};
+	struct bpf_dctcp *dctcp_skel;
+	struct bpf_link *link = NULL;
+	char srv_cc[16];
+	socklen_t cc_len = sizeof(srv_cc);
+
+	dctcp_skel = bpf_dctcp__open();
+	if (!ASSERT_OK_PTR(dctcp_skel, "dctcp_skel"))
+		return;
+	strcpy(dctcp_skel->rodata->fallback, "cubic");
+	if (!ASSERT_OK(bpf_dctcp__load(dctcp_skel), "bpf_dctcp__load"))
+		goto done;
+
+	link = bpf_map__attach_struct_ops(dctcp_skel->maps.dctcp);
+	if (!ASSERT_OK_PTR(link, "dctcp link"))
+		goto done;
+
+	lfd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
+	if (!ASSERT_GE(lfd, 0, "lfd") ||
+	    !ASSERT_OK(settcpca(lfd, "bpf_dctcp"), "lfd=>bpf_dctcp"))
+		goto done;
+
+	cli_fd = connect_to_fd_opts(lfd, &opts);
+	if (!ASSERT_GE(cli_fd, 0, "cli_fd"))
+		goto done;
+
+	srv_fd = accept(lfd, NULL, 0);
+	if (!ASSERT_GE(srv_fd, 0, "srv_fd"))
+		goto done;
+	ASSERT_STREQ(dctcp_skel->bss->cc_res, "cubic", "cc_res");
+	ASSERT_EQ(dctcp_skel->bss->tcp_cdg_res, -ENOTSUPP, "tcp_cdg_res");
+
+	err = getsockopt(srv_fd, SOL_TCP, TCP_CONGESTION, srv_cc, &cc_len);
+	if (!ASSERT_OK(err, "getsockopt(srv_fd, TCP_CONGESTION)"))
+		goto done;
+	ASSERT_STREQ(srv_cc, "cubic", "srv_fd cc");
+
+done:
+	bpf_link__destroy(link);
+	bpf_dctcp__destroy(dctcp_skel);
+	if (lfd != -1)
+		close(lfd);
+	if (srv_fd != -1)
+		close(srv_fd);
+	if (cli_fd != -1)
+		close(cli_fd);
+}
+
+static void test_rel_setsockopt(void)
+{
+	struct bpf_dctcp_release *rel_skel;
+	libbpf_print_fn_t old_print_fn;
+
+	err_str = "unknown func bpf_setsockopt";
+	found = false;
+
+	old_print_fn = libbpf_set_print(libbpf_debug_print);
+	rel_skel = bpf_dctcp_release__open_and_load();
+	libbpf_set_print(old_print_fn);
+
+	ASSERT_ERR_PTR(rel_skel, "rel_skel");
+	ASSERT_TRUE(found, "expected_err_msg");
+
+	bpf_dctcp_release__destroy(rel_skel);
+}
+
 void test_bpf_tcp_ca(void)
 {
 	if (test__start_subtest("dctcp"))
@@ -275,4 +331,8 @@ void test_bpf_tcp_ca(void)
 		test_cubic();
 	if (test__start_subtest("invalid_license"))
 		test_invalid_license();
+	if (test__start_subtest("dctcp_fallback"))
+		test_dctcp_fallback();
+	if (test__start_subtest("rel_setsockopt"))
+		test_rel_setsockopt();
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_dctcp.c b/tools/testing/selftests/bpf/progs/bpf_dctcp.c
index fd42247da8b4..9573be6122be 100644
--- a/tools/testing/selftests/bpf/progs/bpf_dctcp.c
+++ b/tools/testing/selftests/bpf/progs/bpf_dctcp.c
@@ -17,6 +17,11 @@
 
 char _license[] SEC("license") = "GPL";
 
+volatile const char fallback[TCP_CA_NAME_MAX];
+const char bpf_dctcp[] = "bpf_dctcp";
+const char tcp_cdg[] = "cdg";
+char cc_res[TCP_CA_NAME_MAX];
+int tcp_cdg_res = 0;
 int stg_result = 0;
 
 struct {
@@ -57,6 +62,26 @@ void BPF_PROG(dctcp_init, struct sock *sk)
 	struct dctcp *ca = inet_csk_ca(sk);
 	int *stg;
 
+	if (!(tp->ecn_flags & TCP_ECN_OK) && fallback[0]) {
+		/* Switch to fallback */
+		bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+			       (void *)fallback, sizeof(fallback));
+		/* Switch back to myself which the bpf trampoline
+		 * stopped calling dctcp_init recursively.
+		 */
+		bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+			       (void *)bpf_dctcp, sizeof(bpf_dctcp));
+		/* Switch back to fallback */
+		bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+			       (void *)fallback, sizeof(fallback));
+		/* Expecting -ENOTSUPP for tcp_cdg_res */
+		tcp_cdg_res = bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+					     (void *)tcp_cdg, sizeof(tcp_cdg));
+		bpf_getsockopt(sk, SOL_TCP, TCP_CONGESTION,
+			       (void *)cc_res, sizeof(cc_res));
+		return;
+	}
+
 	ca->prior_rcv_nxt = tp->rcv_nxt;
 	ca->dctcp_alpha = min(dctcp_alpha_on_init, DCTCP_MAX_ALPHA);
 	ca->loss_cwnd = 0;
diff --git a/tools/testing/selftests/bpf/progs/bpf_dctcp_release.c b/tools/testing/selftests/bpf/progs/bpf_dctcp_release.c
new file mode 100644
index 000000000000..d836f7c372f0
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_dctcp_release.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <stddef.h>
+#include <linux/bpf.h>
+#include <linux/types.h>
+#include <linux/stddef.h>
+#include <linux/tcp.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_tcp_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+const char cubic[] = "cubic";
+
+void BPF_STRUCT_OPS(dctcp_nouse_release, struct sock *sk)
+{
+	bpf_setsockopt(sk, SOL_TCP, TCP_CONGESTION,
+		       (void *)cubic, sizeof(cubic));
+}
+
+SEC(".struct_ops")
+struct tcp_congestion_ops dctcp_rel = {
+	.release	= (void *)dctcp_nouse_release,
+	.name		= "bpf_dctcp_rel",
+};
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-08-24 17:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-24 17:30 [PATCH v2 bpf-next 0/4] bpf: tcp: Allow bpf-tcp-cc to call bpf_(get|set)sockopt Martin KaFai Lau
2021-08-24 17:30 ` [PATCH v2 bpf-next 1/4] " Martin KaFai Lau
2021-08-24 17:30 ` [PATCH v2 bpf-next 2/4] bpf: selftests: Add sk_state to bpf_tcp_helpers.h Martin KaFai Lau
2021-08-24 17:30 ` [PATCH v2 bpf-next 3/4] bpf: selftests: Add connect_to_fd_opts to network_helpers Martin KaFai Lau
2021-08-24 17:30 ` [PATCH v2 bpf-next 4/4] bpf: selftests: Add dctcp fallback test Martin KaFai Lau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.