netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v3 0/5] Align BPF TCP CCs implementing cong_control() with non-BPF CCs
@ 2022-06-14 10:44 Jörn-Thorben Hinz
  2022-06-14 10:44 ` [PATCH bpf-next v3 1/5] bpf: Allow a TCP CC to write sk_pacing_rate and sk_pacing_status Jörn-Thorben Hinz
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Jörn-Thorben Hinz @ 2022-06-14 10:44 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, netdev, Jörn-Thorben Hinz

Addressed feedback by Martin KaFai Lau in this v3 of the series. See the
details below.

This series corrects some inconveniences for a BPF TCP CC that
implements and uses tcp_congestion_ops.cong_control(). Until now, such a
CC did not have all necessary write access to struct sock and
unnecessarily needed to implement cong_avoid().

v3:
 - Add a selftest writing sk_pacing_*
 - Add a selftest with incomplete tcp_congestion_ops
 - Add a selftest with unsupported get_info()
 - Remove an unused variable
 - Reword a comment about reg() in bpf_struct_ops_map_update_elem()
v2:
 - Drop redundant check for required functions and just rely on
   tcp_register_congestion_control() (Martin KaFai Lau)

Jörn-Thorben Hinz (5):
  bpf: Allow a TCP CC to write sk_pacing_rate and sk_pacing_status
  bpf: Require only one of cong_avoid() and cong_control() from a TCP CC
  selftests/bpf: Test a BPF CC writing sk_pacing_*
  selftests/bpf: Test an incomplete BPF CC
  selftests/bpf: Test a BPF CC implementing the unsupported get_info()

 kernel/bpf/bpf_struct_ops.c                   |  7 +-
 net/ipv4/bpf_tcp_ca.c                         | 39 ++---------
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     | 66 +++++++++++++++++++
 .../bpf/progs/tcp_ca_incompl_cong_ops.c       | 35 ++++++++++
 .../bpf/progs/tcp_ca_unsupp_cong_op.c         | 21 ++++++
 .../bpf/progs/tcp_ca_write_sk_pacing.c        | 60 +++++++++++++++++
 6 files changed, 191 insertions(+), 37 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tcp_ca_incompl_cong_ops.c
 create mode 100644 tools/testing/selftests/bpf/progs/tcp_ca_unsupp_cong_op.c
 create mode 100644 tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v3 1/5] bpf: Allow a TCP CC to write sk_pacing_rate and sk_pacing_status
  2022-06-14 10:44 [PATCH bpf-next v3 0/5] Align BPF TCP CCs implementing cong_control() with non-BPF CCs Jörn-Thorben Hinz
@ 2022-06-14 10:44 ` Jörn-Thorben Hinz
  2022-06-17 20:21   ` Martin KaFai Lau
  2022-06-14 10:44 ` [PATCH bpf-next v3 2/5] bpf: Require only one of cong_avoid() and cong_control() from a TCP CC Jörn-Thorben Hinz
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Jörn-Thorben Hinz @ 2022-06-14 10:44 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, netdev, Jörn-Thorben Hinz

A CC that implements tcp_congestion_ops.cong_control() should be able to
control sk_pacing_rate and set sk_pacing_status, since
tcp_update_pacing_rate() is never called in this case. A built-in CC or
one from a kernel module is already able to write to both struct sock
members. For a BPF program, write access has not been allowed, yet.

Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
---
 net/ipv4/bpf_tcp_ca.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index f79ab942f03b..1f5c53ede4e5 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -111,6 +111,12 @@ static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
 	}
 
 	switch (off) {
+	case offsetof(struct sock, sk_pacing_rate):
+		end = offsetofend(struct sock, sk_pacing_rate);
+		break;
+	case offsetof(struct sock, sk_pacing_status):
+		end = offsetofend(struct sock, sk_pacing_status);
+		break;
 	case bpf_ctx_range(struct inet_connection_sock, icsk_ca_priv):
 		end = offsetofend(struct inet_connection_sock, icsk_ca_priv);
 		break;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v3 2/5] bpf: Require only one of cong_avoid() and cong_control() from a TCP CC
  2022-06-14 10:44 [PATCH bpf-next v3 0/5] Align BPF TCP CCs implementing cong_control() with non-BPF CCs Jörn-Thorben Hinz
  2022-06-14 10:44 ` [PATCH bpf-next v3 1/5] bpf: Allow a TCP CC to write sk_pacing_rate and sk_pacing_status Jörn-Thorben Hinz
@ 2022-06-14 10:44 ` Jörn-Thorben Hinz
  2022-06-17 20:26   ` Martin KaFai Lau
  2022-06-14 10:44 ` [PATCH bpf-next v3 3/5] selftests/bpf: Test a BPF CC writing sk_pacing_* Jörn-Thorben Hinz
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Jörn-Thorben Hinz @ 2022-06-14 10:44 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, netdev, Jörn-Thorben Hinz

Remove the check for required and optional functions in a struct
tcp_congestion_ops from bpf_tcp_ca.c. Rely on
tcp_register_congestion_control() to reject a BPF CC that does not
implement all required functions, as it will do for a non-BPF CC.

When a CC implements tcp_congestion_ops.cong_control(), the alternate
cong_avoid() is not in use in the TCP stack. Previously, a BPF CC was
still forced to implement cong_avoid() as a no-op since it was
non-optional in bpf_tcp_ca.c.

Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
---
 kernel/bpf/bpf_struct_ops.c |  7 +++----
 net/ipv4/bpf_tcp_ca.c       | 33 ---------------------------------
 2 files changed, 3 insertions(+), 37 deletions(-)

diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index d9a3c9207240..7e0068c3399c 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -503,10 +503,9 @@ static int bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
 		goto unlock;
 	}
 
-	/* Error during st_ops->reg().  It is very unlikely since
-	 * the above init_member() should have caught it earlier
-	 * before reg().  The only possibility is if there was a race
-	 * in registering the struct_ops (under the same name) to
+	/* Error during st_ops->reg(). Can happen if this struct_ops needs to be
+	 * verified as a whole, after all init_member() calls. Can also happen if
+	 * there was a race in registering the struct_ops (under the same name) to
 	 * a sub-system through different struct_ops's maps.
 	 */
 	set_memory_nx((long)st_map->image, 1);
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 1f5c53ede4e5..7a181631b995 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -14,18 +14,6 @@
 /* "extern" is to avoid sparse warning.  It is only used in bpf_struct_ops.c. */
 extern struct bpf_struct_ops bpf_tcp_congestion_ops;
 
-static u32 optional_ops[] = {
-	offsetof(struct tcp_congestion_ops, init),
-	offsetof(struct tcp_congestion_ops, release),
-	offsetof(struct tcp_congestion_ops, set_state),
-	offsetof(struct tcp_congestion_ops, cwnd_event),
-	offsetof(struct tcp_congestion_ops, in_ack_event),
-	offsetof(struct tcp_congestion_ops, pkts_acked),
-	offsetof(struct tcp_congestion_ops, min_tso_segs),
-	offsetof(struct tcp_congestion_ops, sndbuf_expand),
-	offsetof(struct tcp_congestion_ops, cong_control),
-};
-
 static u32 unsupported_ops[] = {
 	offsetof(struct tcp_congestion_ops, get_info),
 };
@@ -51,18 +39,6 @@ static int bpf_tcp_ca_init(struct btf *btf)
 	return 0;
 }
 
-static bool is_optional(u32 member_offset)
-{
-	unsigned int i;
-
-	for (i = 0; i < ARRAY_SIZE(optional_ops); i++) {
-		if (member_offset == optional_ops[i])
-			return true;
-	}
-
-	return false;
-}
-
 static bool is_unsupported(u32 member_offset)
 {
 	unsigned int i;
@@ -246,7 +222,6 @@ static int bpf_tcp_ca_init_member(const struct btf_type *t,
 {
 	const struct tcp_congestion_ops *utcp_ca;
 	struct tcp_congestion_ops *tcp_ca;
-	int prog_fd;
 	u32 moff;
 
 	utcp_ca = (const struct tcp_congestion_ops *)udata;
@@ -268,14 +243,6 @@ static int bpf_tcp_ca_init_member(const struct btf_type *t,
 		return 1;
 	}
 
-	if (!btf_type_resolve_func_ptr(btf_vmlinux, member->type, NULL))
-		return 0;
-
-	/* Ensure bpf_prog is provided for compulsory func ptr */
-	prog_fd = (int)(*(unsigned long *)(udata + moff));
-	if (!prog_fd && !is_optional(moff) && !is_unsupported(moff))
-		return -EINVAL;
-
 	return 0;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v3 3/5] selftests/bpf: Test a BPF CC writing sk_pacing_*
  2022-06-14 10:44 [PATCH bpf-next v3 0/5] Align BPF TCP CCs implementing cong_control() with non-BPF CCs Jörn-Thorben Hinz
  2022-06-14 10:44 ` [PATCH bpf-next v3 1/5] bpf: Allow a TCP CC to write sk_pacing_rate and sk_pacing_status Jörn-Thorben Hinz
  2022-06-14 10:44 ` [PATCH bpf-next v3 2/5] bpf: Require only one of cong_avoid() and cong_control() from a TCP CC Jörn-Thorben Hinz
@ 2022-06-14 10:44 ` Jörn-Thorben Hinz
  2022-06-17 21:04   ` Martin KaFai Lau
  2022-06-14 10:44 ` [PATCH bpf-next v3 4/5] selftests/bpf: Test an incomplete BPF CC Jörn-Thorben Hinz
  2022-06-14 10:44 ` [PATCH bpf-next v3 5/5] selftests/bpf: Test a BPF CC implementing the unsupported get_info() Jörn-Thorben Hinz
  4 siblings, 1 reply; 14+ messages in thread
From: Jörn-Thorben Hinz @ 2022-06-14 10:44 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, netdev, Jörn-Thorben Hinz

Test whether a TCP CC implemented in BPF is allowed to write
sk_pacing_rate and sk_pacing_status in struct sock. This is needed when
cong_control() is implemented and used.

Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
---
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     | 21 +++++++
 .../bpf/progs/tcp_ca_write_sk_pacing.c        | 60 +++++++++++++++++++
 2 files changed, 81 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
index e9a9a31b2ffe..a797497e2864 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
@@ -9,6 +9,7 @@
 #include "bpf_cubic.skel.h"
 #include "bpf_tcp_nogpl.skel.h"
 #include "bpf_dctcp_release.skel.h"
+#include "tcp_ca_write_sk_pacing.skel.h"
 
 #ifndef ENOTSUPP
 #define ENOTSUPP 524
@@ -322,6 +323,24 @@ static void test_rel_setsockopt(void)
 	bpf_dctcp_release__destroy(rel_skel);
 }
 
+static void test_write_sk_pacing(void)
+{
+	struct tcp_ca_write_sk_pacing *skel;
+	struct bpf_link *link;
+
+	skel = tcp_ca_write_sk_pacing__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load")) {
+		return;
+	}
+
+	link = bpf_map__attach_struct_ops(skel->maps.write_sk_pacing);
+	if (ASSERT_OK_PTR(link, "attach_struct_ops")) {
+		bpf_link__destroy(link);
+	}
+
+	tcp_ca_write_sk_pacing__destroy(skel);
+}
+
 void test_bpf_tcp_ca(void)
 {
 	if (test__start_subtest("dctcp"))
@@ -334,4 +353,6 @@ void test_bpf_tcp_ca(void)
 		test_dctcp_fallback();
 	if (test__start_subtest("rel_setsockopt"))
 		test_rel_setsockopt();
+	if (test__start_subtest("write_sk_pacing"))
+		test_write_sk_pacing();
 }
diff --git a/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
new file mode 100644
index 000000000000..43447704cf0e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+#define USEC_PER_SEC 1000000UL
+
+#define min(a, b) ((a) < (b) ? (a) : (b))
+
+static inline struct tcp_sock *tcp_sk(const struct sock *sk)
+{
+	return (struct tcp_sock *)sk;
+}
+
+SEC("struct_ops/write_sk_pacing_init")
+void BPF_PROG(write_sk_pacing_init, struct sock *sk)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+	__sync_bool_compare_and_swap(&sk->sk_pacing_status, SK_PACING_NONE,
+				     SK_PACING_NEEDED);
+#else
+	sk->sk_pacing_status = SK_PACING_NEEDED;
+#endif
+}
+
+SEC("struct_ops/write_sk_pacing_cong_control")
+void BPF_PROG(write_sk_pacing_cong_control, struct sock *sk,
+	      const struct rate_sample *rs)
+{
+	const struct tcp_sock *tp = tcp_sk(sk);
+	unsigned long rate =
+		((tp->snd_cwnd * tp->mss_cache * USEC_PER_SEC) << 3) /
+		(tp->srtt_us ?: 1U << 3);
+	sk->sk_pacing_rate = min(rate, sk->sk_max_pacing_rate);
+}
+
+SEC("struct_ops/write_sk_pacing_ssthresh")
+__u32 BPF_PROG(write_sk_pacing_ssthresh, struct sock *sk)
+{
+	return tcp_sk(sk)->snd_ssthresh;
+}
+
+SEC("struct_ops/write_sk_pacing_undo_cwnd")
+__u32 BPF_PROG(write_sk_pacing_undo_cwnd, struct sock *sk)
+{
+	return tcp_sk(sk)->snd_cwnd;
+}
+
+SEC(".struct_ops")
+struct tcp_congestion_ops write_sk_pacing = {
+	.init = (void *)write_sk_pacing_init,
+	.cong_control = (void *)write_sk_pacing_cong_control,
+	.ssthresh = (void *)write_sk_pacing_ssthresh,
+	.undo_cwnd = (void *)write_sk_pacing_undo_cwnd,
+	.name = "bpf_w_sk_pacing",
+};
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v3 4/5] selftests/bpf: Test an incomplete BPF CC
  2022-06-14 10:44 [PATCH bpf-next v3 0/5] Align BPF TCP CCs implementing cong_control() with non-BPF CCs Jörn-Thorben Hinz
                   ` (2 preceding siblings ...)
  2022-06-14 10:44 ` [PATCH bpf-next v3 3/5] selftests/bpf: Test a BPF CC writing sk_pacing_* Jörn-Thorben Hinz
@ 2022-06-14 10:44 ` Jörn-Thorben Hinz
  2022-06-14 10:44 ` [PATCH bpf-next v3 5/5] selftests/bpf: Test a BPF CC implementing the unsupported get_info() Jörn-Thorben Hinz
  4 siblings, 0 replies; 14+ messages in thread
From: Jörn-Thorben Hinz @ 2022-06-14 10:44 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, netdev, Jörn-Thorben Hinz

Test whether a TCP CC implemented in BPF providing neither cong_avoid()
nor cong_control() is correctly rejected. This check solely depends on
tcp_register_congestion_control() now, which is invoked during
bpf_map__attach_struct_ops().

Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
---
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     | 25 +++++++++++++
 .../bpf/progs/tcp_ca_incompl_cong_ops.c       | 35 +++++++++++++++++++
 2 files changed, 60 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/tcp_ca_incompl_cong_ops.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
index a797497e2864..cc2a8a12ca67 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
@@ -10,6 +10,7 @@
 #include "bpf_tcp_nogpl.skel.h"
 #include "bpf_dctcp_release.skel.h"
 #include "tcp_ca_write_sk_pacing.skel.h"
+#include "tcp_ca_incompl_cong_ops.skel.h"
 
 #ifndef ENOTSUPP
 #define ENOTSUPP 524
@@ -341,6 +342,28 @@ static void test_write_sk_pacing(void)
 	tcp_ca_write_sk_pacing__destroy(skel);
 }
 
+static void test_incompl_cong_ops(void)
+{
+	struct tcp_ca_incompl_cong_ops *skel;
+	struct bpf_link *link;
+
+	skel = tcp_ca_incompl_cong_ops__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load")) {
+		return;
+	}
+
+	/* That cong_avoid() and cong_control() are missing is only reported at
+	 * this point:
+	 */
+	link = bpf_map__attach_struct_ops(skel->maps.incompl_cong_ops);
+	if (!ASSERT_ERR_PTR(link, "attach_struct_ops")) {
+		/* Unexpected success?! */
+		bpf_link__destroy(link);
+	}
+
+	tcp_ca_incompl_cong_ops__destroy(skel);
+}
+
 void test_bpf_tcp_ca(void)
 {
 	if (test__start_subtest("dctcp"))
@@ -355,4 +378,6 @@ void test_bpf_tcp_ca(void)
 		test_rel_setsockopt();
 	if (test__start_subtest("write_sk_pacing"))
 		test_write_sk_pacing();
+	if (test__start_subtest("incompl_cong_ops"))
+		test_incompl_cong_ops();
 }
diff --git a/tools/testing/selftests/bpf/progs/tcp_ca_incompl_cong_ops.c b/tools/testing/selftests/bpf/progs/tcp_ca_incompl_cong_ops.c
new file mode 100644
index 000000000000..7bb872fb22dd
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tcp_ca_incompl_cong_ops.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+static inline struct tcp_sock *tcp_sk(const struct sock *sk)
+{
+	return (struct tcp_sock *)sk;
+}
+
+SEC("struct_ops/incompl_cong_ops_ssthresh")
+__u32 BPF_PROG(incompl_cong_ops_ssthresh, struct sock *sk)
+{
+	return tcp_sk(sk)->snd_ssthresh;
+}
+
+SEC("struct_ops/incompl_cong_ops_undo_cwnd")
+__u32 BPF_PROG(incompl_cong_ops_undo_cwnd, struct sock *sk)
+{
+	return tcp_sk(sk)->snd_cwnd;
+}
+
+SEC(".struct_ops")
+struct tcp_congestion_ops incompl_cong_ops = {
+	/* Intentionally leaving out any of the required cong_avoid() and
+	 * cong_control() here.
+	 */
+	.ssthresh = (void *)incompl_cong_ops_ssthresh,
+	.undo_cwnd = (void *)incompl_cong_ops_undo_cwnd,
+	.name = "bpf_incompl_ops",
+};
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH bpf-next v3 5/5] selftests/bpf: Test a BPF CC implementing the unsupported get_info()
  2022-06-14 10:44 [PATCH bpf-next v3 0/5] Align BPF TCP CCs implementing cong_control() with non-BPF CCs Jörn-Thorben Hinz
                   ` (3 preceding siblings ...)
  2022-06-14 10:44 ` [PATCH bpf-next v3 4/5] selftests/bpf: Test an incomplete BPF CC Jörn-Thorben Hinz
@ 2022-06-14 10:44 ` Jörn-Thorben Hinz
  2022-06-17 21:22   ` Martin KaFai Lau
  4 siblings, 1 reply; 14+ messages in thread
From: Jörn-Thorben Hinz @ 2022-06-14 10:44 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, netdev, Jörn-Thorben Hinz

Test whether a TCP CC implemented in BPF providing get_info() is
rejected correctly. get_info() is unsupported in a BPF CC. The check for
required functions in a BPF CC has moved, this test ensures unsupported
functions are still rejected correctly.

Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
---
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     | 20 ++++++++++++++++++
 .../bpf/progs/tcp_ca_unsupp_cong_op.c         | 21 +++++++++++++++++++
 2 files changed, 41 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/tcp_ca_unsupp_cong_op.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
index cc2a8a12ca67..b15a4b6ea4ae 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
@@ -11,6 +11,7 @@
 #include "bpf_dctcp_release.skel.h"
 #include "tcp_ca_write_sk_pacing.skel.h"
 #include "tcp_ca_incompl_cong_ops.skel.h"
+#include "tcp_ca_unsupp_cong_op.skel.h"
 
 #ifndef ENOTSUPP
 #define ENOTSUPP 524
@@ -364,6 +365,23 @@ static void test_incompl_cong_ops(void)
 	tcp_ca_incompl_cong_ops__destroy(skel);
 }
 
+static void test_unsupp_cong_op(void)
+{
+	libbpf_print_fn_t old_print_fn;
+	struct tcp_ca_unsupp_cong_op *skel;
+
+	err_str = "attach to unsupported member get_info";
+	found = false;
+	old_print_fn = libbpf_set_print(libbpf_debug_print);
+
+	skel = tcp_ca_unsupp_cong_op__open_and_load();
+	ASSERT_NULL(skel, "open_and_load");
+	ASSERT_EQ(found, true, "expected_err_msg");
+
+	tcp_ca_unsupp_cong_op__destroy(skel);
+	libbpf_set_print(old_print_fn);
+}
+
 void test_bpf_tcp_ca(void)
 {
 	if (test__start_subtest("dctcp"))
@@ -380,4 +398,6 @@ void test_bpf_tcp_ca(void)
 		test_write_sk_pacing();
 	if (test__start_subtest("incompl_cong_ops"))
 		test_incompl_cong_ops();
+	if (test__start_subtest("unsupp_cong_op"))
+		test_unsupp_cong_op();
 }
diff --git a/tools/testing/selftests/bpf/progs/tcp_ca_unsupp_cong_op.c b/tools/testing/selftests/bpf/progs/tcp_ca_unsupp_cong_op.c
new file mode 100644
index 000000000000..c06f4a41c21a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tcp_ca_unsupp_cong_op.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+SEC("struct_ops/unsupp_cong_op_get_info")
+size_t BPF_PROG(unsupp_cong_op_get_info, struct sock *sk, u32 ext, int *attr,
+		union tcp_cc_info *info)
+{
+	return 0;
+}
+
+SEC(".struct_ops")
+struct tcp_congestion_ops unsupp_cong_op = {
+	.get_info = (void *)unsupp_cong_op_get_info,
+	.name = "bpf_unsupp_op",
+};
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 1/5] bpf: Allow a TCP CC to write sk_pacing_rate and sk_pacing_status
  2022-06-14 10:44 ` [PATCH bpf-next v3 1/5] bpf: Allow a TCP CC to write sk_pacing_rate and sk_pacing_status Jörn-Thorben Hinz
@ 2022-06-17 20:21   ` Martin KaFai Lau
  0 siblings, 0 replies; 14+ messages in thread
From: Martin KaFai Lau @ 2022-06-17 20:21 UTC (permalink / raw)
  To: Jörn-Thorben Hinz
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, netdev

On Tue, Jun 14, 2022 at 12:44:48PM +0200, Jörn-Thorben Hinz wrote:
> A CC that implements tcp_congestion_ops.cong_control() should be able to
> control sk_pacing_rate and set sk_pacing_status, since
> tcp_update_pacing_rate() is never called in this case. A built-in CC or
> one from a kernel module is already able to write to both struct sock
> members. For a BPF program, write access has not been allowed, yet.
> 
> Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 2/5] bpf: Require only one of cong_avoid() and cong_control() from a TCP CC
  2022-06-14 10:44 ` [PATCH bpf-next v3 2/5] bpf: Require only one of cong_avoid() and cong_control() from a TCP CC Jörn-Thorben Hinz
@ 2022-06-17 20:26   ` Martin KaFai Lau
  0 siblings, 0 replies; 14+ messages in thread
From: Martin KaFai Lau @ 2022-06-17 20:26 UTC (permalink / raw)
  To: Jörn-Thorben Hinz
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, netdev

On Tue, Jun 14, 2022 at 12:44:49PM +0200, Jörn-Thorben Hinz wrote:
> Remove the check for required and optional functions in a struct
> tcp_congestion_ops from bpf_tcp_ca.c. Rely on
> tcp_register_congestion_control() to reject a BPF CC that does not
> implement all required functions, as it will do for a non-BPF CC.
> 
> When a CC implements tcp_congestion_ops.cong_control(), the alternate
> cong_avoid() is not in use in the TCP stack. Previously, a BPF CC was
> still forced to implement cong_avoid() as a no-op since it was
> non-optional in bpf_tcp_ca.c.
> 
> Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 3/5] selftests/bpf: Test a BPF CC writing sk_pacing_*
  2022-06-14 10:44 ` [PATCH bpf-next v3 3/5] selftests/bpf: Test a BPF CC writing sk_pacing_* Jörn-Thorben Hinz
@ 2022-06-17 21:04   ` Martin KaFai Lau
  2022-06-18 16:43     ` Jörn-Thorben Hinz
  0 siblings, 1 reply; 14+ messages in thread
From: Martin KaFai Lau @ 2022-06-17 21:04 UTC (permalink / raw)
  To: Jörn-Thorben Hinz
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, netdev

On Tue, Jun 14, 2022 at 12:44:50PM +0200, Jörn-Thorben Hinz wrote:
> Test whether a TCP CC implemented in BPF is allowed to write
> sk_pacing_rate and sk_pacing_status in struct sock. This is needed when
> cong_control() is implemented and used.
> 
> Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
> ---
>  .../selftests/bpf/prog_tests/bpf_tcp_ca.c     | 21 +++++++
>  .../bpf/progs/tcp_ca_write_sk_pacing.c        | 60 +++++++++++++++++++
>  2 files changed, 81 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
> index e9a9a31b2ffe..a797497e2864 100644
> --- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
> @@ -9,6 +9,7 @@
>  #include "bpf_cubic.skel.h"
>  #include "bpf_tcp_nogpl.skel.h"
>  #include "bpf_dctcp_release.skel.h"
> +#include "tcp_ca_write_sk_pacing.skel.h"
>  
>  #ifndef ENOTSUPP
>  #define ENOTSUPP 524
> @@ -322,6 +323,24 @@ static void test_rel_setsockopt(void)
>  	bpf_dctcp_release__destroy(rel_skel);
>  }
>  
> +static void test_write_sk_pacing(void)
> +{
> +	struct tcp_ca_write_sk_pacing *skel;
> +	struct bpf_link *link;
> +
> +	skel = tcp_ca_write_sk_pacing__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "open_and_load")) {
nit. Remove this single line '{'.

./scripts/checkpatch.pl has reported that also:
WARNING: braces {} are not necessary for single statement blocks
#43: FILE: tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c:332:
+	if (!ASSERT_OK_PTR(skel, "open_and_load")) {
+		return;
+	}


> +		return;
> +	}
> +
> +	link = bpf_map__attach_struct_ops(skel->maps.write_sk_pacing);
> +	if (ASSERT_OK_PTR(link, "attach_struct_ops")) {
Same here.

and no need to check the link before bpf_link__destroy.
bpf_link__destroy can handle error link.  Something like:

	ASSERT_OK_PTR(link, "attach_struct_ops");
	bpf_link__destroy(link);
	tcp_ca_write_sk_pacing__destroy(skel);

The earlier examples in test_cubic and test_dctcp were
written before bpf_link__destroy can handle error link.

> +		bpf_link__destroy(link);
> +	}
> +
> +	tcp_ca_write_sk_pacing__destroy(skel);
> +}
> +
>  void test_bpf_tcp_ca(void)
>  {
>  	if (test__start_subtest("dctcp"))
> @@ -334,4 +353,6 @@ void test_bpf_tcp_ca(void)
>  		test_dctcp_fallback();
>  	if (test__start_subtest("rel_setsockopt"))
>  		test_rel_setsockopt();
> +	if (test__start_subtest("write_sk_pacing"))
> +		test_write_sk_pacing();
>  }
> diff --git a/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> new file mode 100644
> index 000000000000..43447704cf0e
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> @@ -0,0 +1,60 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include "vmlinux.h"
> +
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +#define USEC_PER_SEC 1000000UL
> +
> +#define min(a, b) ((a) < (b) ? (a) : (b))
> +
> +static inline struct tcp_sock *tcp_sk(const struct sock *sk)
> +{
This helper is already available in bpf_tcp_helpers.h.
Is there a reason not to use that one and redefine
it in both patch 3 and 4?  The mss_cache and srtt_us can be added
to bpf_tcp_helpers.h.  It will need another effort to move
all selftest's bpf-cc to vmlinux.h.

> +	return (struct tcp_sock *)sk;
> +}
> +
> +SEC("struct_ops/write_sk_pacing_init")
> +void BPF_PROG(write_sk_pacing_init, struct sock *sk)
> +{
> +#ifdef ENABLE_ATOMICS_TESTS
> +	__sync_bool_compare_and_swap(&sk->sk_pacing_status, SK_PACING_NONE,
> +				     SK_PACING_NEEDED);
> +#else
> +	sk->sk_pacing_status = SK_PACING_NEEDED;
> +#endif
> +}
> +
> +SEC("struct_ops/write_sk_pacing_cong_control")
> +void BPF_PROG(write_sk_pacing_cong_control, struct sock *sk,
> +	      const struct rate_sample *rs)
> +{
> +	const struct tcp_sock *tp = tcp_sk(sk);
> +	unsigned long rate =
> +		((tp->snd_cwnd * tp->mss_cache * USEC_PER_SEC) << 3) /
> +		(tp->srtt_us ?: 1U << 3);
> +	sk->sk_pacing_rate = min(rate, sk->sk_max_pacing_rate);
> +}
> +
> +SEC("struct_ops/write_sk_pacing_ssthresh")
> +__u32 BPF_PROG(write_sk_pacing_ssthresh, struct sock *sk)
> +{
> +	return tcp_sk(sk)->snd_ssthresh;
> +}
> +
> +SEC("struct_ops/write_sk_pacing_undo_cwnd")
> +__u32 BPF_PROG(write_sk_pacing_undo_cwnd, struct sock *sk)
> +{
> +	return tcp_sk(sk)->snd_cwnd;
> +}
> +
> +SEC(".struct_ops")
> +struct tcp_congestion_ops write_sk_pacing = {
> +	.init = (void *)write_sk_pacing_init,
> +	.cong_control = (void *)write_sk_pacing_cong_control,
> +	.ssthresh = (void *)write_sk_pacing_ssthresh,
> +	.undo_cwnd = (void *)write_sk_pacing_undo_cwnd,
> +	.name = "bpf_w_sk_pacing",
> +};
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 5/5] selftests/bpf: Test a BPF CC implementing the unsupported get_info()
  2022-06-14 10:44 ` [PATCH bpf-next v3 5/5] selftests/bpf: Test a BPF CC implementing the unsupported get_info() Jörn-Thorben Hinz
@ 2022-06-17 21:22   ` Martin KaFai Lau
  0 siblings, 0 replies; 14+ messages in thread
From: Martin KaFai Lau @ 2022-06-17 21:22 UTC (permalink / raw)
  To: Jörn-Thorben Hinz
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, netdev

On Tue, Jun 14, 2022 at 12:44:52PM +0200, Jörn-Thorben Hinz wrote:
> Test whether a TCP CC implemented in BPF providing get_info() is
> rejected correctly. get_info() is unsupported in a BPF CC. The check for
> required functions in a BPF CC has moved, this test ensures unsupported
> functions are still rejected correctly.
> 
> Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 3/5] selftests/bpf: Test a BPF CC writing sk_pacing_*
  2022-06-17 21:04   ` Martin KaFai Lau
@ 2022-06-18 16:43     ` Jörn-Thorben Hinz
  2022-06-20 16:06       ` Yonghong Song
  0 siblings, 1 reply; 14+ messages in thread
From: Jörn-Thorben Hinz @ 2022-06-18 16:43 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, netdev

On Fri, 2022-06-17 at 14:04 -0700, Martin KaFai Lau wrote:
> On Tue, Jun 14, 2022 at 12:44:50PM +0200, Jörn-Thorben Hinz wrote:
> > Test whether a TCP CC implemented in BPF is allowed to write
> > sk_pacing_rate and sk_pacing_status in struct sock. This is needed
> > when
> > cong_control() is implemented and used.
> > 
> > Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
> > ---
> >  .../selftests/bpf/prog_tests/bpf_tcp_ca.c     | 21 +++++++
> >  .../bpf/progs/tcp_ca_write_sk_pacing.c        | 60
> > +++++++++++++++++++
> >  2 files changed, 81 insertions(+)
> >  create mode 100644
> > tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> > 
> > diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
> > b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
> > index e9a9a31b2ffe..a797497e2864 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
> > @@ -9,6 +9,7 @@
> >  #include "bpf_cubic.skel.h"
> >  #include "bpf_tcp_nogpl.skel.h"
> >  #include "bpf_dctcp_release.skel.h"
> > +#include "tcp_ca_write_sk_pacing.skel.h"
> >  
> >  #ifndef ENOTSUPP
> >  #define ENOTSUPP 524
> > @@ -322,6 +323,24 @@ static void test_rel_setsockopt(void)
> >         bpf_dctcp_release__destroy(rel_skel);
> >  }
> >  
> > +static void test_write_sk_pacing(void)
> > +{
> > +       struct tcp_ca_write_sk_pacing *skel;
> > +       struct bpf_link *link;
> > +
> > +       skel = tcp_ca_write_sk_pacing__open_and_load();
> > +       if (!ASSERT_OK_PTR(skel, "open_and_load")) {
> nit. Remove this single line '{'.
> 
> ./scripts/checkpatch.pl has reported that also:
> WARNING: braces {} are not necessary for single statement blocks
> #43: FILE: tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c:332:
> +       if (!ASSERT_OK_PTR(skel, "open_and_load")) {
> +               return;
> +       }
Have to admit I knowingly disregarded that warning as more of a
recommendation. Out of habit and since I personally don’t see any
compelling reason to generally use single-line statements after ifs,
only multiple disadvantages.

But wrong place to argue here, of course. Will bow to the warning.

> 
> 
> > +               return;
> > +       }
> > +
> > +       link = bpf_map__attach_struct_ops(skel-
> > >maps.write_sk_pacing);
> > +       if (ASSERT_OK_PTR(link, "attach_struct_ops")) {
> Same here.
> 
> and no need to check the link before bpf_link__destroy.
> bpf_link__destroy can handle error link.  Something like:
> 
>         ASSERT_OK_PTR(link, "attach_struct_ops");
>         bpf_link__destroy(link);
>         tcp_ca_write_sk_pacing__destroy(skel);
> 
> The earlier examples in test_cubic and test_dctcp were
> written before bpf_link__destroy can handle error link.
You are right, I followed the other two test_*() functions there. Good
to know that it behaves similar to (k)free() and others. Will remove
the ifs around bpf_link__destroy().

> 
> > +               bpf_link__destroy(link);
> > +       }
> > +
> > +       tcp_ca_write_sk_pacing__destroy(skel);
> > +}
> > +
> >  void test_bpf_tcp_ca(void)
> >  {
> >         if (test__start_subtest("dctcp"))
> > @@ -334,4 +353,6 @@ void test_bpf_tcp_ca(void)
> >                 test_dctcp_fallback();
> >         if (test__start_subtest("rel_setsockopt"))
> >                 test_rel_setsockopt();
> > +       if (test__start_subtest("write_sk_pacing"))
> > +               test_write_sk_pacing();
> >  }
> > diff --git
> > a/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> > b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> > new file mode 100644
> > index 000000000000..43447704cf0e
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> > @@ -0,0 +1,60 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include "vmlinux.h"
> > +
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +
> > +char _license[] SEC("license") = "GPL";
> > +
> > +#define USEC_PER_SEC 1000000UL
> > +
> > +#define min(a, b) ((a) < (b) ? (a) : (b))
> > +
> > +static inline struct tcp_sock *tcp_sk(const struct sock *sk)
> > +{
> This helper is already available in bpf_tcp_helpers.h.
> Is there a reason not to use that one and redefine
> it in both patch 3 and 4?  The mss_cache and srtt_us can be added
> to bpf_tcp_helpers.h.  It will need another effort to move
> all selftest's bpf-cc to vmlinux.h.
I fully agree it’s not elegant to redefine tcp_sk() twice more.

It was between either using bpf_tcp_helpers.h and adding and
maintaining additional struct members there. Or using the (as I
understood it) more “modern” approach with vmlinux.h and redefining the
trivial tcp_sk(). I chose the later. Didn’t see a reason not to slowly
introduce vmlinux.h into the CA tests.

I had the same dilemma for the algorithm I’m implementing: Reuse
bpf_tcp_helpers.h from the kernel tree and extend it. Or use vmlinux.h
and copy only some of the (mostly trivial) helper functions. Also chose
the later here.

While doing the above, I also considered extracting the type
declarations from bpf_tcp_helpers.h into an, e.g.,
bpf_tcp_types_helper.h, keeping only the functions in
bpf_tcp_helpers.h. bpf_tcp_helpers.h could have been a base helper for
any BPF CA implementation then and used with either vmlinux.h or the
“old-school” includes. Similar to the way bpf_helpers.h is used. But at
that point, a bpf_tcp_types_helper.h could have probably just been
dropped for good and in favor of vmlinux.h. So I didn’t continue with
that.

Do you insist to use bpf_tcp_helpers.h instead of vmlinux.h? Or could
the described split into two headers make sense after all?

(Will wait for your reply here before sending a v4.)

> 
> > +       return (struct tcp_sock *)sk;
> > +}
> > +
> > +SEC("struct_ops/write_sk_pacing_init")
> > +void BPF_PROG(write_sk_pacing_init, struct sock *sk)
> > +{
> > +#ifdef ENABLE_ATOMICS_TESTS
> > +       __sync_bool_compare_and_swap(&sk->sk_pacing_status,
> > SK_PACING_NONE,
> > +                                    SK_PACING_NEEDED);
> > +#else
> > +       sk->sk_pacing_status = SK_PACING_NEEDED;
> > +#endif
> > +}
> > +
> > +SEC("struct_ops/write_sk_pacing_cong_control")
> > +void BPF_PROG(write_sk_pacing_cong_control, struct sock *sk,
> > +             const struct rate_sample *rs)
> > +{
> > +       const struct tcp_sock *tp = tcp_sk(sk);
> > +       unsigned long rate =
> > +               ((tp->snd_cwnd * tp->mss_cache * USEC_PER_SEC) <<
> > 3) /
> > +               (tp->srtt_us ?: 1U << 3);
> > +       sk->sk_pacing_rate = min(rate, sk->sk_max_pacing_rate);
> > +}
> > +
> > +SEC("struct_ops/write_sk_pacing_ssthresh")
> > +__u32 BPF_PROG(write_sk_pacing_ssthresh, struct sock *sk)
> > +{
> > +       return tcp_sk(sk)->snd_ssthresh;
> > +}
> > +
> > +SEC("struct_ops/write_sk_pacing_undo_cwnd")
> > +__u32 BPF_PROG(write_sk_pacing_undo_cwnd, struct sock *sk)
> > +{
> > +       return tcp_sk(sk)->snd_cwnd;
> > +}
> > +
> > +SEC(".struct_ops")
> > +struct tcp_congestion_ops write_sk_pacing = {
> > +       .init = (void *)write_sk_pacing_init,
> > +       .cong_control = (void *)write_sk_pacing_cong_control,
> > +       .ssthresh = (void *)write_sk_pacing_ssthresh,
> > +       .undo_cwnd = (void *)write_sk_pacing_undo_cwnd,
> > +       .name = "bpf_w_sk_pacing",
> > +};
> > -- 
> > 2.30.2
> > 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 3/5] selftests/bpf: Test a BPF CC writing sk_pacing_*
  2022-06-18 16:43     ` Jörn-Thorben Hinz
@ 2022-06-20 16:06       ` Yonghong Song
  2022-06-20 18:08         ` Martin KaFai Lau
  0 siblings, 1 reply; 14+ messages in thread
From: Yonghong Song @ 2022-06-20 16:06 UTC (permalink / raw)
  To: Jörn-Thorben Hinz, Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, netdev



On 6/18/22 9:43 AM, Jörn-Thorben Hinz wrote:
> On Fri, 2022-06-17 at 14:04 -0700, Martin KaFai Lau wrote:
>> On Tue, Jun 14, 2022 at 12:44:50PM +0200, Jörn-Thorben Hinz wrote:
>>> Test whether a TCP CC implemented in BPF is allowed to write
>>> sk_pacing_rate and sk_pacing_status in struct sock. This is needed
>>> when
>>> cong_control() is implemented and used.
>>>
>>> Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de>
>>> ---
>>>   .../selftests/bpf/prog_tests/bpf_tcp_ca.c     | 21 +++++++
>>>   .../bpf/progs/tcp_ca_write_sk_pacing.c        | 60
>>> +++++++++++++++++++
>>>   2 files changed, 81 insertions(+)
>>>   create mode 100644
>>> tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
>>>
>>> diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
>>> b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
>>> index e9a9a31b2ffe..a797497e2864 100644
>>> --- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
>>> +++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
>>> @@ -9,6 +9,7 @@
>>>   #include "bpf_cubic.skel.h"
>>>   #include "bpf_tcp_nogpl.skel.h"
>>>   #include "bpf_dctcp_release.skel.h"
>>> +#include "tcp_ca_write_sk_pacing.skel.h"
>>>   
>>>   #ifndef ENOTSUPP
>>>   #define ENOTSUPP 524
>>> @@ -322,6 +323,24 @@ static void test_rel_setsockopt(void)
>>>          bpf_dctcp_release__destroy(rel_skel);
>>>   }
>>>   
>>> +static void test_write_sk_pacing(void)
>>> +{
>>> +       struct tcp_ca_write_sk_pacing *skel;
>>> +       struct bpf_link *link;
>>> +
>>> +       skel = tcp_ca_write_sk_pacing__open_and_load();
>>> +       if (!ASSERT_OK_PTR(skel, "open_and_load")) {
>> nit. Remove this single line '{'.
>>
>> ./scripts/checkpatch.pl has reported that also:
>> WARNING: braces {} are not necessary for single statement blocks
>> #43: FILE: tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c:332:
>> +       if (!ASSERT_OK_PTR(skel, "open_and_load")) {
>> +               return;
>> +       }
> Have to admit I knowingly disregarded that warning as more of a
> recommendation. Out of habit and since I personally don’t see any
> compelling reason to generally use single-line statements after ifs,
> only multiple disadvantages.
> 
> But wrong place to argue here, of course. Will bow to the warning.
> 
>>
>>
>>> +               return;
>>> +       }
>>> +
>>> +       link = bpf_map__attach_struct_ops(skel-
>>>> maps.write_sk_pacing);
>>> +       if (ASSERT_OK_PTR(link, "attach_struct_ops")) {
>> Same here.
>>
>> and no need to check the link before bpf_link__destroy.
>> bpf_link__destroy can handle error link.  Something like:
>>
>>          ASSERT_OK_PTR(link, "attach_struct_ops");
>>          bpf_link__destroy(link);
>>          tcp_ca_write_sk_pacing__destroy(skel);
>>
>> The earlier examples in test_cubic and test_dctcp were
>> written before bpf_link__destroy can handle error link.
> You are right, I followed the other two test_*() functions there. Good
> to know that it behaves similar to (k)free() and others. Will remove
> the ifs around bpf_link__destroy().
> 
>>
>>> +               bpf_link__destroy(link);
>>> +       }
>>> +
>>> +       tcp_ca_write_sk_pacing__destroy(skel);
>>> +}
>>> +
>>>   void test_bpf_tcp_ca(void)
>>>   {
>>>          if (test__start_subtest("dctcp"))
>>> @@ -334,4 +353,6 @@ void test_bpf_tcp_ca(void)
>>>                  test_dctcp_fallback();
>>>          if (test__start_subtest("rel_setsockopt"))
>>>                  test_rel_setsockopt();
>>> +       if (test__start_subtest("write_sk_pacing"))
>>> +               test_write_sk_pacing();
>>>   }
>>> diff --git
>>> a/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
>>> b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
>>> new file mode 100644
>>> index 000000000000..43447704cf0e
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
>>> @@ -0,0 +1,60 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +#include "vmlinux.h"
>>> +
>>> +#include <bpf/bpf_helpers.h>
>>> +#include <bpf/bpf_tracing.h>
>>> +
>>> +char _license[] SEC("license") = "GPL";
>>> +
>>> +#define USEC_PER_SEC 1000000UL
>>> +
>>> +#define min(a, b) ((a) < (b) ? (a) : (b))
>>> +
>>> +static inline struct tcp_sock *tcp_sk(const struct sock *sk)
>>> +{
>> This helper is already available in bpf_tcp_helpers.h.
>> Is there a reason not to use that one and redefine
>> it in both patch 3 and 4?  The mss_cache and srtt_us can be added
>> to bpf_tcp_helpers.h.  It will need another effort to move
>> all selftest's bpf-cc to vmlinux.h.
> I fully agree it’s not elegant to redefine tcp_sk() twice more.
> 
> It was between either using bpf_tcp_helpers.h and adding and
> maintaining additional struct members there. Or using the (as I
> understood it) more “modern” approach with vmlinux.h and redefining the
> trivial tcp_sk(). I chose the later. Didn’t see a reason not to slowly
> introduce vmlinux.h into the CA tests.
> 
> I had the same dilemma for the algorithm I’m implementing: Reuse
> bpf_tcp_helpers.h from the kernel tree and extend it. Or use vmlinux.h
> and copy only some of the (mostly trivial) helper functions. Also chose
> the later here.
> 
> While doing the above, I also considered extracting the type
> declarations from bpf_tcp_helpers.h into an, e.g.,
> bpf_tcp_types_helper.h, keeping only the functions in
> bpf_tcp_helpers.h. bpf_tcp_helpers.h could have been a base helper for
> any BPF CA implementation then and used with either vmlinux.h or the
> “old-school” includes. Similar to the way bpf_helpers.h is used. But at
> that point, a bpf_tcp_types_helper.h could have probably just been
> dropped for good and in favor of vmlinux.h. So I didn’t continue with
> that.
> 
> Do you insist to use bpf_tcp_helpers.h instead of vmlinux.h? Or could
> the described split into two headers make sense after all?

I prefer to use vmlinux.h. Eventually we would like to use vmlinux.h
for progs which include bpf_tcp_healpers.h. Basically remove the struct
definitions in bpf_tcp_helpers.h and replacing "bpf.h, stddef.h, tcp.h 
..." with vmlinux.h. We may not be there yet, but that is the goal.

> 
> (Will wait for your reply here before sending a v4.)
> 
>>
>>> +       return (struct tcp_sock *)sk;
>>> +}
>>> +
>>> +SEC("struct_ops/write_sk_pacing_init")
>>> +void BPF_PROG(write_sk_pacing_init, struct sock *sk)
>>> +{
>>> +#ifdef ENABLE_ATOMICS_TESTS
>>> +       __sync_bool_compare_and_swap(&sk->sk_pacing_status,
>>> SK_PACING_NONE,
>>> +                                    SK_PACING_NEEDED);
>>> +#else
>>> +       sk->sk_pacing_status = SK_PACING_NEEDED;
>>> +#endif
>>> +}
>>> +
>>> +SEC("struct_ops/write_sk_pacing_cong_control")
>>> +void BPF_PROG(write_sk_pacing_cong_control, struct sock *sk,
>>> +             const struct rate_sample *rs)
>>> +{
>>> +       const struct tcp_sock *tp = tcp_sk(sk);
>>> +       unsigned long rate =
>>> +               ((tp->snd_cwnd * tp->mss_cache * USEC_PER_SEC) <<
>>> 3) /
>>> +               (tp->srtt_us ?: 1U << 3);
>>> +       sk->sk_pacing_rate = min(rate, sk->sk_max_pacing_rate);
>>> +}
>>> +
>>> +SEC("struct_ops/write_sk_pacing_ssthresh")
>>> +__u32 BPF_PROG(write_sk_pacing_ssthresh, struct sock *sk)
>>> +{
>>> +       return tcp_sk(sk)->snd_ssthresh;
>>> +}
>>> +
>>> +SEC("struct_ops/write_sk_pacing_undo_cwnd")
>>> +__u32 BPF_PROG(write_sk_pacing_undo_cwnd, struct sock *sk)
>>> +{
>>> +       return tcp_sk(sk)->snd_cwnd;
>>> +}
>>> +
>>> +SEC(".struct_ops")
>>> +struct tcp_congestion_ops write_sk_pacing = {
>>> +       .init = (void *)write_sk_pacing_init,
>>> +       .cong_control = (void *)write_sk_pacing_cong_control,
>>> +       .ssthresh = (void *)write_sk_pacing_ssthresh,
>>> +       .undo_cwnd = (void *)write_sk_pacing_undo_cwnd,
>>> +       .name = "bpf_w_sk_pacing",
>>> +};
>>> -- 
>>> 2.30.2
>>>
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 3/5] selftests/bpf: Test a BPF CC writing sk_pacing_*
  2022-06-20 16:06       ` Yonghong Song
@ 2022-06-20 18:08         ` Martin KaFai Lau
  2022-06-22 18:48           ` Jörn-Thorben Hinz
  0 siblings, 1 reply; 14+ messages in thread
From: Martin KaFai Lau @ 2022-06-20 18:08 UTC (permalink / raw)
  To: Jörn-Thorben Hinz, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, netdev

On Mon, Jun 20, 2022 at 09:06:13AM -0700, Yonghong Song wrote:
[ ... ]
> > > > a/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> > > > b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> > > > new file mode 100644
> > > > index 000000000000..43447704cf0e
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> > > > @@ -0,0 +1,60 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +
> > > > +#include "vmlinux.h"
> > > > +
> > > > +#include <bpf/bpf_helpers.h>
> > > > +#include <bpf/bpf_tracing.h>
> > > > +
> > > > +char _license[] SEC("license") = "GPL";
> > > > +
> > > > +#define USEC_PER_SEC 1000000UL
> > > > +
> > > > +#define min(a, b) ((a) < (b) ? (a) : (b))
> > > > +
> > > > +static inline struct tcp_sock *tcp_sk(const struct sock *sk)
> > > > +{
> > > This helper is already available in bpf_tcp_helpers.h.
> > > Is there a reason not to use that one and redefine
> > > it in both patch 3 and 4?  The mss_cache and srtt_us can be added
> > > to bpf_tcp_helpers.h.  It will need another effort to move
> > > all selftest's bpf-cc to vmlinux.h.
> > I fully agree it’s not elegant to redefine tcp_sk() twice more.
> > 
> > It was between either using bpf_tcp_helpers.h and adding and
> > maintaining additional struct members there. Or using the (as I
> > understood it) more “modern” approach with vmlinux.h and redefining the
> > trivial tcp_sk(). I chose the later. Didn’t see a reason not to slowly
> > introduce vmlinux.h into the CA tests.
> > 
> > I had the same dilemma for the algorithm I’m implementing: Reuse
> > bpf_tcp_helpers.h from the kernel tree and extend it. Or use vmlinux.h
> > and copy only some of the (mostly trivial) helper functions. Also chose
> > the later here.
> > 
> > While doing the above, I also considered extracting the type
> > declarations from bpf_tcp_helpers.h into an, e.g.,
> > bpf_tcp_types_helper.h, keeping only the functions in
> > bpf_tcp_helpers.h. bpf_tcp_helpers.h could have been a base helper for
> > any BPF CA implementation then and used with either vmlinux.h or the
> > “old-school” includes. Similar to the way bpf_helpers.h is used. But at
> > that point, a bpf_tcp_types_helper.h could have probably just been
> > dropped for good and in favor of vmlinux.h. So I didn’t continue with
> > that.
I think a trimmed down bpf_tcp_helpers.h + vmlinux.h is good.  Basically
what Yonghong has suggested.  Not sure what you meant by 'old-school' includes.
I don't think it needs a new bpf_tcp_types_helper.h also. 

I think it makes sense to remove everything from bpf_tcp_helpers.h
that is already available from vmlinux.h.  bpf_tcp_helpers.h
should only have some macros and helpers left.  Then move
bpf_dctcp.c, bpf_cubic.c, and a few others under progs/ to
use vmlinux.h.  I haven't tried but it should be doable
from a quick look at bpf_cubic.c and bpf_dctcp.c.

I agree it is better to directly use the struct tcp_sock, inet_connection_sock,
inet_sock... from the vmlinux.h.  However, bpf_tcp_helpers.h does not
only have the struct and enum defined in the kernel.  It has some
helpers and macros (e.g. TCP_CONG_NEEDS_ECN, TCP_ECN_*) that are missing
from vmlinux.h.  These are actually used by the realistic bpf-tcp-cc like
bpf_cubic.c and bpf_dctcp.c.

The simple test in this patch is not a fully implemented bpf-tcp-cc and
it only needs to duplicate the tcp_sk() helper which looks ok at the beginning.
Later, this simple test will be copied-and-pasted to create another test.
These new tests may then need to duplicate more helpers and macros.
It was what I meant it needs separate patches to migrate all bpf-tcp-cc
tests to vmlinux.h.  Otherwise, when they are migrated to vmlinux.h later,
we have another pattern of tests that can be cleared up to remove
these helpers/macros duplication.

I don't mind to keep a duplicate tcp_sk() in this set for now
if you can do a follow up to move all bpf-tcp-cc tests
to this path (vmlinux.h + a trimmed down bpf_tcp_helpers.h) and then
remove the tcp_sk() duplication here.  This will be very useful.

> > 
> > Do you insist to use bpf_tcp_helpers.h instead of vmlinux.h? Or could
> > the described split into two headers make sense after all?
> 
> I prefer to use vmlinux.h. Eventually we would like to use vmlinux.h
> for progs which include bpf_tcp_healpers.h. Basically remove the struct
> definitions in bpf_tcp_helpers.h and replacing "bpf.h, stddef.h, tcp.h ..."
> with vmlinux.h. We may not be there yet, but that is the goal.
> 
> > 
> > (Will wait for your reply here before sending a v4.)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH bpf-next v3 3/5] selftests/bpf: Test a BPF CC writing sk_pacing_*
  2022-06-20 18:08         ` Martin KaFai Lau
@ 2022-06-22 18:48           ` Jörn-Thorben Hinz
  0 siblings, 0 replies; 14+ messages in thread
From: Jörn-Thorben Hinz @ 2022-06-22 18:48 UTC (permalink / raw)
  To: Martin KaFai Lau, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, netdev

On Mon, 2022-06-20 at 11:08 -0700, Martin KaFai Lau wrote:
> On Mon, Jun 20, 2022 at 09:06:13AM -0700, Yonghong Song wrote:
> [ ... ]
> > > > > a/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> > > > > b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> > > > > new file mode 100644
> > > > > index 000000000000..43447704cf0e
> > > > > --- /dev/null
> > > > > +++
> > > > > b/tools/testing/selftests/bpf/progs/tcp_ca_write_sk_pacing.c
> > > > > @@ -0,0 +1,60 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0
> > > > > +
> > > > > +#include "vmlinux.h"
> > > > > +
> > > > > +#include <bpf/bpf_helpers.h>
> > > > > +#include <bpf/bpf_tracing.h>
> > > > > +
> > > > > +char _license[] SEC("license") = "GPL";
> > > > > +
> > > > > +#define USEC_PER_SEC 1000000UL
> > > > > +
> > > > > +#define min(a, b) ((a) < (b) ? (a) : (b))
> > > > > +
> > > > > +static inline struct tcp_sock *tcp_sk(const struct sock *sk)
> > > > > +{
> > > > This helper is already available in bpf_tcp_helpers.h.
> > > > Is there a reason not to use that one and redefine
> > > > it in both patch 3 and 4?  The mss_cache and srtt_us can be
> > > > added
> > > > to bpf_tcp_helpers.h.  It will need another effort to move
> > > > all selftest's bpf-cc to vmlinux.h.
> > > I fully agree it’s not elegant to redefine tcp_sk() twice more.
> > > 
> > > It was between either using bpf_tcp_helpers.h and adding and
> > > maintaining additional struct members there. Or using the (as I
> > > understood it) more “modern” approach with vmlinux.h and
> > > redefining the
> > > trivial tcp_sk(). I chose the later. Didn’t see a reason not to
> > > slowly
> > > introduce vmlinux.h into the CA tests.
> > > 
> > > I had the same dilemma for the algorithm I’m implementing: Reuse
> > > bpf_tcp_helpers.h from the kernel tree and extend it. Or use
> > > vmlinux.h
> > > and copy only some of the (mostly trivial) helper functions. Also
> > > chose
> > > the later here.
> > > 
> > > While doing the above, I also considered extracting the type
> > > declarations from bpf_tcp_helpers.h into an, e.g.,
> > > bpf_tcp_types_helper.h, keeping only the functions in
> > > bpf_tcp_helpers.h. bpf_tcp_helpers.h could have been a base
> > > helper for
> > > any BPF CA implementation then and used with either vmlinux.h or
> > > the
> > > “old-school” includes. Similar to the way bpf_helpers.h is used.
> > > But at
> > > that point, a bpf_tcp_types_helper.h could have probably just
> > > been
> > > dropped for good and in favor of vmlinux.h. So I didn’t continue
> > > with
> > > that.
> I think a trimmed down bpf_tcp_helpers.h + vmlinux.h is good. 
> Basically
> what Yonghong has suggested.  Not sure what you meant by 'old-school'
> includes.
That was badly worded by me. I was referring to linux/types.h and co.

> I don't think it needs a new bpf_tcp_types_helper.h also. 
I thought that might be helpful as a first step towards using
vmlinux.h, without fully migrating all the users of bpf_tcp_helpers.h.
But that’s probably unnecessary.

> 
> I think it makes sense to remove everything from bpf_tcp_helpers.h
> that is already available from vmlinux.h.  bpf_tcp_helpers.h
> should only have some macros and helpers left.  Then move
> bpf_dctcp.c, bpf_cubic.c, and a few others under progs/ to
> use vmlinux.h.  I haven't tried but it should be doable
> from a quick look at bpf_cubic.c and bpf_dctcp.c.
> 
> I agree it is better to directly use the struct tcp_sock,
> inet_connection_sock,
> inet_sock... from the vmlinux.h.  However, bpf_tcp_helpers.h does not
> only have the struct and enum defined in the kernel.  It has some
> helpers and macros (e.g. TCP_CONG_NEEDS_ECN, TCP_ECN_*) that are
> missing
> from vmlinux.h.
It’s unfortunate that especially those many, tiny non-macro helpers
like tcp_sk() and e.g. tcp_stamp_us_delta() still have to be redefined.
Could it make sense to provide those, in the same way tcp_slow_start(),
tcp_cong_avoid_ai(), and tcp_reno_*() are “exported” as kfuncs by
bpf_tcp_ca.c (if I read that correctly)? Even though the list of these
functions could grow quickly.


> These are actually used by the realistic bpf-tcp-cc like
> bpf_cubic.c and bpf_dctcp.c.
> 
> The simple test in this patch is not a fully implemented bpf-tcp-cc
> and
> it only needs to duplicate the tcp_sk() helper which looks ok at the
> beginning.
> Later, this simple test will be copied-and-pasted to create another
> test.
> These new tests may then need to duplicate more helpers and macros.
> It was what I meant it needs separate patches to migrate all bpf-tcp-
> cc
> tests to vmlinux.h.  Otherwise, when they are migrated to vmlinux.h
> later,
> we have another pattern of tests that can be cleared up to remove
> these helpers/macros duplication.
I agree with you there and also don’t favor copying stuff around. Only
did it here, since something had to be copied: either a field into one
of the structs in bpf_tcp_helpers or tcp_sk.

> 
> I don't mind to keep a duplicate tcp_sk() in this set for now
> if you can do a follow up to move all bpf-tcp-cc tests
> to this path (vmlinux.h + a trimmed down bpf_tcp_helpers.h) and then
> remove the tcp_sk() duplication here.  This will be very useful.
Took a look at this over the last few days, see [1]. Happy about
feedback.

[1]
https://lore.kernel.org/bpf/20220622181015.892445-1-jthinz@mailbox.tu-berlin.de/

> 
> > > 
> > > Do you insist to use bpf_tcp_helpers.h instead of vmlinux.h? Or
> > > could
> > > the described split into two headers make sense after all?
> > 
> > I prefer to use vmlinux.h. Eventually we would like to use
> > vmlinux.h
> > for progs which include bpf_tcp_healpers.h. Basically remove the
> > struct
> > definitions in bpf_tcp_helpers.h and replacing "bpf.h, stddef.h,
> > tcp.h ..."
> > with vmlinux.h. We may not be there yet, but that is the goal.
> > 
> > > 
> > > (Will wait for your reply here before sending a v4.)



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-06-22 18:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-14 10:44 [PATCH bpf-next v3 0/5] Align BPF TCP CCs implementing cong_control() with non-BPF CCs Jörn-Thorben Hinz
2022-06-14 10:44 ` [PATCH bpf-next v3 1/5] bpf: Allow a TCP CC to write sk_pacing_rate and sk_pacing_status Jörn-Thorben Hinz
2022-06-17 20:21   ` Martin KaFai Lau
2022-06-14 10:44 ` [PATCH bpf-next v3 2/5] bpf: Require only one of cong_avoid() and cong_control() from a TCP CC Jörn-Thorben Hinz
2022-06-17 20:26   ` Martin KaFai Lau
2022-06-14 10:44 ` [PATCH bpf-next v3 3/5] selftests/bpf: Test a BPF CC writing sk_pacing_* Jörn-Thorben Hinz
2022-06-17 21:04   ` Martin KaFai Lau
2022-06-18 16:43     ` Jörn-Thorben Hinz
2022-06-20 16:06       ` Yonghong Song
2022-06-20 18:08         ` Martin KaFai Lau
2022-06-22 18:48           ` Jörn-Thorben Hinz
2022-06-14 10:44 ` [PATCH bpf-next v3 4/5] selftests/bpf: Test an incomplete BPF CC Jörn-Thorben Hinz
2022-06-14 10:44 ` [PATCH bpf-next v3 5/5] selftests/bpf: Test a BPF CC implementing the unsupported get_info() Jörn-Thorben Hinz
2022-06-17 21:22   ` Martin KaFai Lau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).