All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/2] Split bpf_sock dst_port field
@ 2022-01-27 17:24 Jakub Sitnicki
  2022-01-27 17:24 ` [PATCH bpf-next 1/2] bpf: Make dst_port field in struct bpf_sock 16-bit wide Jakub Sitnicki
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2022-01-27 17:24 UTC (permalink / raw)
  To: bpf
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Menglong Dong, Martin KaFai Lau

This is a follow-up to discussion around the idea of making dst_port in struct
bpf_sock a 16-bit field that happened in [1]. I have fleshed it out further:

v1:
- keep dst_field offset unchanged to prevent existing BPF program breakage
  (Martin)
- allow 8-bit loads from dst_port[0] and [1]
- add test coverage for the verifier and the context access converter

[1] https://lore.kernel.org/bpf/87sftbobys.fsf@cloudflare.com/

Jakub Sitnicki (2):
  bpf: Make dst_port field in struct bpf_sock 16-bit wide
  selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads

 include/uapi/linux/bpf.h                      |  3 +-
 net/core/filter.c                             |  9 ++-
 tools/include/uapi/linux/bpf.h                |  3 +-
 .../selftests/bpf/prog_tests/sock_fields.c    | 58 +++++++++----
 .../selftests/bpf/progs/test_sock_fields.c    | 41 ++++++++++
 tools/testing/selftests/bpf/verifier/sock.c   | 81 ++++++++++++++++++-
 6 files changed, 172 insertions(+), 23 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH bpf-next 1/2] bpf: Make dst_port field in struct bpf_sock 16-bit wide
  2022-01-27 17:24 [PATCH bpf-next 0/2] Split bpf_sock dst_port field Jakub Sitnicki
@ 2022-01-27 17:24 ` Jakub Sitnicki
  2022-01-28 18:17   ` Alexei Starovoitov
  2022-01-27 17:24 ` [PATCH bpf-next 2/2] selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads Jakub Sitnicki
  2022-01-28  6:31 ` [PATCH bpf-next 0/2] Split bpf_sock dst_port field Martin KaFai Lau
  2 siblings, 1 reply; 6+ messages in thread
From: Jakub Sitnicki @ 2022-01-27 17:24 UTC (permalink / raw)
  To: bpf
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Menglong Dong, Martin KaFai Lau

Menglong Dong reports that the documentation for the dst_port field in
struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the
field is a zero-padded 16-bit integer in network byte order. The value
appears to the BPF user as if laid out in memory as so:

  offsetof(struct bpf_sock, dst_port) + 0  <port MSB>
                                      + 8  <port LSB>
                                      +16  0x00
                                      +24  0x00

32-, 16-, and 8-bit wide loads from the field are all allowed, but only if
the offset into the field is 0.

32-bit wide loads from dst_port are especially confusing. The loaded value,
after converting to host byte order with bpf_ntohl(dst_port), contains the
port number in the upper 16-bits.

Remove the confusion by splitting the field into two 16-bit fields. For
backward compatibility, allow 32-bit wide loads from offsetof(struct
bpf_sock, dst_port).

While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.

Reported-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/uapi/linux/bpf.h | 3 ++-
 net/core/filter.c        | 9 ++++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4a2f7041ebae..027e84b18b51 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5574,7 +5574,8 @@ struct bpf_sock {
 	__u32 src_ip4;
 	__u32 src_ip6[4];
 	__u32 src_port;		/* host byte order */
-	__u32 dst_port;		/* network byte order */
+	__be16 dst_port;	/* network byte order */
+	__u16 zero_padding;
 	__u32 dst_ip4;
 	__u32 dst_ip6[4];
 	__u32 state;
diff --git a/net/core/filter.c b/net/core/filter.c
index a06931c27eeb..ef8473163696 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8265,6 +8265,7 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 			      struct bpf_insn_access_aux *info)
 {
 	const int size_default = sizeof(__u32);
+	int field_size;
 
 	if (off < 0 || off >= sizeof(struct bpf_sock))
 		return false;
@@ -8276,7 +8277,6 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 	case offsetof(struct bpf_sock, family):
 	case offsetof(struct bpf_sock, type):
 	case offsetof(struct bpf_sock, protocol):
-	case offsetof(struct bpf_sock, dst_port):
 	case offsetof(struct bpf_sock, src_port):
 	case offsetof(struct bpf_sock, rx_queue_mapping):
 	case bpf_ctx_range(struct bpf_sock, src_ip4):
@@ -8285,6 +8285,13 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 	case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
 		bpf_ctx_record_field_size(info, size_default);
 		return bpf_ctx_narrow_access_ok(off, size, size_default);
+	case bpf_ctx_range(struct bpf_sock, dst_port):
+		field_size = size == size_default ?
+			size_default : sizeof_field(struct bpf_sock, dst_port);
+		bpf_ctx_record_field_size(info, field_size);
+		return bpf_ctx_narrow_access_ok(off, size, field_size);
+	case bpf_ctx_range(struct bpf_sock, zero_padding):
+		return false;
 	}
 
 	return size == size_default;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH bpf-next 2/2] selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads
  2022-01-27 17:24 [PATCH bpf-next 0/2] Split bpf_sock dst_port field Jakub Sitnicki
  2022-01-27 17:24 ` [PATCH bpf-next 1/2] bpf: Make dst_port field in struct bpf_sock 16-bit wide Jakub Sitnicki
@ 2022-01-27 17:24 ` Jakub Sitnicki
  2022-01-28  6:31 ` [PATCH bpf-next 0/2] Split bpf_sock dst_port field Martin KaFai Lau
  2 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2022-01-27 17:24 UTC (permalink / raw)
  To: bpf
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Menglong Dong, Martin KaFai Lau

Add coverage to the verifier tests and tests for reading bpf_sock fields to
ensure that 32-bit, 16-bit, and 8-bit loads from dst_port field are allowed
only at intended offsets and produce expected values.

While 16-bit and 8-bit access to dst_port field is straight-forward, 32-bit
wide loads need be allowed and produce a zero-padded 16-bit value for
backward compatibility.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/include/uapi/linux/bpf.h                |  3 +-
 .../selftests/bpf/prog_tests/sock_fields.c    | 58 +++++++++----
 .../selftests/bpf/progs/test_sock_fields.c    | 41 ++++++++++
 tools/testing/selftests/bpf/verifier/sock.c   | 81 ++++++++++++++++++-
 4 files changed, 162 insertions(+), 21 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4a2f7041ebae..027e84b18b51 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5574,7 +5574,8 @@ struct bpf_sock {
 	__u32 src_ip4;
 	__u32 src_ip6[4];
 	__u32 src_port;		/* host byte order */
-	__u32 dst_port;		/* network byte order */
+	__be16 dst_port;	/* network byte order */
+	__u16 zero_padding;
 	__u32 dst_ip4;
 	__u32 dst_ip6[4];
 	__u32 state;
diff --git a/tools/testing/selftests/bpf/prog_tests/sock_fields.c b/tools/testing/selftests/bpf/prog_tests/sock_fields.c
index 9fc040eaa482..9d211b5c22c4 100644
--- a/tools/testing/selftests/bpf/prog_tests/sock_fields.c
+++ b/tools/testing/selftests/bpf/prog_tests/sock_fields.c
@@ -1,9 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2019 Facebook */
 
+#define _GNU_SOURCE
 #include <netinet/in.h>
 #include <arpa/inet.h>
 #include <unistd.h>
+#include <sched.h>
 #include <stdlib.h>
 #include <string.h>
 #include <errno.h>
@@ -20,6 +22,7 @@
 enum bpf_linum_array_idx {
 	EGRESS_LINUM_IDX,
 	INGRESS_LINUM_IDX,
+	READ_SK_DST_PORT_LINUM_IDX,
 	__NR_BPF_LINUM_ARRAY_IDX,
 };
 
@@ -42,8 +45,16 @@ static __u64 child_cg_id;
 static int linum_map_fd;
 static __u32 duration;
 
-static __u32 egress_linum_idx = EGRESS_LINUM_IDX;
-static __u32 ingress_linum_idx = INGRESS_LINUM_IDX;
+static bool create_netns(void)
+{
+	if (!ASSERT_OK(unshare(CLONE_NEWNET), "create netns"))
+		return false;
+
+	if (!ASSERT_OK(system("ip link set dev lo up"), "bring up lo"))
+		return false;
+
+	return true;
+}
 
 static void print_sk(const struct bpf_sock *sk, const char *prefix)
 {
@@ -91,19 +102,24 @@ static void check_result(void)
 {
 	struct bpf_tcp_sock srv_tp, cli_tp, listen_tp;
 	struct bpf_sock srv_sk, cli_sk, listen_sk;
-	__u32 ingress_linum, egress_linum;
+	__u32 idx, ingress_linum, egress_linum, linum;
 	int err;
 
-	err = bpf_map_lookup_elem(linum_map_fd, &egress_linum_idx,
-				  &egress_linum);
+	idx = EGRESS_LINUM_IDX;
+	err = bpf_map_lookup_elem(linum_map_fd, &idx, &egress_linum);
 	CHECK(err < 0, "bpf_map_lookup_elem(linum_map_fd)",
 	      "err:%d errno:%d\n", err, errno);
 
-	err = bpf_map_lookup_elem(linum_map_fd, &ingress_linum_idx,
-				  &ingress_linum);
+	idx = INGRESS_LINUM_IDX;
+	err = bpf_map_lookup_elem(linum_map_fd, &idx, &ingress_linum);
 	CHECK(err < 0, "bpf_map_lookup_elem(linum_map_fd)",
 	      "err:%d errno:%d\n", err, errno);
 
+	idx = READ_SK_DST_PORT_LINUM_IDX;
+	err = bpf_map_lookup_elem(linum_map_fd, &idx, &linum);
+	ASSERT_OK(err, "bpf_map_lookup_elem(linum_map_fd, READ_SK_DST_PORT_IDX)");
+	ASSERT_EQ(linum, 0, "failure in read_sk_dst_port on line");
+
 	memcpy(&srv_sk, &skel->bss->srv_sk, sizeof(srv_sk));
 	memcpy(&srv_tp, &skel->bss->srv_tp, sizeof(srv_tp));
 	memcpy(&cli_sk, &skel->bss->cli_sk, sizeof(cli_sk));
@@ -262,7 +278,7 @@ static void test(void)
 	char buf[DATA_LEN];
 
 	/* Prepare listen_fd */
-	listen_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
+	listen_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0xcafe, 0);
 	/* start_server() has logged the error details */
 	if (CHECK_FAIL(listen_fd == -1))
 		goto done;
@@ -330,8 +346,12 @@ static void test(void)
 
 void serial_test_sock_fields(void)
 {
-	struct bpf_link *egress_link = NULL, *ingress_link = NULL;
 	int parent_cg_fd = -1, child_cg_fd = -1;
+	struct bpf_link *link;
+
+	/* Use a dedicated netns to have a fixed listen port */
+	if (!create_netns())
+		return;
 
 	/* Create a cgroup, get fd, and join it */
 	parent_cg_fd = test__join_cgroup(PARENT_CGROUP);
@@ -352,15 +372,20 @@ void serial_test_sock_fields(void)
 	if (CHECK(!skel, "test_sock_fields__open_and_load", "failed\n"))
 		goto done;
 
-	egress_link = bpf_program__attach_cgroup(skel->progs.egress_read_sock_fields,
-						 child_cg_fd);
-	if (!ASSERT_OK_PTR(egress_link, "attach_cgroup(egress)"))
+	link = bpf_program__attach_cgroup(skel->progs.egress_read_sock_fields, child_cg_fd);
+	if (!ASSERT_OK_PTR(link, "attach_cgroup(egress_read_sock_fields)"))
+		goto done;
+	skel->links.egress_read_sock_fields = link;
+
+	link = bpf_program__attach_cgroup(skel->progs.ingress_read_sock_fields, child_cg_fd);
+	if (!ASSERT_OK_PTR(link, "attach_cgroup(ingress_read_sock_fields)"))
 		goto done;
+	skel->links.ingress_read_sock_fields = link;
 
-	ingress_link = bpf_program__attach_cgroup(skel->progs.ingress_read_sock_fields,
-						  child_cg_fd);
-	if (!ASSERT_OK_PTR(ingress_link, "attach_cgroup(ingress)"))
+	link = bpf_program__attach_cgroup(skel->progs.read_sk_dst_port, child_cg_fd);
+	if (!ASSERT_OK_PTR(link, "attach_cgroup(read_sk_dst_port"))
 		goto done;
+	skel->links.read_sk_dst_port = link;
 
 	linum_map_fd = bpf_map__fd(skel->maps.linum_map);
 	sk_pkt_out_cnt_fd = bpf_map__fd(skel->maps.sk_pkt_out_cnt);
@@ -369,8 +394,7 @@ void serial_test_sock_fields(void)
 	test();
 
 done:
-	bpf_link__destroy(egress_link);
-	bpf_link__destroy(ingress_link);
+	test_sock_fields__detach(skel);
 	test_sock_fields__destroy(skel);
 	if (child_cg_fd >= 0)
 		close(child_cg_fd);
diff --git a/tools/testing/selftests/bpf/progs/test_sock_fields.c b/tools/testing/selftests/bpf/progs/test_sock_fields.c
index 81b57b9aaaea..246f1f001813 100644
--- a/tools/testing/selftests/bpf/progs/test_sock_fields.c
+++ b/tools/testing/selftests/bpf/progs/test_sock_fields.c
@@ -12,6 +12,7 @@
 enum bpf_linum_array_idx {
 	EGRESS_LINUM_IDX,
 	INGRESS_LINUM_IDX,
+	READ_SK_DST_PORT_LINUM_IDX,
 	__NR_BPF_LINUM_ARRAY_IDX,
 };
 
@@ -250,4 +251,44 @@ int ingress_read_sock_fields(struct __sk_buff *skb)
 	return CG_OK;
 }
 
+static __noinline bool sk_dst_port__load_word(struct bpf_sock *sk)
+{
+	__u32 *word = (__u32 *)&sk->dst_port;
+	return word[0] == bpf_htonl(0xcafe0000);
+}
+
+static __noinline bool sk_dst_port__load_half(struct bpf_sock *sk)
+{
+	__u16 *half = (__u16 *)&sk->dst_port;
+	return half[0] == bpf_htons(0xcafe);
+}
+
+static __noinline bool sk_dst_port__load_byte(struct bpf_sock *sk)
+{
+	__u8 *byte = (__u8 *)&sk->dst_port;
+	return byte[0] == 0xca && byte[1] == 0xfe;
+}
+
+SEC("cgroup_skb/egress")
+int read_sk_dst_port(struct __sk_buff *skb)
+{
+	__u32 linum, linum_idx;
+	struct bpf_sock *sk;
+
+	linum_idx = READ_SK_DST_PORT_LINUM_IDX;
+
+	sk = skb->sk;
+	if (!sk)
+		RET_LOG();
+
+	if (!sk_dst_port__load_word(sk))
+		RET_LOG();
+	if (!sk_dst_port__load_half(sk))
+		RET_LOG();
+	if (!sk_dst_port__load_byte(sk))
+		RET_LOG();
+
+	return CG_OK;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c
index ce13ece08d51..3d9d89c5fe65 100644
--- a/tools/testing/selftests/bpf/verifier/sock.c
+++ b/tools/testing/selftests/bpf/verifier/sock.c
@@ -121,7 +121,25 @@
 	.result = ACCEPT,
 },
 {
-	"sk_fullsock(skb->sk): sk->dst_port [narrow load]",
+	"sk_fullsock(skb->sk): sk->dst_port [word load] (backward compatibility)",
+	.insns = {
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, dst_port)),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	.result = ACCEPT,
+},
+{
+	"sk_fullsock(skb->sk): sk->dst_port [half load]",
 	.insns = {
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
 	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
@@ -139,7 +157,64 @@
 	.result = ACCEPT,
 },
 {
-	"sk_fullsock(skb->sk): sk->dst_port [load 2nd byte]",
+	"sk_fullsock(skb->sk): sk->dst_port [half load] (invalid)",
+	.insns = {
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, dst_port) + 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	.result = REJECT,
+	.errstr = "invalid sock access",
+},
+{
+	"sk_fullsock(skb->sk): sk->dst_port [byte load]",
+	.insns = {
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_B, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, dst_port)),
+	BPF_LDX_MEM(BPF_B, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, dst_port) + 1),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	.result = ACCEPT,
+},
+{
+	"sk_fullsock(skb->sk): sk->dst_port [byte load] (invalid)",
+	.insns = {
+	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	BPF_EMIT_CALL(BPF_FUNC_sk_fullsock),
+	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, dst_port) + 2),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	.result = REJECT,
+	.errstr = "invalid sock access",
+},
+{
+	"sk_fullsock(skb->sk): sk->zero_padding [half load] (invalid)",
 	.insns = {
 	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)),
 	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2),
@@ -149,7 +224,7 @@
 	BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
-	BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, dst_port) + 1),
+	BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_0, offsetof(struct bpf_sock, zero_padding)),
 	BPF_MOV64_IMM(BPF_REG_0, 0),
 	BPF_EXIT_INSN(),
 	},
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next 0/2] Split bpf_sock dst_port field
  2022-01-27 17:24 [PATCH bpf-next 0/2] Split bpf_sock dst_port field Jakub Sitnicki
  2022-01-27 17:24 ` [PATCH bpf-next 1/2] bpf: Make dst_port field in struct bpf_sock 16-bit wide Jakub Sitnicki
  2022-01-27 17:24 ` [PATCH bpf-next 2/2] selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads Jakub Sitnicki
@ 2022-01-28  6:31 ` Martin KaFai Lau
  2 siblings, 0 replies; 6+ messages in thread
From: Martin KaFai Lau @ 2022-01-28  6:31 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Menglong Dong

On Thu, Jan 27, 2022 at 06:24:46PM +0100, Jakub Sitnicki wrote:
> This is a follow-up to discussion around the idea of making dst_port in struct
> bpf_sock a 16-bit field that happened in [1]. I have fleshed it out further:
> 
> v1:
> - keep dst_field offset unchanged to prevent existing BPF program breakage
>   (Martin)
> - allow 8-bit loads from dst_port[0] and [1]
> - add test coverage for the verifier and the context access converter
lgtm. Thanks for the patches.

Acked-by: Martin KaFai Lau <kafai@fb.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: Make dst_port field in struct bpf_sock 16-bit wide
  2022-01-27 17:24 ` [PATCH bpf-next 1/2] bpf: Make dst_port field in struct bpf_sock 16-bit wide Jakub Sitnicki
@ 2022-01-28 18:17   ` Alexei Starovoitov
  2022-01-30 11:58     ` Jakub Sitnicki
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2022-01-28 18:17 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Network Development, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Menglong Dong, Martin KaFai Lau

On Thu, Jan 27, 2022 at 9:24 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Menglong Dong reports that the documentation for the dst_port field in
> struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the
> field is a zero-padded 16-bit integer in network byte order. The value
> appears to the BPF user as if laid out in memory as so:
>
>   offsetof(struct bpf_sock, dst_port) + 0  <port MSB>
>                                       + 8  <port LSB>
>                                       +16  0x00
>                                       +24  0x00
>
> 32-, 16-, and 8-bit wide loads from the field are all allowed, but only if
> the offset into the field is 0.
>
> 32-bit wide loads from dst_port are especially confusing. The loaded value,
> after converting to host byte order with bpf_ntohl(dst_port), contains the
> port number in the upper 16-bits.
>
> Remove the confusion by splitting the field into two 16-bit fields. For
> backward compatibility, allow 32-bit wide loads from offsetof(struct
> bpf_sock, dst_port).
>
> While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.
>
> Reported-by: Menglong Dong <imagedong@tencent.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  include/uapi/linux/bpf.h | 3 ++-
>  net/core/filter.c        | 9 ++++++++-
>  2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4a2f7041ebae..027e84b18b51 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5574,7 +5574,8 @@ struct bpf_sock {
>         __u32 src_ip4;
>         __u32 src_ip6[4];
>         __u32 src_port;         /* host byte order */
> -       __u32 dst_port;         /* network byte order */
> +       __be16 dst_port;        /* network byte order */
> +       __u16 zero_padding;

I was wondering can we do '__u16 :16' here ?

Should we do the same for bpf_sk_lookup->remote_port as well
for consistency?

Thanks for the idea and the patches!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next 1/2] bpf: Make dst_port field in struct bpf_sock 16-bit wide
  2022-01-28 18:17   ` Alexei Starovoitov
@ 2022-01-30 11:58     ` Jakub Sitnicki
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Sitnicki @ 2022-01-30 11:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Network Development, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Menglong Dong, Martin KaFai Lau

On Fri, Jan 28, 2022 at 07:17 PM CET, Alexei Starovoitov wrote:
> On Thu, Jan 27, 2022 at 9:24 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Menglong Dong reports that the documentation for the dst_port field in
>> struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the
>> field is a zero-padded 16-bit integer in network byte order. The value
>> appears to the BPF user as if laid out in memory as so:
>>
>>   offsetof(struct bpf_sock, dst_port) + 0  <port MSB>
>>                                       + 8  <port LSB>
>>                                       +16  0x00
>>                                       +24  0x00
>>
>> 32-, 16-, and 8-bit wide loads from the field are all allowed, but only if
>> the offset into the field is 0.
>>
>> 32-bit wide loads from dst_port are especially confusing. The loaded value,
>> after converting to host byte order with bpf_ntohl(dst_port), contains the
>> port number in the upper 16-bits.
>>
>> Remove the confusion by splitting the field into two 16-bit fields. For
>> backward compatibility, allow 32-bit wide loads from offsetof(struct
>> bpf_sock, dst_port).
>>
>> While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.
>>
>> Reported-by: Menglong Dong <imagedong@tencent.com>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>  include/uapi/linux/bpf.h | 3 ++-
>>  net/core/filter.c        | 9 ++++++++-
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 4a2f7041ebae..027e84b18b51 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -5574,7 +5574,8 @@ struct bpf_sock {
>>         __u32 src_ip4;
>>         __u32 src_ip6[4];
>>         __u32 src_port;         /* host byte order */
>> -       __u32 dst_port;         /* network byte order */
>> +       __be16 dst_port;        /* network byte order */
>> +       __u16 zero_padding;
>
> I was wondering can we do '__u16 :16' here ?

Great idea. Now done in v2:

https://lore.kernel.org/bpf/20220130115518.213259-1-jakub@cloudflare.com/

> Should we do the same for bpf_sk_lookup->remote_port as well
> for consistency?

I can tend to that this upcoming week.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-01-30 11:58 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-27 17:24 [PATCH bpf-next 0/2] Split bpf_sock dst_port field Jakub Sitnicki
2022-01-27 17:24 ` [PATCH bpf-next 1/2] bpf: Make dst_port field in struct bpf_sock 16-bit wide Jakub Sitnicki
2022-01-28 18:17   ` Alexei Starovoitov
2022-01-30 11:58     ` Jakub Sitnicki
2022-01-27 17:24 ` [PATCH bpf-next 2/2] selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads Jakub Sitnicki
2022-01-28  6:31 ` [PATCH bpf-next 0/2] Split bpf_sock dst_port field Martin KaFai Lau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.