linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bpf: Deprecate "data" member of bpf_lpm_trie_key
@ 2023-02-09 19:23 Kees Cook
  2023-02-09 19:52 ` Andrii Nakryiko
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2023-02-09 19:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kees Cook, Stanislav Fomichev, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko, Shuah Khan,
	Haowen Bai, bpf, linux-kselftest, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Nathan Chancellor,
	Nick Desaulniers, Tom Rix, linux-kernel, netdev, llvm,
	linux-hardening

The kernel is globally removing the ambiguous 0-length and 1-element
arrays in favor of flexible arrays, so that we can gain both compile-time
and run-time array bounds checking[1]. Most cases of these changes are
trivial, but this case in BPF is not. It faces some difficulties:

1) struct bpf_lpm_trie_key is part of UAPI so changes can be fragile in
   the sense that projects external to Linux may be impacted.

2) The struct is intended to be used as a header, which means it may
   be within another structure, resulting in the "data" array member
   overlapping with the surrounding structure's following members. When
   converting from [0]-style to []-style, this overlap elicits warnings
   under Clang, and GCC considers it a deprecated extension (and similarly
   warns under -pedantic): https://godbolt.org/z/vWzqs41h6

3) Both the kernel and userspace access the existing "data" member
   for more than just static initializers and offsetof() calculations.
   For example:

      cilium:
        struct egress_gw_policy_key in_key = {
                .lpm_key = { 32 + 24, {} },
                .saddr   = CLIENT_IP,
                .daddr   = EXTERNAL_SVC_IP & 0Xffffff,
        };

      systemd:
        ipv6_map_fd = bpf_map_new(
                        BPF_MAP_TYPE_LPM_TRIE,
                        offsetof(struct bpf_lpm_trie_key, data) + sizeof(uint32_t)*4,
                        sizeof(uint64_t), ...);
	...
        struct bpf_lpm_trie_key *key_ipv4, *key_ipv6;
	...
	memcpy(key_ipv4->data, &a->address, sizeof(uint32_t));

   Searching for other uses in Debian Code Search seem to be just copies
   of UAPI headers:
   https://codesearch.debian.net/search?q=struct+bpf_lpm_trie_key&literal=1&perpkg=1

Introduce struct bpf_lpm_trie_key_u8 for the kernel (and future userspace)
to use for walking the individual bytes following the header, and leave
the "data" member of struct bpf_lpm_trie_key as-is (i.e. a [0]-style
array). This will allow existing userspace code to continue to use "data"
as a fake flexible array. The kernel (and future userspace code) building
with -fstrict-flex-arrays=3 will see struct bpf_lpm_trie_key::data has
having 0 bytes so there will be no overlap warnings, and can instead
use struct bpf_lpm_trie_key_u8::data for accessing the actual byte
array contents. The definition of struct bpf_lpm_trie_key_u8 uses a
union with struct bpf_lpm_trie_key so that things like container_of()
can be used instead of doing explicit casting, all while leaving the
member names un-namespaced (i.e. key->prefixlen == key_u8->prefixlen,
key->data == key_u8->data), allowing for trivial drop-in replacement
without collateral member renaming.

This will avoid structure overlap warnings and array bounds warnings
while enabling run-time array bounds checking under CONFIG_UBSAN_BOUNDS=y
and -fstrict-flex-arrays=3.

For reference, the current warning under GCC 13 with -fstrict-flex-arrays=3
and -Warray-bounds is:

../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
  207 |                                        *(__be16 *)&key->data[i]);
      |                                                   ^~~~~~~~~~~~~
../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
  102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
      |                                                      ^
../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
   97 | #define be16_to_cpu __be16_to_cpu
      |                     ^~~~~~~~~~~~~
../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
  206 |                 u16 diff = be16_to_cpu(*(__be16 *)&node->data[i]
^
      |                            ^~~~~~~~~~~
In file included from ../include/linux/bpf.h:7:
../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
   82 |         __u8    data[0];        /* Arbitrary size */
      |                 ^~~~

Additionally update the samples and selftests to use the new structure,
for demonstrating best practices.

[1] For lots of details, see both:
    https://docs.kernel.org/process/deprecated.html#zero-length-and-one-element-arrays
    https://people.kernel.org/kees/bounded-flexible-arrays-in-c

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mykola Lysenko <mykolal@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Haowen Bai <baihaowen@meizu.com>
Cc: bpf@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/uapi/linux/bpf.h                   | 15 +++++++++++++--
 kernel/bpf/lpm_trie.c                      | 16 +++++++++-------
 samples/bpf/map_perf_test_user.c           |  2 +-
 samples/bpf/xdp_router_ipv4_user.c         |  2 +-
 tools/testing/selftests/bpf/test_lpm_map.c | 14 +++++++-------
 5 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ba0f0cfb5e42..f843a7582456 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -76,10 +76,21 @@ struct bpf_insn {
 	__s32	imm;		/* signed immediate constant */
 };
 
-/* Key of an a BPF_MAP_TYPE_LPM_TRIE entry */
+/* Header for a key of a BPF_MAP_TYPE_LPM_TRIE entry */
 struct bpf_lpm_trie_key {
 	__u32	prefixlen;	/* up to 32 for AF_INET, 128 for AF_INET6 */
-	__u8	data[0];	/* Arbitrary size */
+	__u8	data[0];	/* Deprecated field: use struct bpf_lpm_trie_key_u8 */
+};
+
+/* Raw (u8 byte array) key of a BPF_MAP_TYPE_LPM_TRIE entry */
+struct bpf_lpm_trie_key_u8 {
+	union {
+		struct bpf_lpm_trie_key hdr;
+		struct {
+			__u32	prefixlen;
+			__u8	data[];
+		};
+	};
 };
 
 struct bpf_cgroup_storage_key {
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index d833496e9e42..3a93ace62c87 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -164,13 +164,15 @@ static inline int extract_bit(const u8 *data, size_t index)
  */
 static size_t longest_prefix_match(const struct lpm_trie *trie,
 				   const struct lpm_trie_node *node,
-				   const struct bpf_lpm_trie_key *key)
+				   const struct bpf_lpm_trie_key_u8 *key)
 {
 	u32 limit = min(node->prefixlen, key->prefixlen);
 	u32 prefixlen = 0, i = 0;
 
-	BUILD_BUG_ON(offsetof(struct lpm_trie_node, data) % sizeof(u32));
-	BUILD_BUG_ON(offsetof(struct bpf_lpm_trie_key, data) % sizeof(u32));
+	BUILD_BUG_ON(offsetof(typeof(*node), data) % sizeof(u32));
+	BUILD_BUG_ON(offsetof(typeof(*key), data) % sizeof(u32));
+	BUILD_BUG_ON(offsetof(typeof(*key), data) !=
+		     offsetof(typeof(key->hdr), data));
 
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && defined(CONFIG_64BIT)
 
@@ -229,7 +231,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 {
 	struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
 	struct lpm_trie_node *node, *found = NULL;
-	struct bpf_lpm_trie_key *key = _key;
+	struct bpf_lpm_trie_key_u8 *key = _key;
 
 	/* Start walking the trie from the root node ... */
 
@@ -306,7 +308,7 @@ static int trie_update_elem(struct bpf_map *map,
 	struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
 	struct lpm_trie_node *node, *im_node = NULL, *new_node = NULL;
 	struct lpm_trie_node __rcu **slot;
-	struct bpf_lpm_trie_key *key = _key;
+	struct bpf_lpm_trie_key_u8 *key = _key;
 	unsigned long irq_flags;
 	unsigned int next_bit;
 	size_t matchlen = 0;
@@ -434,7 +436,7 @@ static int trie_update_elem(struct bpf_map *map,
 static int trie_delete_elem(struct bpf_map *map, void *_key)
 {
 	struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
-	struct bpf_lpm_trie_key *key = _key;
+	struct bpf_lpm_trie_key_u8 *key = _key;
 	struct lpm_trie_node __rcu **trim, **trim2;
 	struct lpm_trie_node *node, *parent;
 	unsigned long irq_flags;
@@ -616,7 +618,7 @@ static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)
 {
 	struct lpm_trie_node *node, *next_node = NULL, *parent, *search_root;
 	struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
-	struct bpf_lpm_trie_key *key = _key, *next_key = _next_key;
+	struct bpf_lpm_trie_key_u8 *key = _key, *next_key = _next_key;
 	struct lpm_trie_node **node_stack = NULL;
 	int err = 0, stack_ptr = -1;
 	unsigned int next_bit;
diff --git a/samples/bpf/map_perf_test_user.c b/samples/bpf/map_perf_test_user.c
index d2fbcf963cdf..07ff471ed6ae 100644
--- a/samples/bpf/map_perf_test_user.c
+++ b/samples/bpf/map_perf_test_user.c
@@ -370,7 +370,7 @@ static void run_perf_test(int tasks)
 
 static void fill_lpm_trie(void)
 {
-	struct bpf_lpm_trie_key *key;
+	struct bpf_lpm_trie_key_u8 *key;
 	unsigned long value = 0;
 	unsigned int i;
 	int r;
diff --git a/samples/bpf/xdp_router_ipv4_user.c b/samples/bpf/xdp_router_ipv4_user.c
index 9d41db09c480..266fdd0b025d 100644
--- a/samples/bpf/xdp_router_ipv4_user.c
+++ b/samples/bpf/xdp_router_ipv4_user.c
@@ -91,7 +91,7 @@ static int recv_msg(struct sockaddr_nl sock_addr, int sock)
 static void read_route(struct nlmsghdr *nh, int nll)
 {
 	char dsts[24], gws[24], ifs[16], dsts_len[24], metrics[24];
-	struct bpf_lpm_trie_key *prefix_key;
+	struct bpf_lpm_trie_key_u8 *prefix_key;
 	struct rtattr *rt_attr;
 	struct rtmsg *rt_msg;
 	int rtm_family;
diff --git a/tools/testing/selftests/bpf/test_lpm_map.c b/tools/testing/selftests/bpf/test_lpm_map.c
index c028d621c744..e2e822759e13 100644
--- a/tools/testing/selftests/bpf/test_lpm_map.c
+++ b/tools/testing/selftests/bpf/test_lpm_map.c
@@ -211,7 +211,7 @@ static void test_lpm_map(int keysize)
 	volatile size_t n_matches, n_matches_after_delete;
 	size_t i, j, n_nodes, n_lookups;
 	struct tlpm_node *t, *list = NULL;
-	struct bpf_lpm_trie_key *key;
+	struct bpf_lpm_trie_key_u8 *key;
 	uint8_t *data, *value;
 	int r, map;
 
@@ -331,8 +331,8 @@ static void test_lpm_map(int keysize)
 static void test_lpm_ipaddr(void)
 {
 	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);
-	struct bpf_lpm_trie_key *key_ipv4;
-	struct bpf_lpm_trie_key *key_ipv6;
+	struct bpf_lpm_trie_key_u8 *key_ipv4;
+	struct bpf_lpm_trie_key_u8 *key_ipv6;
 	size_t key_size_ipv4;
 	size_t key_size_ipv6;
 	int map_fd_ipv4;
@@ -423,7 +423,7 @@ static void test_lpm_ipaddr(void)
 static void test_lpm_delete(void)
 {
 	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);
-	struct bpf_lpm_trie_key *key;
+	struct bpf_lpm_trie_key_u8 *key;
 	size_t key_size;
 	int map_fd;
 	__u64 value;
@@ -532,7 +532,7 @@ static void test_lpm_delete(void)
 static void test_lpm_get_next_key(void)
 {
 	LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_NO_PREALLOC);
-	struct bpf_lpm_trie_key *key_p, *next_key_p;
+	struct bpf_lpm_trie_key_u8 *key_p, *next_key_p;
 	size_t key_size;
 	__u32 value = 0;
 	int map_fd;
@@ -693,7 +693,7 @@ static void *lpm_test_command(void *arg)
 {
 	int i, j, ret, iter, key_size;
 	struct lpm_mt_test_info *info = arg;
-	struct bpf_lpm_trie_key *key_p;
+	struct bpf_lpm_trie_key_u8 *key_p;
 
 	key_size = sizeof(struct bpf_lpm_trie_key) + sizeof(__u32);
 	key_p = alloca(key_size);
@@ -717,7 +717,7 @@ static void *lpm_test_command(void *arg)
 				ret = bpf_map_lookup_elem(info->map_fd, key_p, &value);
 				assert(ret == 0 || errno == ENOENT);
 			} else {
-				struct bpf_lpm_trie_key *next_key_p = alloca(key_size);
+				struct bpf_lpm_trie_key_u8 *next_key_p = alloca(key_size);
 				ret = bpf_map_get_next_key(info->map_fd, key_p, next_key_p);
 				assert(ret == 0 || errno == ENOENT || errno == ENOMEM);
 			}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] bpf: Deprecate "data" member of bpf_lpm_trie_key
  2023-02-09 19:23 [PATCH] bpf: Deprecate "data" member of bpf_lpm_trie_key Kees Cook
@ 2023-02-09 19:52 ` Andrii Nakryiko
  2023-02-09 20:05   ` Kees Cook
  0 siblings, 1 reply; 7+ messages in thread
From: Andrii Nakryiko @ 2023-02-09 19:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Stanislav Fomichev, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, Haowen Bai, bpf, linux-kselftest, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Nathan Chancellor,
	Nick Desaulniers, Tom Rix, linux-kernel, netdev, llvm,
	linux-hardening

On Thu, Feb 9, 2023 at 11:23 AM Kees Cook <keescook@chromium.org> wrote:
>
> The kernel is globally removing the ambiguous 0-length and 1-element
> arrays in favor of flexible arrays, so that we can gain both compile-time
> and run-time array bounds checking[1]. Most cases of these changes are
> trivial, but this case in BPF is not. It faces some difficulties:
>
> 1) struct bpf_lpm_trie_key is part of UAPI so changes can be fragile in
>    the sense that projects external to Linux may be impacted.
>
> 2) The struct is intended to be used as a header, which means it may
>    be within another structure, resulting in the "data" array member
>    overlapping with the surrounding structure's following members. When
>    converting from [0]-style to []-style, this overlap elicits warnings
>    under Clang, and GCC considers it a deprecated extension (and similarly
>    warns under -pedantic): https://godbolt.org/z/vWzqs41h6
>
> 3) Both the kernel and userspace access the existing "data" member
>    for more than just static initializers and offsetof() calculations.
>    For example:
>
>       cilium:
>         struct egress_gw_policy_key in_key = {
>                 .lpm_key = { 32 + 24, {} },
>                 .saddr   = CLIENT_IP,
>                 .daddr   = EXTERNAL_SVC_IP & 0Xffffff,
>         };
>
>       systemd:
>         ipv6_map_fd = bpf_map_new(
>                         BPF_MAP_TYPE_LPM_TRIE,
>                         offsetof(struct bpf_lpm_trie_key, data) + sizeof(uint32_t)*4,
>                         sizeof(uint64_t), ...);
>         ...
>         struct bpf_lpm_trie_key *key_ipv4, *key_ipv6;
>         ...
>         memcpy(key_ipv4->data, &a->address, sizeof(uint32_t));
>
>    Searching for other uses in Debian Code Search seem to be just copies
>    of UAPI headers:
>    https://codesearch.debian.net/search?q=struct+bpf_lpm_trie_key&literal=1&perpkg=1
>
> Introduce struct bpf_lpm_trie_key_u8 for the kernel (and future userspace)
> to use for walking the individual bytes following the header, and leave
> the "data" member of struct bpf_lpm_trie_key as-is (i.e. a [0]-style
> array). This will allow existing userspace code to continue to use "data"
> as a fake flexible array. The kernel (and future userspace code) building
> with -fstrict-flex-arrays=3 will see struct bpf_lpm_trie_key::data has
> having 0 bytes so there will be no overlap warnings, and can instead
> use struct bpf_lpm_trie_key_u8::data for accessing the actual byte
> array contents. The definition of struct bpf_lpm_trie_key_u8 uses a
> union with struct bpf_lpm_trie_key so that things like container_of()
> can be used instead of doing explicit casting, all while leaving the
> member names un-namespaced (i.e. key->prefixlen == key_u8->prefixlen,
> key->data == key_u8->data), allowing for trivial drop-in replacement
> without collateral member renaming.
>
> This will avoid structure overlap warnings and array bounds warnings
> while enabling run-time array bounds checking under CONFIG_UBSAN_BOUNDS=y
> and -fstrict-flex-arrays=3.
>
> For reference, the current warning under GCC 13 with -fstrict-flex-arrays=3
> and -Warray-bounds is:
>
> ../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=]
>   207 |                                        *(__be16 *)&key->data[i]);
>       |                                                   ^~~~~~~~~~~~~
> ../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16'
>   102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
>       |                                                      ^
> ../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu'
>    97 | #define be16_to_cpu __be16_to_cpu
>       |                     ^~~~~~~~~~~~~
> ../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu'
>   206 |                 u16 diff = be16_to_cpu(*(__be16 *)&node->data[i]
> ^
>       |                            ^~~~~~~~~~~
> In file included from ../include/linux/bpf.h:7:
> ../include/uapi/linux/bpf.h:82:17: note: while referencing 'data'
>    82 |         __u8    data[0];        /* Arbitrary size */
>       |                 ^~~~
>
> Additionally update the samples and selftests to use the new structure,
> for demonstrating best practices.
>
> [1] For lots of details, see both:
>     https://docs.kernel.org/process/deprecated.html#zero-length-and-one-element-arrays
>     https://people.kernel.org/kees/bounded-flexible-arrays-in-c
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Stanislav Fomichev <sdf@google.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Martin KaFai Lau <martin.lau@linux.dev>
> Cc: Song Liu <song@kernel.org>
> Cc: Yonghong Song <yhs@fb.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: KP Singh <kpsingh@kernel.org>
> Cc: Hao Luo <haoluo@google.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Mykola Lysenko <mykolal@fb.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Haowen Bai <baihaowen@meizu.com>
> Cc: bpf@vger.kernel.org
> Cc: linux-kselftest@vger.kernel.org
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  include/uapi/linux/bpf.h                   | 15 +++++++++++++--
>  kernel/bpf/lpm_trie.c                      | 16 +++++++++-------
>  samples/bpf/map_perf_test_user.c           |  2 +-
>  samples/bpf/xdp_router_ipv4_user.c         |  2 +-
>  tools/testing/selftests/bpf/test_lpm_map.c | 14 +++++++-------
>  5 files changed, 31 insertions(+), 18 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index ba0f0cfb5e42..f843a7582456 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -76,10 +76,21 @@ struct bpf_insn {
>         __s32   imm;            /* signed immediate constant */
>  };
>
> -/* Key of an a BPF_MAP_TYPE_LPM_TRIE entry */
> +/* Header for a key of a BPF_MAP_TYPE_LPM_TRIE entry */
>  struct bpf_lpm_trie_key {
>         __u32   prefixlen;      /* up to 32 for AF_INET, 128 for AF_INET6 */
> -       __u8    data[0];        /* Arbitrary size */
> +       __u8    data[0];        /* Deprecated field: use struct bpf_lpm_trie_key_u8 */
> +};
> +
> +/* Raw (u8 byte array) key of a BPF_MAP_TYPE_LPM_TRIE entry */
> +struct bpf_lpm_trie_key_u8 {
> +       union {
> +               struct bpf_lpm_trie_key hdr;
> +               struct {
> +                       __u32   prefixlen;
> +                       __u8    data[];
> +               };
> +       };
>  };

Do we need to add a new type to UAPI at all here? We can make this new
struct internal to kernel code (e.g. struct bpf_lpm_trie_key_kern) and
point out that it should match the layout of struct bpf_lpm_trie_key.
User-space can decide whether to use bpf_lpm_trie_key as-is, or if
just to ensure their custom struct has the same layout (I see some
internal users at Meta do just this, just make sure that they have
__u32 prefixlen as first member).

This whole union work-around seems like just extra cruft that we don't
really need in UAPI.

Or did I miss anything?


[...]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] bpf: Deprecate "data" member of bpf_lpm_trie_key
  2023-02-09 19:52 ` Andrii Nakryiko
@ 2023-02-09 20:05   ` Kees Cook
  2023-02-09 20:50     ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2023-02-09 20:05 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Stanislav Fomichev, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, Haowen Bai, bpf, linux-kselftest, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Nathan Chancellor,
	Nick Desaulniers, Tom Rix, linux-kernel, netdev, llvm,
	linux-hardening

On Thu, Feb 09, 2023 at 11:52:10AM -0800, Andrii Nakryiko wrote:
> Do we need to add a new type to UAPI at all here? We can make this new
> struct internal to kernel code (e.g. struct bpf_lpm_trie_key_kern) and
> point out that it should match the layout of struct bpf_lpm_trie_key.
> User-space can decide whether to use bpf_lpm_trie_key as-is, or if
> just to ensure their custom struct has the same layout (I see some
> internal users at Meta do just this, just make sure that they have
> __u32 prefixlen as first member).

The uses outside the kernel seemed numerous enough to justify a new UAPI
struct (samples, selftests, etc). It also paves a single way forward
when the userspace projects start using modern compiler options (e.g.
systemd is usually pretty quick to adopt new features).

> This whole union work-around seems like just extra cruft that we don't
> really need in UAPI.

The union is really only there so that possible uses of container_of()
would be happy. But I did add a BUILD_BUG_ON() test for member offset
equality, so a hard cast would be safe too. I'm happy to drop it if
that's preferred?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] bpf: Deprecate "data" member of bpf_lpm_trie_key
  2023-02-09 20:05   ` Kees Cook
@ 2023-02-09 20:50     ` Alexei Starovoitov
  2023-02-09 21:12       ` Kees Cook
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2023-02-09 20:50 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrii Nakryiko, Alexei Starovoitov, Stanislav Fomichev,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Hao Luo, Jiri Olsa,
	Mykola Lysenko, Shuah Khan, Haowen Bai, bpf,
	open list:KERNEL SELFTEST FRAMEWORK, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Nathan Chancellor,
	Nick Desaulniers, Tom Rix, LKML, Network Development,
	clang-built-linux, linux-hardening

On Thu, Feb 9, 2023 at 12:05 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Feb 09, 2023 at 11:52:10AM -0800, Andrii Nakryiko wrote:
> > Do we need to add a new type to UAPI at all here? We can make this new
> > struct internal to kernel code (e.g. struct bpf_lpm_trie_key_kern) and
> > point out that it should match the layout of struct bpf_lpm_trie_key.
> > User-space can decide whether to use bpf_lpm_trie_key as-is, or if
> > just to ensure their custom struct has the same layout (I see some
> > internal users at Meta do just this, just make sure that they have
> > __u32 prefixlen as first member).
>
> The uses outside the kernel seemed numerous enough to justify a new UAPI
> struct (samples, selftests, etc). It also paves a single way forward
> when the userspace projects start using modern compiler options (e.g.
> systemd is usually pretty quick to adopt new features).

I don't understand how the new uapi struct bpf_lpm_trie_key_u8 helps.
cilium progs and progs/map_ptr_kern.c
cannot do s/bpf_lpm_trie_key/bpf_lpm_trie_key_u8/.
They will fail to build, so they're stuck with bpf_lpm_trie_key.

Can we do just
struct bpf_lpm_trie_key_kern {
  __u32   prefixlen;
  __u8    data[];
};
and use it in the kernel?
What is the disadvantage?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] bpf: Deprecate "data" member of bpf_lpm_trie_key
  2023-02-09 20:50     ` Alexei Starovoitov
@ 2023-02-09 21:12       ` Kees Cook
  2023-02-09 22:01         ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Kees Cook @ 2023-02-09 21:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Alexei Starovoitov, Stanislav Fomichev,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Hao Luo, Jiri Olsa,
	Mykola Lysenko, Shuah Khan, Haowen Bai, bpf,
	open list:KERNEL SELFTEST FRAMEWORK, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Nathan Chancellor,
	Nick Desaulniers, Tom Rix, LKML, Network Development,
	clang-built-linux, linux-hardening

On Thu, Feb 09, 2023 at 12:50:28PM -0800, Alexei Starovoitov wrote:
> On Thu, Feb 9, 2023 at 12:05 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Thu, Feb 09, 2023 at 11:52:10AM -0800, Andrii Nakryiko wrote:
> > > Do we need to add a new type to UAPI at all here? We can make this new
> > > struct internal to kernel code (e.g. struct bpf_lpm_trie_key_kern) and
> > > point out that it should match the layout of struct bpf_lpm_trie_key.
> > > User-space can decide whether to use bpf_lpm_trie_key as-is, or if
> > > just to ensure their custom struct has the same layout (I see some
> > > internal users at Meta do just this, just make sure that they have
> > > __u32 prefixlen as first member).
> >
> > The uses outside the kernel seemed numerous enough to justify a new UAPI
> > struct (samples, selftests, etc). It also paves a single way forward
> > when the userspace projects start using modern compiler options (e.g.
> > systemd is usually pretty quick to adopt new features).
> 
> I don't understand how the new uapi struct bpf_lpm_trie_key_u8 helps.
> cilium progs and progs/map_ptr_kern.c
> cannot do s/bpf_lpm_trie_key/bpf_lpm_trie_key_u8/.
> They will fail to build, so they're stuck with bpf_lpm_trie_key.

Right -- I'm proposing not changing bpf_lpm_trie_key. I'm proposing
_adding_ bpf_lpm_trie_key_u8 for new users who will be using modern
compiler options (i.e. where "data[0]" is nonsense).

> Can we do just
> struct bpf_lpm_trie_key_kern {
>   __u32   prefixlen;
>   __u8    data[];
> };
> and use it in the kernel?

Yeah, I can do that if that's preferred, but it leaves userspace hanging
when they eventually trip over this in their code when they enable
-fstrict-flex-arrays=3 too.

> What is the disadvantage?

It seemed better to give a working example of how to migrate this code.

Regardless, I can just make this specific to the kernel code if that's
what's wanted.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] bpf: Deprecate "data" member of bpf_lpm_trie_key
  2023-02-09 21:12       ` Kees Cook
@ 2023-02-09 22:01         ` Alexei Starovoitov
  2023-02-11 17:55           ` Kees Cook
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2023-02-09 22:01 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrii Nakryiko, Alexei Starovoitov, Stanislav Fomichev,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Hao Luo, Jiri Olsa,
	Mykola Lysenko, Shuah Khan, Haowen Bai, bpf,
	open list:KERNEL SELFTEST FRAMEWORK, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Nathan Chancellor,
	Nick Desaulniers, Tom Rix, LKML, Network Development,
	clang-built-linux, linux-hardening

On Thu, Feb 9, 2023 at 1:12 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Feb 09, 2023 at 12:50:28PM -0800, Alexei Starovoitov wrote:
> > On Thu, Feb 9, 2023 at 12:05 PM Kees Cook <keescook@chromium.org> wrote:
> > >
> > > On Thu, Feb 09, 2023 at 11:52:10AM -0800, Andrii Nakryiko wrote:
> > > > Do we need to add a new type to UAPI at all here? We can make this new
> > > > struct internal to kernel code (e.g. struct bpf_lpm_trie_key_kern) and
> > > > point out that it should match the layout of struct bpf_lpm_trie_key.
> > > > User-space can decide whether to use bpf_lpm_trie_key as-is, or if
> > > > just to ensure their custom struct has the same layout (I see some
> > > > internal users at Meta do just this, just make sure that they have
> > > > __u32 prefixlen as first member).
> > >
> > > The uses outside the kernel seemed numerous enough to justify a new UAPI
> > > struct (samples, selftests, etc). It also paves a single way forward
> > > when the userspace projects start using modern compiler options (e.g.
> > > systemd is usually pretty quick to adopt new features).
> >
> > I don't understand how the new uapi struct bpf_lpm_trie_key_u8 helps.
> > cilium progs and progs/map_ptr_kern.c
> > cannot do s/bpf_lpm_trie_key/bpf_lpm_trie_key_u8/.
> > They will fail to build, so they're stuck with bpf_lpm_trie_key.
>
> Right -- I'm proposing not changing bpf_lpm_trie_key. I'm proposing
> _adding_ bpf_lpm_trie_key_u8 for new users who will be using modern
> compiler options (i.e. where "data[0]" is nonsense).
>
> > Can we do just
> > struct bpf_lpm_trie_key_kern {
> >   __u32   prefixlen;
> >   __u8    data[];
> > };
> > and use it in the kernel?
>
> Yeah, I can do that if that's preferred, but it leaves userspace hanging
> when they eventually trip over this in their code when they enable
> -fstrict-flex-arrays=3 too.
>
> > What is the disadvantage?
>
> It seemed better to give a working example of how to migrate this code.

I understand and agree with intent, but I'm still missing
how you're going to achieve this migration.
bpf_lpm_trie_key_u8 doesn't provide a migration path to cilium progs
and pretty much all bpf progs that use LPM map.
Sure, one can change the user space part, like you did in test_lpm_map.c,
but it doesn't address the full scope.
imo half way is worse than not doing it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] bpf: Deprecate "data" member of bpf_lpm_trie_key
  2023-02-09 22:01         ` Alexei Starovoitov
@ 2023-02-11 17:55           ` Kees Cook
  0 siblings, 0 replies; 7+ messages in thread
From: Kees Cook @ 2023-02-11 17:55 UTC (permalink / raw)
  To: Alexei Starovoitov, Kees Cook
  Cc: Andrii Nakryiko, Alexei Starovoitov, Stanislav Fomichev,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Hao Luo, Jiri Olsa,
	Mykola Lysenko, Shuah Khan, Haowen Bai, bpf,
	open list:KERNEL SELFTEST FRAMEWORK, David S. Miller,
	Jakub Kicinski, Jesper Dangaard Brouer, Nathan Chancellor,
	Nick Desaulniers, Tom Rix, LKML, Network Development,
	clang-built-linux, linux-hardening

On February 9, 2023 2:01:15 PM PST, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>On Thu, Feb 9, 2023 at 1:12 PM Kees Cook <keescook@chromium.org> wrote:
>>
>> On Thu, Feb 09, 2023 at 12:50:28PM -0800, Alexei Starovoitov wrote:
>> > On Thu, Feb 9, 2023 at 12:05 PM Kees Cook <keescook@chromium.org> wrote:
>> > >
>> > > On Thu, Feb 09, 2023 at 11:52:10AM -0800, Andrii Nakryiko wrote:
>> > > > Do we need to add a new type to UAPI at all here? We can make this new
>> > > > struct internal to kernel code (e.g. struct bpf_lpm_trie_key_kern) and
>> > > > point out that it should match the layout of struct bpf_lpm_trie_key.
>> > > > User-space can decide whether to use bpf_lpm_trie_key as-is, or if
>> > > > just to ensure their custom struct has the same layout (I see some
>> > > > internal users at Meta do just this, just make sure that they have
>> > > > __u32 prefixlen as first member).
>> > >
>> > > The uses outside the kernel seemed numerous enough to justify a new UAPI
>> > > struct (samples, selftests, etc). It also paves a single way forward
>> > > when the userspace projects start using modern compiler options (e.g.
>> > > systemd is usually pretty quick to adopt new features).
>> >
>> > I don't understand how the new uapi struct bpf_lpm_trie_key_u8 helps.
>> > cilium progs and progs/map_ptr_kern.c
>> > cannot do s/bpf_lpm_trie_key/bpf_lpm_trie_key_u8/.
>> > They will fail to build, so they're stuck with bpf_lpm_trie_key.
>>
>> Right -- I'm proposing not changing bpf_lpm_trie_key. I'm proposing
>> _adding_ bpf_lpm_trie_key_u8 for new users who will be using modern
>> compiler options (i.e. where "data[0]" is nonsense).
>>
>> > Can we do just
>> > struct bpf_lpm_trie_key_kern {
>> >   __u32   prefixlen;
>> >   __u8    data[];
>> > };
>> > and use it in the kernel?
>>
>> Yeah, I can do that if that's preferred, but it leaves userspace hanging
>> when they eventually trip over this in their code when they enable
>> -fstrict-flex-arrays=3 too.
>>
>> > What is the disadvantage?
>>
>> It seemed better to give a working example of how to migrate this code.
>
>I understand and agree with intent, but I'm still missing
>how you're going to achieve this migration.
>bpf_lpm_trie_key_u8 doesn't provide a migration path to cilium progs
>and pretty much all bpf progs that use LPM map.
>Sure, one can change the user space part, like you did in test_lpm_map.c,
>but it doesn't address the full scope.
>imo half way is worse than not doing it.

Maybe I'm missing something, but if a program isn't building with -fstrict-flex-arrays=3, it can keep on using struct bpf_lpm_trie_key as before. If/when it starts using -fsfa, if can use struct bpf_lpm_trie_key in composite structs as a header just like before, but if it has places using the "data" member as an array of u8, it can switch to something using struct bpf_lpm_trie_key_u8, either directly or as a union with whatever ever struct they have. (And this replacement is what I did for all the samples/selftests.)



-- 
Kees Cook

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-02-11 17:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-09 19:23 [PATCH] bpf: Deprecate "data" member of bpf_lpm_trie_key Kees Cook
2023-02-09 19:52 ` Andrii Nakryiko
2023-02-09 20:05   ` Kees Cook
2023-02-09 20:50     ` Alexei Starovoitov
2023-02-09 21:12       ` Kees Cook
2023-02-09 22:01         ` Alexei Starovoitov
2023-02-11 17:55           ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).