* [PATCH bpf-next 01/10] bpf: Export bpf_dynptr_{get|set}_size
2022-09-17 15:31 [PATCH bpf-next 00/10] Add support for qp-trie map Hou Tao
@ 2022-09-17 15:31 ` Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 02/10] bpf: Add helper btf_type_is_bpf_dynptr() Hou Tao
` (8 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
To: bpf, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
From: Hou Tao <houtao1@huawei.com>
For maps with a bpf_dynptr-typed key, the lookup and update procedures will use
bpf_dynptr_get_size() to get the length of the key, and the iteration procedure
will use bpf_dynptr_set_size() to set the length of the returned key.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf.h | 2 ++
kernel/bpf/helpers.c | 9 ++++++++-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e0dbe0c0a17e..8da4a8190413 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2647,6 +2647,8 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
enum bpf_dynptr_type type, u32 offset, u32 size);
void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
int bpf_dynptr_check_size(u32 size);
+u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr);
+void bpf_dynptr_set_size(struct bpf_dynptr_kern *ptr, u32 new_size);
#ifdef CONFIG_BPF_LSM
void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 41aeaf3862ec..7dbda4acb2c2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1408,11 +1408,18 @@ static void bpf_dynptr_set_type(struct bpf_dynptr_kern *ptr, enum bpf_dynptr_typ
ptr->size |= type << DYNPTR_TYPE_SHIFT;
}
-static u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr)
+u32 bpf_dynptr_get_size(const struct bpf_dynptr_kern *ptr)
{
return ptr->size & DYNPTR_SIZE_MASK;
}
+void bpf_dynptr_set_size(struct bpf_dynptr_kern *ptr, u32 new_size)
+{
+ u32 metadata = ptr->size & ~DYNPTR_SIZE_MASK;
+
+ ptr->size = new_size | metadata;
+}
+
int bpf_dynptr_check_size(u32 size)
{
return size > DYNPTR_MAX_SIZE ? -E2BIG : 0;
--
2.29.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH bpf-next 02/10] bpf: Add helper btf_type_is_bpf_dynptr()
2022-09-17 15:31 [PATCH bpf-next 00/10] Add support for qp-trie map Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 01/10] bpf: Export bpf_dynptr_{get|set}_size Hou Tao
@ 2022-09-17 15:31 ` Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 03/10] bpf: Support bpf_dynptr-typed map key for bpf syscall Hou Tao
` (7 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
To: bpf, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
From: Hou Tao <houtao1@huawei.com>
Add helper btf_type_is_bpf_dynptr() to check whether the passed btf type is
bpf_dynptr. It will be used by bpf maps (e.g. qp-trie) which use bpf_dynptr
as the map key.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/btf.h | 1 +
kernel/bpf/btf.c | 7 +++++++
2 files changed, 8 insertions(+)
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 1fcc833a8690..d52f9979ea49 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -156,6 +156,7 @@ int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
int btf_find_timer(const struct btf *btf, const struct btf_type *t);
struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
const struct btf_type *t);
+bool btf_type_is_bpf_dynptr(const struct btf *btf, const struct btf_type *t);
bool btf_type_is_void(const struct btf_type *t);
s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index b3940c605aac..f94b6112a649 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3522,6 +3522,13 @@ struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf,
return ERR_PTR(ret);
}
+bool btf_type_is_bpf_dynptr(const struct btf *btf, const struct btf_type *t)
+{
+ return __btf_type_is_struct(t) &&
+ t->size == sizeof(struct bpf_dynptr) &&
+ !strcmp("bpf_dynptr", __btf_name_by_offset(btf, t->name_off));
+}
+
static void __btf_struct_show(const struct btf *btf, const struct btf_type *t,
u32 type_id, void *data, u8 bits_offset,
struct btf_show *show)
--
2.29.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH bpf-next 03/10] bpf: Support bpf_dynptr-typed map key for bpf syscall
2022-09-17 15:31 [PATCH bpf-next 00/10] Add support for qp-trie map Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 01/10] bpf: Export bpf_dynptr_{get|set}_size Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 02/10] bpf: Add helper btf_type_is_bpf_dynptr() Hou Tao
@ 2022-09-17 15:31 ` Hou Tao
2022-09-17 18:18 ` kernel test robot
2022-09-17 18:29 ` kernel test robot
2022-09-17 15:31 ` [PATCH bpf-next 04/10] bpf: Support bpf_dynptr-typed map key for bpf program Hou Tao
` (6 subsequent siblings)
9 siblings, 2 replies; 13+ messages in thread
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
To: bpf, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
From: Hou Tao <houtao1@huawei.com>
A userspace application uses the bpf syscall to look up or update a bpf map,
passing a pointer to a fixed-size buffer to the kernel to represent the map
key. To support maps with variable-length keys, introduce bpf_dynptr_user,
which lets userspace pass a pointer to a bpf_dynptr_user specifying the
address and the length of the key buffer. In order to represent a dynptr
coming from userspace, add a new dynptr type: BPF_DYNPTR_TYPE_USER. Because
a BPF_DYNPTR_TYPE_USER-typed dynptr is not available to bpf programs, the
verifier doesn't need any update.
To distinguish maps with a fixed-size key from maps with a variable-length
one, add a new map flag: BPF_F_DYNPTR_KEY. For a map which enables
BPF_F_DYNPTR_KEY, the key btf type must be bpf_dynptr and the lower 32 bits
of map_extra are used to specify the maximum size of the dynptr-typed key.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf.h | 2 +
include/uapi/linux/bpf.h | 9 +++
kernel/bpf/syscall.c | 108 ++++++++++++++++++++++++++++-----
tools/include/uapi/linux/bpf.h | 8 +++
4 files changed, 113 insertions(+), 14 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8da4a8190413..5060d7aee08c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2641,6 +2641,8 @@ enum bpf_dynptr_type {
BPF_DYNPTR_TYPE_LOCAL,
/* Underlying data is a ringbuf record */
BPF_DYNPTR_TYPE_RINGBUF,
+ /* Points to memory copied from/to userspace */
+ BPF_DYNPTR_TYPE_USER,
};
void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3df78c56c1bf..77a2828f8148 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1246,6 +1246,9 @@ enum {
/* Create a map that is suitable to be an inner map with dynamic max entries */
BPF_F_INNER_MAP = (1U << 12),
+
+/* Map with bpf_dynptr-typed key */
+ BPF_F_DYNPTR_KEY = (1U << 13),
};
/* Flags for BPF_PROG_QUERY. */
@@ -6775,6 +6778,12 @@ struct bpf_timer {
__u64 :64;
} __attribute__((aligned(8)));
+struct bpf_dynptr_user {
+ __u64 data;
+ __u32 size;
+ __u32 :32;
+} __attribute__((aligned(8)));
+
struct bpf_dynptr {
__u64 :64;
__u64 :64;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index dab156f09f8d..fd15c13cef24 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -151,6 +151,11 @@ bool bpf_map_write_active(const struct bpf_map *map)
return atomic64_read(&map->writecnt) != 0;
}
+static inline bool is_dynptr_key(const struct bpf_map *map)
+{
+ return map->map_flags & BPF_F_DYNPTR_KEY;
+}
+
static u32 bpf_map_value_size(const struct bpf_map *map)
{
if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
@@ -994,7 +999,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
/* Some maps allow key to be unspecified. */
if (btf_key_id) {
key_type = btf_type_id_size(btf, &btf_key_id, &key_size);
- if (!key_type || key_size != map->key_size)
+ if (!key_type || key_size != map->key_size ||
+ (is_dynptr_key(map) && !btf_type_is_bpf_dynptr(btf, key_type)))
return -EINVAL;
} else {
key_type = btf_type_by_id(btf, 0);
@@ -1089,9 +1095,16 @@ static int map_create(union bpf_attr *attr)
return -EINVAL;
}
- if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER &&
- attr->map_extra != 0)
+ if (attr->map_flags & BPF_F_DYNPTR_KEY) {
+ /* The lower 32-bits of map_extra specifies the maximum size
+ * of bpf_dynptr-typed key
+ */
+ if (!attr->btf_key_type_id || !attr->map_extra || (attr->map_extra >> 32) ||
+ bpf_dynptr_check_size(attr->map_extra))
+ return -EINVAL;
+ } else if (attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER && attr->map_extra != 0) {
return -EINVAL;
+ }
f_flags = bpf_get_file_flag(attr->map_flags);
if (f_flags < 0)
@@ -1280,8 +1293,39 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
return -ENOTSUPP;
}
-static void *__bpf_copy_key(void __user *ukey, u64 key_size)
+static void *bpf_copy_from_dynptr_ukey(bpfptr_t ukey)
{
+ struct bpf_dynptr_kern *kptr;
+ struct bpf_dynptr_user uptr;
+ bpfptr_t data;
+
+ if (copy_from_bpfptr(&uptr, ukey, sizeof(uptr)))
+ return ERR_PTR(-EFAULT);
+
+ if (!uptr.size || bpf_dynptr_check_size(uptr.size))
+ return ERR_PTR(-EINVAL);
+
+ /* Allocate and free bpf_dynptr_kern and its data together */
+ kptr = kvmalloc(sizeof(*kptr) + uptr.size, GFP_USER | __GFP_NOWARN);
+ if (!kptr)
+ return ERR_PTR(-ENOMEM);
+
+ data = make_bpfptr(uptr.data, bpfptr_is_kernel(ukey));
+ if (copy_from_bpfptr(&kptr[1], data, uptr.size)) {
+ kvfree(kptr);
+ return ERR_PTR(-EFAULT);
+ }
+
+ bpf_dynptr_init(kptr, &kptr[1], BPF_DYNPTR_TYPE_USER, 0, uptr.size);
+
+ return kptr;
+}
+
+static void *__bpf_copy_key(void __user *ukey, u64 key_size, bool dynptr)
+{
+ if (dynptr)
+ return bpf_copy_from_dynptr_ukey(USER_BPFPTR(ukey));
+
if (key_size)
return vmemdup_user(ukey, key_size);
@@ -1291,8 +1335,11 @@ static void *__bpf_copy_key(void __user *ukey, u64 key_size)
return NULL;
}
-static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size)
+static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size, bool dynptr)
{
+ if (dynptr)
+ return bpf_copy_from_dynptr_ukey(ukey);
+
if (key_size)
return kvmemdup_bpfptr(ukey, key_size);
@@ -1302,6 +1349,34 @@ static void *___bpf_copy_key(bpfptr_t ukey, u64 key_size)
return NULL;
}
+static void *bpf_new_dynptr_key(u32 key_size)
+{
+ struct bpf_dynptr_kern *kptr;
+
+ kptr = kvmalloc(sizeof(*kptr) + key_size, GFP_USER | __GFP_NOWARN);
+ if (kptr)
+ bpf_dynptr_init(kptr, &kptr[1], BPF_DYNPTR_TYPE_USER, 0, key_size);
+ return kptr;
+}
+
+static int bpf_copy_to_dynptr_ukey(struct bpf_dynptr_user __user *uptr,
+ struct bpf_dynptr_kern *kptr)
+{
+ unsigned int size;
+ u64 udata;
+
+ if (get_user(udata, &uptr->data))
+ return -EFAULT;
+
+ /* Also zeroing the reserved field in uptr */
+ size = bpf_dynptr_get_size(kptr);
+ if (copy_to_user(u64_to_user_ptr(udata), kptr->data + kptr->offset, size) ||
+ put_user(size, &uptr->size) || put_user(0, &uptr->size + 1))
+ return -EFAULT;
+
+ return 0;
+}
+
/* last field in 'union bpf_attr' used by this command */
#define BPF_MAP_LOOKUP_ELEM_LAST_FIELD flags
@@ -1337,7 +1412,7 @@ static int map_lookup_elem(union bpf_attr *attr)
goto err_put;
}
- key = __bpf_copy_key(ukey, map->key_size);
+ key = __bpf_copy_key(ukey, map->key_size, is_dynptr_key(map));
if (IS_ERR(key)) {
err = PTR_ERR(key);
goto err_put;
@@ -1377,7 +1452,6 @@ static int map_lookup_elem(union bpf_attr *attr)
return err;
}
-
#define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags
static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
@@ -1410,7 +1484,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
goto err_put;
}
- key = ___bpf_copy_key(ukey, map->key_size);
+ key = ___bpf_copy_key(ukey, map->key_size, is_dynptr_key(map));
if (IS_ERR(key)) {
err = PTR_ERR(key);
goto err_put;
@@ -1458,7 +1532,7 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
goto err_put;
}
- key = ___bpf_copy_key(ukey, map->key_size);
+ key = ___bpf_copy_key(ukey, map->key_size, is_dynptr_key(map));
if (IS_ERR(key)) {
err = PTR_ERR(key);
goto err_put;
@@ -1514,7 +1588,7 @@ static int map_get_next_key(union bpf_attr *attr)
}
if (ukey) {
- key = __bpf_copy_key(ukey, map->key_size);
+ key = __bpf_copy_key(ukey, map->key_size, is_dynptr_key(map));
if (IS_ERR(key)) {
err = PTR_ERR(key);
goto err_put;
@@ -1524,7 +1598,10 @@ static int map_get_next_key(union bpf_attr *attr)
}
err = -ENOMEM;
- next_key = kvmalloc(map->key_size, GFP_USER);
+ if (is_dynptr_key(map))
+ next_key = bpf_new_dynptr_key(map->map_extra);
+ else
+ next_key = kvmalloc(map->key_size, GFP_USER | __GFP_NOWARN);
if (!next_key)
goto free_key;
@@ -1540,8 +1617,11 @@ static int map_get_next_key(union bpf_attr *attr)
if (err)
goto free_next_key;
- err = -EFAULT;
- if (copy_to_user(unext_key, next_key, map->key_size) != 0)
+ if (is_dynptr_key(map))
+ err = bpf_copy_to_dynptr_ukey(unext_key, next_key);
+ else
+ err = copy_to_user(unext_key, next_key, map->key_size) != 0 ? -EFAULT : 0;
+ if (err)
goto free_next_key;
err = 0;
@@ -1815,7 +1895,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
goto err_put;
}
- key = __bpf_copy_key(ukey, map->key_size);
+ key = __bpf_copy_key(ukey, map->key_size, is_dynptr_key(map));
if (IS_ERR(key)) {
err = PTR_ERR(key);
goto err_put;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 3df78c56c1bf..600c3fcee37a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1246,6 +1246,8 @@ enum {
/* Create a map that is suitable to be an inner map with dynamic max entries */
BPF_F_INNER_MAP = (1U << 12),
+/* Map with bpf_dynptr-typed key */
+ BPF_F_DYNPTR_KEY = (1U << 13),
};
/* Flags for BPF_PROG_QUERY. */
@@ -6775,6 +6777,12 @@ struct bpf_timer {
__u64 :64;
} __attribute__((aligned(8)));
+struct bpf_dynptr_user {
+ __u64 data;
+ __u32 size;
+ __u32 :32;
+} __attribute__((aligned(8)));
+
struct bpf_dynptr {
__u64 :64;
__u64 :64;
--
2.29.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next 03/10] bpf: Support bpf_dynptr-typed map key for bpf syscall
2022-09-17 15:31 ` [PATCH bpf-next 03/10] bpf: Support bpf_dynptr-typed map key for bpf syscall Hou Tao
@ 2022-09-17 18:18 ` kernel test robot
2022-09-17 18:29 ` kernel test robot
1 sibling, 0 replies; 13+ messages in thread
From: kernel test robot @ 2022-09-17 18:18 UTC (permalink / raw)
To: Hou Tao, bpf, Andrii Nakryiko
Cc: llvm, kbuild-all, Song Liu, Hao Luo, Yonghong Song,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, KP Singh,
David S . Miller, Jakub Kicinski, Stanislav Fomichev, Jiri Olsa,
John Fastabend, Lorenz Bauer, Paul E . McKenney, houtao1
Hi Hou,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Hou-Tao/Add-support-for-qp-trie-map/20220917-231450
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arm-randconfig-r016-20220917 (https://download.01.org/0day-ci/archive/20220918/202209180234.jN58ZaBL-lkp@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 791a7ae1ba3efd6bca96338e10ffde557ba83920)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/intel-lab-lkp/linux/commit/1995ae6874ef9089b4eee12c00ba9c5af264b8bc
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Hou-Tao/Add-support-for-qp-trie-map/20220917-231450
git checkout 1995ae6874ef9089b4eee12c00ba9c5af264b8bc
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> ld.lld: error: undefined symbol: __get_user_bad
>>> referenced by syscall.c:1368 (kernel/bpf/syscall.c:1368)
>>> bpf/syscall.o:(bpf_copy_to_dynptr_ukey) in archive kernel/built-in.a
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH bpf-next 03/10] bpf: Support bpf_dynptr-typed map key for bpf syscall
2022-09-17 15:31 ` [PATCH bpf-next 03/10] bpf: Support bpf_dynptr-typed map key for bpf syscall Hou Tao
2022-09-17 18:18 ` kernel test robot
@ 2022-09-17 18:29 ` kernel test robot
1 sibling, 0 replies; 13+ messages in thread
From: kernel test robot @ 2022-09-17 18:29 UTC (permalink / raw)
To: Hou Tao, bpf, Andrii Nakryiko
Cc: kbuild-all, Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
Hi Hou,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Hou-Tao/Add-support-for-qp-trie-map/20220917-231450
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arm-randconfig-r002-20220917 (https://download.01.org/0day-ci/archive/20220918/202209180207.PvS9ya46-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/1995ae6874ef9089b4eee12c00ba9c5af264b8bc
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Hou-Tao/Add-support-for-qp-trie-map/20220917-231450
git checkout 1995ae6874ef9089b4eee12c00ba9c5af264b8bc
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
arm-linux-gnueabi-ld: kernel/bpf/syscall.o: in function `bpf_copy_to_dynptr_ukey':
>> kernel/bpf/syscall.c:1368: undefined reference to `__get_user_bad'
pahole: .tmp_vmlinux.btf: No such file or directory
.btf.vmlinux.bin.o: file not recognized: file format not recognized
vim +1368 kernel/bpf/syscall.c
1361
1362 static int bpf_copy_to_dynptr_ukey(struct bpf_dynptr_user __user *uptr,
1363 struct bpf_dynptr_kern *kptr)
1364 {
1365 unsigned int size;
1366 u64 udata;
1367
> 1368 if (get_user(udata, &uptr->data))
1369 return -EFAULT;
1370
1371 /* Also zeroing the reserved field in uptr */
1372 size = bpf_dynptr_get_size(kptr);
1373 if (copy_to_user(u64_to_user_ptr(udata), kptr->data + kptr->offset, size) ||
1374 put_user(size, &uptr->size) || put_user(0, &uptr->size + 1))
1375 return -EFAULT;
1376
1377 return 0;
1378 }
1379
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH bpf-next 04/10] bpf: Support bpf_dynptr-typed map key for bpf program
2022-09-17 15:31 [PATCH bpf-next 00/10] Add support for qp-trie map Hou Tao
` (2 preceding siblings ...)
2022-09-17 15:31 ` [PATCH bpf-next 03/10] bpf: Support bpf_dynptr-typed map key for bpf syscall Hou Tao
@ 2022-09-17 15:31 ` Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 05/10] libbpf: Add helpers for bpf_dynptr_user Hou Tao
` (5 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
To: bpf, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
From: Hou Tao <houtao1@huawei.com>
When a bpf program manipulates a BPF_F_DYNPTR_KEY-enabled map, only allow
it to use a bpf_dynptr as the map key.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/verifier.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8c6fbcd0afaf..169c0b3e8002 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6000,9 +6000,21 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
verbose(env, "invalid map_ptr to access map->key\n");
return -EACCES;
}
- err = check_helper_mem_access(env, regno,
- meta->map_ptr->key_size, false,
- NULL);
+ /* For BPF_F_DYNPTR_KEY-enabled map, only allow bpf_dynptr
+ * to be used as map key
+ */
+ if (meta->map_ptr->map_flags & BPF_F_DYNPTR_KEY) {
+ if (base_type(reg->type) != PTR_TO_STACK ||
+ !is_dynptr_reg_valid_init(env, reg, ARG_PTR_TO_DYNPTR)) {
+ verbose(env, "expect R%d to be dynptr instead of %s\n",
+ regno, reg_type_str(env, reg->type));
+ return -EACCES;
+ }
+ } else {
+ err = check_helper_mem_access(env, regno,
+ meta->map_ptr->key_size, false,
+ NULL);
+ }
break;
case ARG_PTR_TO_MAP_VALUE:
if (type_may_be_null(arg_type) && register_is_null(reg))
--
2.29.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH bpf-next 05/10] libbpf: Add helpers for bpf_dynptr_user
2022-09-17 15:31 [PATCH bpf-next 00/10] Add support for qp-trie map Hou Tao
` (3 preceding siblings ...)
2022-09-17 15:31 ` [PATCH bpf-next 04/10] bpf: Support bpf_dynptr-typed map key for bpf program Hou Tao
@ 2022-09-17 15:31 ` Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 06/10] bpf: Add support for qp-trie map Hou Tao
` (4 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
To: bpf, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
From: Hou Tao <houtao1@huawei.com>
Add bpf_dynptr_user_init() to initialize a bpf_dynptr_user, and
bpf_dynptr_user_get_{data,size}() to get the buffer address
and buffer length of a bpf_dynptr_user.
Instead of exporting these symbols, simply add these helpers as
inline functions.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
tools/lib/bpf/bpf.h | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 9c50beabdd14..cd3e9dbaffa1 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -371,6 +371,25 @@ LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
LIBBPF_API int bpf_link_get_fd_by_id(__u32 id);
LIBBPF_API int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len);
+/* sys_bpf() will check the validity of size */
+static inline void bpf_dynptr_user_init(void *data, __u32 size, struct bpf_dynptr_user *dynptr)
+{
+ /* Zero padding bytes */
+ memset(dynptr, 0, sizeof(*dynptr));
+ dynptr->data = (__u64)(unsigned long)data;
+ dynptr->size = size;
+}
+
+static inline __u32 bpf_dynptr_user_get_size(const struct bpf_dynptr_user *dynptr)
+{
+ return dynptr->size;
+}
+
+static inline void *bpf_dynptr_user_get_data(const struct bpf_dynptr_user *dynptr)
+{
+ return (void *)(unsigned long)dynptr->data;
+}
+
struct bpf_prog_query_opts {
size_t sz; /* size of this struct for forward/backward compatibility */
__u32 query_flags;
--
2.29.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH bpf-next 06/10] bpf: Add support for qp-trie map
2022-09-17 15:31 [PATCH bpf-next 00/10] Add support for qp-trie map Hou Tao
` (4 preceding siblings ...)
2022-09-17 15:31 ` [PATCH bpf-next 05/10] libbpf: Add helpers for bpf_dynptr_user Hou Tao
@ 2022-09-17 15:31 ` Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 07/10] selftests/bpf: Add two new dynptr_fail cases for map key Hou Tao
` (3 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
To: bpf, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
From: Hou Tao <houtao1@huawei.com>
The initial motivation for the qp-trie map is to reduce memory usage for
string keys, especially those with large differences in length. Moreover,
as a big-endian, lexicographically ordered map, qp-trie can also be used
for any binary data with fixed or variable length.
The memory efficiency of qp-trie comes partly from its design, which
doesn't save the key in branch nodes and uses a sparse array to save leaf
nodes, and partly from the support for bpf_dynptr-typed keys: only the
used part of the key is saved.
But the memory efficiency and ordered keys come at a cost: for strings
(e.g. symbol names in /proc/kallsyms) the lookup performance of qp-trie
is about 30% or more slower compared with a hash table. However, the lookup
performance is not always worse than a hash table: for a randomly generated
binary data set with big differences in length, the lookup performance
of qp-trie is twice as good as a hash table, as shown in the
following benchmark.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf_types.h | 1 +
include/uapi/linux/bpf.h | 1 +
kernel/bpf/Makefile | 1 +
kernel/bpf/bpf_qp_trie.c | 1055 ++++++++++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 1 +
5 files changed, 1059 insertions(+)
create mode 100644 kernel/bpf/bpf_qp_trie.c
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 2b9112b80171..a73d47f83d74 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_QP_TRIE, qp_trie_map_ops)
BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 77a2828f8148..e35e70b5cf0d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -928,6 +928,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_INODE_STORAGE,
BPF_MAP_TYPE_TASK_STORAGE,
BPF_MAP_TYPE_BLOOM_FILTER,
+ BPF_MAP_TYPE_QP_TRIE,
};
/* Note that tracing related programs such as
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 341c94f208f4..8419f44fea50 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_i
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_qp_trie.o
obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o
obj-$(CONFIG_BPF_JIT) += trampoline.o
diff --git a/kernel/bpf/bpf_qp_trie.c b/kernel/bpf/bpf_qp_trie.c
new file mode 100644
index 000000000000..4f56f69360ce
--- /dev/null
+++ b/kernel/bpf/bpf_qp_trie.c
@@ -0,0 +1,1055 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Derived from qp.c in https://github.com/fanf2/qp.git
+ *
+ * Copyright (C) 2022. Huawei Technologies Co., Ltd
+ */
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/rcupdate.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/btf_ids.h>
+
+/* qp-trie (quadbit popcount trie) is a memory efficient trie. Unlike
+ * normal trie which uses byte as lookup key, qp-trie interprets its keys
+ * as quadbit/nibble array and uses one nibble each time during lookup.
+ * The most significant nibble (upper nibble) of byte N in the key will
+ * be the 2*N element of nibble array, and the least significant nibble
+ * (lower nibble) of byte N will be the 2*N+1 element in nibble array.
+ *
+ * For normal trie, it may have 256 child nodes, and for qp-trie one branch
+ * node may have 17 child nodes. #0 child node is special because it must
+ * be a leaf node and its key is the same as the branch node. #1~#16 child
+ * nodes represent leaf nodes or branch nodes which have different keys
+ * with parent node. The key of branch node is the common prefix for these
+ * child nodes, and the index of child node minus one is the value of first
+ * different nibble between these child nodes.
+ *
+ * qp-trie reduces memory usage through two methods:
+ * (1) Branch node doesn't store the key. It only stores the position of
+ * the first nibble which differentiates child nodes.
+ * (2) Branch node doesn't store all 17 child nodes. It uses a bitmap and
+ * popcount() to implement a sparse array and only allocates memory
+ * for those present children.
+ *
+ * Like normal trie, qp-trie is also ordered and is in big-endian
+ * lexicographic order. If traverse qp-trie in a depth-first way, it will
+ * return a string of ordered keys.
+ *
+ * The following diagrams show the construction of a tiny qp-trie:
+ *
+ * (1) insert abc
+ *
+ * [ leaf node: abc ]
+ *
+ * (2) insert abc_d
+ *
+ * The first different nibble between "abc" and "abc_d" is the upper nibble
+ * of character '_' (0x5), and its position in nibble array is 6
+ * (starts from 0).
+ *
+ * [ branch node ] bitmap: 0x41 diff pos: 6
+ * |
+ * *
+ * children
+ * [0] [6]
+ * | |
+ * [leaf: abc] [leaf: abc_d]
+ *
+ * (3) insert abc_e
+ *
+ * The first different nibble between "abc_d" and "abc_e" is the lower
+ * nibble of character 'd'/'e', and its position in array is 9.
+ *
+ * [ branch node ] bitmap: 0x41 diff pos: 6
+ * |
+ * *
+ * children
+ * [0] [6]
+ * | |
+ * [leaf: abc] |
+ * *
+ * [ branch node ] bitmap: 0x60 diff pos: 9
+ * |
+ * *
+ * children
+ * [5] [6]
+ * | |
+ * [leaf: abc_d] [leaf: abc_e]
+ */
+
+#define QP_TRIE_MANDATORY_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_DYNPTR_KEY)
+#define QP_TRIE_CREATE_FLAG_MASK (QP_TRIE_MANDATORY_FLAG_MASK | BPF_F_NUMA_NODE | \
+ BPF_F_ACCESS_MASK)
+
+/* bit[0] of nodes in qp_trie_branch is used to tell node type:
+ *
+ * bit[0]: 0-branch node
+ * bit[0]: 1-leaf node
+ *
+ * Size of qp_trie_branch is already 2-bytes aligned, so only need to make
+ * allocation of leaf node to be 2-bytes aligned.
+ */
+#define QP_TRIE_LEAF_NODE_MASK 1UL
+#define QP_TRIE_LEAF_ALLOC_ALIGN 2
+
+/* To reduce memory usage, only qp_trie_branch is RCU-freed. To handle
+ * freeing of the last leaf node, an extra qp_trie_branch node is
+ * allocated. The branch node has only one child and its index is 0. It
+ * is set as root node after adding the first leaf node.
+ */
+#define QP_TRIE_ROOT_NODE_INDEX 0
+#define QP_TRIE_NON_ROOT_NODE_MASK 1
+
+#define QP_TRIE_NIBBLE_SHIFT 1
+#define QP_TRIE_BYTE_INDEX_SHIFT 2
+
+#define QP_TRIE_TWIGS_FREE_NONE_IDX 17
+
+struct qp_trie_branch {
+ /* The bottom two bits of index are used as special flags:
+ *
+ * bit[0]: 0-root, 1-not root
+ * bit[1]: 0-upper nibble, 1-lower nibble
+ *
+ * bit[2:31]: byte index for key
+ */
+ unsigned int index;
+ /* 17 bits are needed to accommodate arbitrary keys, even keys that
+ * contain zero bytes.
+ *
+ * bit[0]: a leaf node whose key equals the prefix of the parent node
+ * bit[N]: a child node whose nibble value at the index is (N - 1)
+ */
+ unsigned int bitmap:17;
+ /* Index of the leaf node to be RCU-freed together with this branch */
+ unsigned int to_free_idx:5;
+ struct qp_trie_branch __rcu *parent;
+ struct rcu_head rcu;
+ void __rcu *nodes[];
+};
+
+#define QP_TRIE_NR_SUBTREE 256
+
+struct qp_trie {
+ struct bpf_map map;
+ atomic_t entries;
+ void __rcu *roots[QP_TRIE_NR_SUBTREE];
+ spinlock_t locks[QP_TRIE_NR_SUBTREE];
+};
+
+/* Internally use qp_trie_key instead of bpf_dynptr_kern
+ * to reduce memory usage
+ */
+struct qp_trie_key {
+ /* the length of blob data */
+ unsigned int len;
+ /* blob data */
+ unsigned char data[0];
+};
+
+struct qp_trie_diff {
+ unsigned int index;
+ unsigned int sibling_bm;
+ unsigned int new_bm;
+};
+
+static inline void *to_child_node(const struct qp_trie_key *key)
+{
+ return (void *)((long)key | QP_TRIE_LEAF_NODE_MASK);
+}
+
+static inline struct qp_trie_key *to_leaf_node(void *node)
+{
+ return (void *)((long)node & ~QP_TRIE_LEAF_NODE_MASK);
+}
+
+static inline bool is_branch_node(void *node)
+{
+ return !((long)node & QP_TRIE_LEAF_NODE_MASK);
+}
+
+static inline bool is_same_key(const struct qp_trie_key *k, const unsigned char *data,
+ unsigned int len)
+{
+ return k->len == len && !memcmp(k->data, data, len);
+}
+
+static inline void *qp_trie_leaf_value(const struct qp_trie_key *key)
+{
+ return (void *)key + sizeof(*key) + key->len;
+}
+
+static inline unsigned int calc_twig_index(unsigned int mask, unsigned int bitmap)
+{
+ return hweight32(mask & (bitmap - 1));
+}
+
+static inline unsigned int calc_twig_nr(unsigned int bitmap)
+{
+ return hweight32(bitmap);
+}
+
+static inline unsigned int nibble_to_bitmap(unsigned char nibble)
+{
+ return 1U << (nibble + 1);
+}
+
+static inline unsigned int index_to_byte_index(unsigned int index)
+{
+ return index >> QP_TRIE_BYTE_INDEX_SHIFT;
+}
+
+static inline unsigned int calc_br_bitmap(unsigned int index, const unsigned char *data,
+ unsigned int len)
+{
+ unsigned int byte;
+ unsigned char nibble;
+
+ if (index == QP_TRIE_ROOT_NODE_INDEX)
+ return 1;
+
+ byte = index_to_byte_index(index);
+ if (byte >= len)
+ return 1;
+
+ nibble = data[byte];
+ /* lower nibble */
+ if ((index >> QP_TRIE_NIBBLE_SHIFT) & 1)
+ nibble &= 0xf;
+ else
+ nibble >>= 4;
+ return nibble_to_bitmap(nibble);
+}
+
+static void qp_trie_free_twigs_rcu(struct rcu_head *rcu)
+{
+ struct qp_trie_branch *twigs = container_of(rcu, struct qp_trie_branch, rcu);
+ unsigned int idx = twigs->to_free_idx;
+
+ if (idx != QP_TRIE_TWIGS_FREE_NONE_IDX)
+ kfree(to_leaf_node(rcu_access_pointer(twigs->nodes[idx])));
+ kfree(twigs);
+}
+
+static void qp_trie_branch_free(struct qp_trie_branch *twigs, unsigned int to_free_idx)
+{
+ twigs->to_free_idx = to_free_idx;
+ call_rcu(&twigs->rcu, qp_trie_free_twigs_rcu);
+}
+
+static inline struct qp_trie_branch *
+qp_trie_branch_new(struct bpf_map *map, unsigned int nr)
+{
+ struct qp_trie_branch *a;
+
+ a = bpf_map_kmalloc_node(map, sizeof(*a) + nr * sizeof(*a->nodes),
+ GFP_NOWAIT | __GFP_NOWARN, map->numa_node);
+ return a;
+}
+
+static inline void qp_trie_assign_parent(struct qp_trie_branch *parent, void *node)
+{
+ if (is_branch_node(node))
+ rcu_assign_pointer(((struct qp_trie_branch *)node)->parent, parent);
+}
+
+static void qp_trie_update_parent(struct qp_trie_branch *parent, unsigned int nr)
+{
+ unsigned int i;
+
+ for (i = 0; i < nr; i++)
+ qp_trie_assign_parent(parent, rcu_dereference_protected(parent->nodes[i], 1));
+}
+
+/* new_node can be either a leaf node or a branch node */
+static struct qp_trie_branch *
+qp_trie_branch_replace(struct bpf_map *map, struct qp_trie_branch *old, unsigned int bitmap,
+ void *new_node)
+{
+ unsigned int nr = calc_twig_nr(old->bitmap);
+ unsigned int p = calc_twig_index(old->bitmap, bitmap);
+ struct qp_trie_branch *twigs;
+
+ twigs = qp_trie_branch_new(map, nr);
+ if (!twigs)
+ return NULL;
+
+ if (p)
+ memcpy(twigs->nodes, old->nodes, p * sizeof(*twigs->nodes));
+
+ rcu_assign_pointer(twigs->nodes[p], new_node);
+
+ if (nr - 1 > p)
+ memcpy(&twigs->nodes[p+1], &old->nodes[p+1], (nr - 1 - p) * sizeof(*twigs->nodes));
+
+ twigs->index = old->index;
+ twigs->bitmap = old->bitmap;
+ /* twigs will not be visible to reader until rcu_assign_pointer(), so
+ * use RCU_INIT_POINTER() here.
+ */
+ RCU_INIT_POINTER(twigs->parent, old->parent);
+
+ /* Initialize ->parent of parent node first, then update ->parent for
+ * child nodes after parent node is fully initialized.
+ */
+ qp_trie_update_parent(twigs, nr);
+
+ return twigs;
+}
+
+static struct qp_trie_branch *
+qp_trie_branch_insert(struct bpf_map *map, struct qp_trie_branch *old, unsigned int bitmap,
+ const struct qp_trie_key *new)
+{
+ unsigned int nr = calc_twig_nr(old->bitmap);
+ unsigned int p = calc_twig_index(old->bitmap, bitmap);
+ struct qp_trie_branch *twigs;
+
+ twigs = qp_trie_branch_new(map, nr + 1);
+ if (!twigs)
+ return NULL;
+
+ if (p)
+ memcpy(twigs->nodes, old->nodes, p * sizeof(*twigs->nodes));
+
+ rcu_assign_pointer(twigs->nodes[p], to_child_node(new));
+
+ if (nr > p)
+ memcpy(&twigs->nodes[p+1], &old->nodes[p], (nr - p) * sizeof(*twigs->nodes));
+
+ twigs->bitmap = old->bitmap | bitmap;
+ twigs->index = old->index;
+ RCU_INIT_POINTER(twigs->parent, old->parent);
+
+ qp_trie_update_parent(twigs, nr + 1);
+
+ return twigs;
+}
+
+static struct qp_trie_branch *
+qp_trie_branch_remove(struct bpf_map *map, struct qp_trie_branch *old, unsigned int bitmap)
+{
+ unsigned int nr = calc_twig_nr(old->bitmap);
+ unsigned int p = calc_twig_index(old->bitmap, bitmap);
+ struct qp_trie_branch *twigs;
+
+ twigs = qp_trie_branch_new(map, nr - 1);
+ if (!twigs)
+ return NULL;
+
+ if (p)
+ memcpy(twigs->nodes, old->nodes, p * sizeof(*twigs->nodes));
+ if (nr - 1 > p)
+ memcpy(&twigs->nodes[p], &old->nodes[p+1], (nr - 1 - p) * sizeof(*twigs->nodes));
+
+ twigs->bitmap = old->bitmap & ~bitmap;
+ twigs->index = old->index;
+ RCU_INIT_POINTER(twigs->parent, old->parent);
+
+ qp_trie_update_parent(twigs, nr - 1);
+
+ return twigs;
+}
+
+static struct qp_trie_key *
+qp_trie_init_leaf_node(struct bpf_map *map, const struct bpf_dynptr_kern *k, void *v)
+{
+ unsigned int key_size, total;
+ struct qp_trie_key *new;
+
+ key_size = bpf_dynptr_get_size(k);
+ if (!key_size || key_size > (u32)map->map_extra)
+ return ERR_PTR(-EINVAL);
+
+ total = round_up(sizeof(*new) + key_size + map->value_size, QP_TRIE_LEAF_ALLOC_ALIGN);
+ new = bpf_map_kmalloc_node(map, total, GFP_NOWAIT | __GFP_NOWARN, map->numa_node);
+ if (!new)
+ return ERR_PTR(-ENOMEM);
+
+ new->len = key_size;
+ memcpy(new->data, k->data + k->offset, key_size);
+ memcpy((void *)&new[1] + key_size, v, map->value_size);
+
+ return new;
+}
+
+static bool calc_prefix_len(const struct qp_trie_key *s_key, const struct qp_trie_key *n_key,
+ unsigned int *index)
+{
+ unsigned int i, len = min(s_key->len, n_key->len);
+ unsigned char diff = 0;
+
+ for (i = 0; i < len; i++) {
+ diff = s_key->data[i] ^ n_key->data[i];
+ if (diff)
+ break;
+ }
+
+ *index = (i << QP_TRIE_BYTE_INDEX_SHIFT) | QP_TRIE_NON_ROOT_NODE_MASK;
+ if (!diff)
+ return s_key->len == n_key->len;
+
+ *index += (diff & 0xf0) ? 0 : (1U << QP_TRIE_NIBBLE_SHIFT);
+ return false;
+}
+
+static int qp_trie_new_branch(struct qp_trie *trie, struct qp_trie_branch __rcu **parent,
+ unsigned int bitmap, void *sibling, struct qp_trie_diff *d,
+ const struct qp_trie_key *leaf)
+{
+ struct qp_trie_branch *new_child_twigs, *new_twigs, *old_twigs;
+ struct bpf_map *map;
+ unsigned int iip;
+ int err;
+
+ map = &trie->map;
+ if (atomic_inc_return(&trie->entries) > map->max_entries) {
+ err = -ENOSPC;
+ goto dec_entries;
+ }
+
+ new_child_twigs = qp_trie_branch_new(map, 2);
+ if (!new_child_twigs) {
+ err = -ENOMEM;
+ goto dec_entries;
+ }
+
+ new_child_twigs->index = d->index;
+ new_child_twigs->bitmap = d->sibling_bm | d->new_bm;
+
+ iip = calc_twig_index(new_child_twigs->bitmap, d->sibling_bm);
+ RCU_INIT_POINTER(new_child_twigs->nodes[iip], sibling);
+ rcu_assign_pointer(new_child_twigs->nodes[!iip], to_child_node(leaf));
+ RCU_INIT_POINTER(new_child_twigs->parent, NULL);
+
+ old_twigs = rcu_dereference_protected(*parent, 1);
+ new_twigs = qp_trie_branch_replace(map, old_twigs, bitmap, new_child_twigs);
+ if (!new_twigs) {
+ err = -ENOMEM;
+ goto free_child_twigs;
+ }
+
+ qp_trie_assign_parent(new_child_twigs, sibling);
+ rcu_assign_pointer(*parent, new_twigs);
+ qp_trie_branch_free(old_twigs, QP_TRIE_TWIGS_FREE_NONE_IDX);
+
+ return 0;
+
+free_child_twigs:
+ kfree(new_child_twigs);
+dec_entries:
+ atomic_dec(&trie->entries);
+ return err;
+}
+
+static int qp_trie_ext_branch(struct qp_trie *trie, struct qp_trie_branch __rcu **parent,
+ const struct qp_trie_key *new, unsigned int bitmap)
+{
+ struct qp_trie_branch *old_twigs, *new_twigs;
+ struct bpf_map *map;
+ int err;
+
+ map = &trie->map;
+ if (atomic_inc_return(&trie->entries) > map->max_entries) {
+ err = -ENOSPC;
+ goto dec_entries;
+ }
+
+ old_twigs = rcu_dereference_protected(*parent, 1);
+ new_twigs = qp_trie_branch_insert(map, old_twigs, bitmap, new);
+ if (!new_twigs) {
+ err = -ENOMEM;
+ goto dec_entries;
+ }
+
+ rcu_assign_pointer(*parent, new_twigs);
+ qp_trie_branch_free(old_twigs, QP_TRIE_TWIGS_FREE_NONE_IDX);
+
+ return 0;
+
+dec_entries:
+ atomic_dec(&trie->entries);
+ return err;
+}
+
+static int qp_trie_add_leaf_node(struct qp_trie *trie, struct qp_trie_branch __rcu **parent,
+ const struct qp_trie_key *new)
+{
+ struct bpf_map *map = &trie->map;
+ struct qp_trie_branch *twigs;
+ int err;
+
+ if (atomic_inc_return(&trie->entries) > map->max_entries) {
+ err = -ENOSPC;
+ goto dec_entries;
+ }
+
+ twigs = qp_trie_branch_new(map, 1);
+ if (!twigs) {
+ err = -ENOMEM;
+ goto dec_entries;
+ }
+ twigs->index = QP_TRIE_ROOT_NODE_INDEX;
+ twigs->bitmap = 1;
+ RCU_INIT_POINTER(twigs->parent, NULL);
+ rcu_assign_pointer(twigs->nodes[0], to_child_node(new));
+
+ rcu_assign_pointer(*parent, twigs);
+
+ return 0;
+dec_entries:
+ atomic_dec(&trie->entries);
+ return err;
+}
+
+static int qp_trie_rep_leaf_node(struct qp_trie *trie, struct qp_trie_branch __rcu **parent,
+ const struct qp_trie_key *new, unsigned int bitmap)
+{
+ struct qp_trie_branch *old_twigs, *new_twigs;
+ struct bpf_map *map = &trie->map;
+
+ /* Only branch nodes are RCU-freed, so replace the old branch node
+ * and free the old leaf node together with the old branch node.
+ */
+ old_twigs = rcu_dereference_protected(*parent, 1);
+ new_twigs = qp_trie_branch_replace(map, old_twigs, bitmap, to_child_node(new));
+ if (!new_twigs)
+ return -ENOMEM;
+
+ rcu_assign_pointer(*parent, new_twigs);
+
+ qp_trie_branch_free(old_twigs, calc_twig_index(old_twigs->bitmap, bitmap));
+
+ return 0;
+}
+
+static int qp_trie_remove_leaf(struct qp_trie *trie, struct qp_trie_branch __rcu **parent,
+ unsigned int bitmap, const struct qp_trie_key *node)
+{
+ struct bpf_map *map = &trie->map;
+ struct qp_trie_branch *new, *old;
+ unsigned int nr;
+
+ old = rcu_dereference_protected(*parent, 1);
+ nr = calc_twig_nr(old->bitmap);
+ if (nr > 2) {
+ new = qp_trie_branch_remove(map, old, bitmap);
+ if (!new)
+ return -ENOMEM;
+ } else {
+ new = NULL;
+ }
+
+ rcu_assign_pointer(*parent, new);
+
+ qp_trie_branch_free(old, calc_twig_index(old->bitmap, bitmap));
+
+ atomic_dec(&trie->entries);
+
+ return 0;
+}
+
+static int qp_trie_merge_node(struct qp_trie *trie, struct qp_trie_branch __rcu **grand_parent,
+ struct qp_trie_branch *parent, unsigned int parent_bitmap,
+ unsigned int bitmap)
+{
+ struct qp_trie_branch *old_twigs, *new_twigs;
+ struct bpf_map *map = &trie->map;
+ void *new_sibling;
+ unsigned int iip;
+
+ iip = calc_twig_index(parent->bitmap, bitmap);
+ new_sibling = rcu_dereference_protected(parent->nodes[!iip], 1);
+
+ old_twigs = rcu_dereference_protected(*grand_parent, 1);
+ new_twigs = qp_trie_branch_replace(map, old_twigs, parent_bitmap, new_sibling);
+ if (!new_twigs)
+ return -ENOMEM;
+
+ rcu_assign_pointer(*grand_parent, new_twigs);
+
+ qp_trie_branch_free(old_twigs, QP_TRIE_TWIGS_FREE_NONE_IDX);
+ qp_trie_branch_free(parent, iip);
+
+ atomic_dec(&trie->entries);
+
+ return 0;
+}
+
+static int qp_trie_alloc_check(union bpf_attr *attr)
+{
+ if (!bpf_capable())
+ return -EPERM;
+
+ if ((attr->map_flags & QP_TRIE_MANDATORY_FLAG_MASK) != QP_TRIE_MANDATORY_FLAG_MASK ||
+ attr->map_flags & ~QP_TRIE_CREATE_FLAG_MASK ||
+ !bpf_map_flags_access_ok(attr->map_flags))
+ return -EINVAL;
+
+ if (!attr->max_entries || !attr->value_size)
+ return -EINVAL;
+
+ /* Key and value are allocated together in qp_trie_init_leaf_node() */
+ if (round_up((u64)sizeof(struct qp_trie_key) + (u32)attr->map_extra + attr->value_size,
+ QP_TRIE_LEAF_ALLOC_ALIGN) >= KMALLOC_MAX_SIZE)
+ return -E2BIG;
+
+ return 0;
+}
+
+static struct bpf_map *qp_trie_alloc(union bpf_attr *attr)
+{
+ struct qp_trie *trie;
+ unsigned int i;
+
+ trie = bpf_map_area_alloc(sizeof(*trie), bpf_map_attr_numa_node(attr));
+ if (!trie)
+ return ERR_PTR(-ENOMEM);
+
+ /* roots are zeroed by bpf_map_area_alloc() */
+ for (i = 0; i < QP_TRIE_NR_SUBTREE; i++)
+ spin_lock_init(&trie->locks[i]);
+
+ atomic_set(&trie->entries, 0);
+ bpf_map_init_from_attr(&trie->map, attr);
+
+ return &trie->map;
+}
+
+static void qp_trie_free_subtree(void *root)
+{
+ struct qp_trie_branch *parent = NULL;
+ struct qp_trie_key *cur = NULL;
+ void *node = root;
+
+ /*
+ * Depth-first deletion
+ *
+ * 1. find left-most key and its parent
+ * 2. get next sibling Y from parent
+ * (a) Y is leaf node: continue
+ * (b) Y is branch node: goto step 1
+ * (c) no more siblings: backtrack upwards if parent is not NULL and
+ * goto step 1
+ */
+ do {
+ while (is_branch_node(node)) {
+ parent = node;
+ node = rcu_dereference_raw(parent->nodes[0]);
+ }
+
+ cur = to_leaf_node(node);
+ while (parent) {
+ unsigned int iip, bitmap, nr;
+ void *ancestor;
+
+ bitmap = calc_br_bitmap(parent->index, cur->data, cur->len);
+ iip = calc_twig_index(parent->bitmap, bitmap) + 1;
+ nr = calc_twig_nr(parent->bitmap);
+
+ for (; iip < nr; iip++) {
+ kfree(cur);
+
+ node = rcu_dereference_raw(parent->nodes[iip]);
+ if (is_branch_node(node))
+ break;
+
+ cur = to_leaf_node(node);
+ }
+ if (iip < nr)
+ break;
+
+ ancestor = rcu_dereference_raw(parent->parent);
+ kfree(parent);
+ parent = ancestor;
+ }
+ } while (parent);
+
+ kfree(cur);
+}
+
+static void qp_trie_free(struct bpf_map *map)
+{
+ struct qp_trie *trie = container_of(map, struct qp_trie, map);
+ unsigned int i;
+
+ /* Wait for the pending qp_trie_free_twigs_rcu() */
+ rcu_barrier();
+
+ for (i = 0; i < ARRAY_SIZE(trie->roots); i++) {
+ void *root = rcu_dereference_raw(trie->roots[i]);
+
+ if (root)
+ qp_trie_free_subtree(root);
+ }
+ bpf_map_area_free(trie);
+}
+
+static inline void qp_trie_copy_leaf(const struct qp_trie_key *leaf, struct bpf_dynptr_kern *key)
+{
+ memcpy(key->data + key->offset, leaf->data, leaf->len);
+ bpf_dynptr_set_size(key, leaf->len);
+}
+
+static void qp_trie_copy_min_key_from(void *root, struct bpf_dynptr_kern *key)
+{
+ void *node;
+
+ node = root;
+ while (is_branch_node(node))
+ node = rcu_dereference(((struct qp_trie_branch *)node)->nodes[0]);
+
+ qp_trie_copy_leaf(to_leaf_node(node), key);
+}
+
+static int qp_trie_lookup_min_key(struct qp_trie *trie, unsigned int from,
+ struct bpf_dynptr_kern *key)
+{
+ unsigned int i;
+
+ for (i = from; i < ARRAY_SIZE(trie->roots); i++) {
+ void *root = rcu_dereference(trie->roots[i]);
+
+ if (root) {
+ qp_trie_copy_min_key_from(root, key);
+ return 0;
+ }
+ }
+
+ return -ENOENT;
+}
+
+static int qp_trie_next_twigs_index(struct qp_trie_branch *twigs, unsigned int bitmap)
+{
+ unsigned int idx, nr, next;
+
+ /* bitmap may not be present in twigs->bitmap */
+ idx = calc_twig_index(twigs->bitmap, bitmap);
+ nr = calc_twig_nr(twigs->bitmap);
+
+ next = idx;
+ if (twigs->bitmap & bitmap)
+ next += 1;
+
+ if (next >= nr)
+ return -1;
+ return next;
+}
+
+static int qp_trie_lookup_next_node(struct qp_trie *trie, const struct bpf_dynptr_kern *key,
+ struct bpf_dynptr_kern *next_key)
+{
+ const struct qp_trie_key *found;
+ struct qp_trie_branch *parent;
+ const unsigned char *data;
+ unsigned int data_len;
+ void *node, *next;
+
+ /* Non-existent key, so restart from the beginning */
+ data = key->data + key->offset;
+ node = rcu_dereference(trie->roots[*data]);
+ if (!node)
+ return qp_trie_lookup_min_key(trie, 0, next_key);
+
+ parent = NULL;
+ data_len = bpf_dynptr_get_size(key);
+ while (is_branch_node(node)) {
+ struct qp_trie_branch *br = node;
+ unsigned int iip, bitmap;
+
+ bitmap = calc_br_bitmap(br->index, data, data_len);
+ if (bitmap & br->bitmap)
+ iip = calc_twig_index(br->bitmap, bitmap);
+ else
+ iip = 0;
+
+ parent = br;
+ node = rcu_dereference(br->nodes[iip]);
+ }
+ found = to_leaf_node(node);
+ if (!is_same_key(found, data, data_len))
+ return qp_trie_lookup_min_key(trie, 0, next_key);
+
+ /* Pair with store release in rcu_assign_pointer(*parent, twigs) to
+ * ensure reading node->parent will not return the old parent if
+ * the node is found by following the newly-created parent.
+ */
+ smp_rmb();
+
+ next = NULL;
+ while (parent) {
+ unsigned int bitmap;
+ int next_idx;
+
+ bitmap = calc_br_bitmap(parent->index, data, data_len);
+ next_idx = qp_trie_next_twigs_index(parent, bitmap);
+ if (next_idx >= 0) {
+ next = rcu_dereference(parent->nodes[next_idx]);
+ break;
+ }
+ parent = rcu_dereference(parent->parent);
+ }
+
+ /* Go to the next sub-tree */
+ if (!next)
+ return qp_trie_lookup_min_key(trie, *data + 1, next_key);
+
+ if (!is_branch_node(next))
+ qp_trie_copy_leaf(to_leaf_node(next), next_key);
+ else
+ qp_trie_copy_min_key_from(next, next_key);
+
+ return 0;
+}
+
+/* Called from syscall */
+static int qp_trie_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+ struct qp_trie *trie = container_of(map, struct qp_trie, map);
+ int err;
+
+ if (!key)
+ err = qp_trie_lookup_min_key(trie, 0, next_key);
+ else
+ err = qp_trie_lookup_next_node(trie, key, next_key);
+ return err;
+}
+
+/* Called from syscall or from eBPF program */
+static void *qp_trie_lookup_elem(struct bpf_map *map, void *key)
+{
+ struct qp_trie *trie = container_of(map, struct qp_trie, map);
+ const struct bpf_dynptr_kern *dynptr_key = key;
+ const struct qp_trie_key *found;
+ const unsigned char *data;
+ unsigned int data_len;
+ void *node, *value;
+
+ /* Dynptr with zero length is possible, but is invalid for qp-trie */
+ data_len = bpf_dynptr_get_size(dynptr_key);
+ if (!data_len)
+ return NULL;
+
+ data = dynptr_key->data + dynptr_key->offset;
+ node = rcu_dereference_check(trie->roots[*data], rcu_read_lock_bh_held());
+ if (!node)
+ return NULL;
+
+ value = NULL;
+ while (is_branch_node(node)) {
+ struct qp_trie_branch *br = node;
+ unsigned int bitmap;
+ unsigned int iip;
+
+ /* When the byte index equals the key length, the target key
+ * may still be in twigs->nodes[0].
+ */
+ if (index_to_byte_index(br->index) > data_len)
+ goto done;
+
+ bitmap = calc_br_bitmap(br->index, data, data_len);
+ if (!(bitmap & br->bitmap))
+ goto done;
+
+ iip = calc_twig_index(br->bitmap, bitmap);
+ node = rcu_dereference_check(br->nodes[iip], rcu_read_lock_bh_held());
+ }
+
+ found = to_leaf_node(node);
+ if (is_same_key(found, data, data_len))
+ value = qp_trie_leaf_value(found);
+done:
+ return value;
+}
+
+/* Called from syscall or from eBPF program */
+static int qp_trie_update_elem(struct bpf_map *map, void *key, void *value, u64 flags)
+{
+ struct qp_trie *trie = container_of(map, struct qp_trie, map);
+ const struct qp_trie_key *leaf_key, *new_key;
+ struct qp_trie_branch __rcu **parent;
+ struct qp_trie_diff d;
+ unsigned int bitmap;
+ void __rcu **node;
+ spinlock_t *lock;
+ unsigned char c;
+ bool equal;
+ int err;
+
+ if (flags > BPF_EXIST)
+ return -EINVAL;
+
+ /* The content of key may change, so copy it first */
+ new_key = qp_trie_init_leaf_node(map, key, value);
+ if (IS_ERR(new_key))
+ return PTR_ERR(new_key);
+
+ c = new_key->data[0];
+ lock = &trie->locks[c];
+ spin_lock(lock);
+ parent = (struct qp_trie_branch __rcu **)&trie->roots[c];
+ if (!rcu_dereference_protected(*parent, 1)) {
+ if (flags == BPF_EXIST) {
+ err = -ENOENT;
+ goto unlock;
+ }
+ err = qp_trie_add_leaf_node(trie, parent, new_key);
+ goto unlock;
+ }
+
+ bitmap = 1;
+ node = &rcu_dereference_protected(*parent, 1)->nodes[0];
+ while (is_branch_node(rcu_dereference_protected(*node, 1))) {
+ struct qp_trie_branch *br = rcu_dereference_protected(*node, 1);
+ unsigned int iip;
+
+ bitmap = calc_br_bitmap(br->index, new_key->data, new_key->len);
+ if (bitmap & br->bitmap)
+ iip = calc_twig_index(br->bitmap, bitmap);
+ else
+ iip = 0;
+ parent = (struct qp_trie_branch __rcu **)node;
+ node = &br->nodes[iip];
+ }
+
+ leaf_key = to_leaf_node(rcu_dereference_protected(*node, 1));
+ equal = calc_prefix_len(leaf_key, new_key, &d.index);
+ if (equal) {
+ if (flags == BPF_NOEXIST) {
+ err = -EEXIST;
+ goto unlock;
+ }
+ err = qp_trie_rep_leaf_node(trie, parent, new_key, bitmap);
+ goto unlock;
+ }
+
+ d.sibling_bm = calc_br_bitmap(d.index, leaf_key->data, leaf_key->len);
+ d.new_bm = calc_br_bitmap(d.index, new_key->data, new_key->len);
+
+ bitmap = 1;
+ parent = (struct qp_trie_branch __rcu **)&trie->roots[c];
+ node = &rcu_dereference_protected(*parent, 1)->nodes[0];
+ while (is_branch_node(rcu_dereference_protected(*node, 1))) {
+ struct qp_trie_branch *br = rcu_dereference_protected(*node, 1);
+ unsigned int iip;
+
+ if (d.index < br->index)
+ goto new_branch;
+
+ parent = (struct qp_trie_branch __rcu **)node;
+ if (d.index == br->index) {
+ if (flags == BPF_EXIST) {
+ err = -ENOENT;
+ goto unlock;
+ }
+ err = qp_trie_ext_branch(trie, parent, new_key, d.new_bm);
+ goto unlock;
+ }
+
+ bitmap = calc_br_bitmap(br->index, new_key->data, new_key->len);
+ iip = calc_twig_index(br->bitmap, bitmap);
+ node = &br->nodes[iip];
+ }
+
+new_branch:
+ if (flags == BPF_EXIST) {
+ err = -ENOENT;
+ goto unlock;
+ }
+ err = qp_trie_new_branch(trie, parent, bitmap, rcu_dereference_protected(*node, 1),
+ &d, new_key);
+unlock:
+ spin_unlock(lock);
+ if (err)
+ kfree(new_key);
+ return err;
+}
+
+/* Called from syscall or from eBPF program */
+static int qp_trie_delete_elem(struct bpf_map *map, void *key)
+{
+ struct qp_trie *trie = container_of(map, struct qp_trie, map);
+ unsigned int bitmap, parent_bitmap, data_len, nr;
+ struct qp_trie_branch __rcu **parent, **grand_parent;
+ const struct bpf_dynptr_kern *dynptr_key;
+ const struct qp_trie_key *found;
+ const unsigned char *data;
+ void __rcu **node;
+ spinlock_t *lock;
+ unsigned char c;
+ int err;
+
+ dynptr_key = key;
+ data_len = bpf_dynptr_get_size(dynptr_key);
+ if (!data_len)
+ return -EINVAL;
+
+ err = -ENOENT;
+ data = dynptr_key->data + dynptr_key->offset;
+ c = *data;
+ lock = &trie->locks[c];
+ spin_lock(lock);
+ parent = (struct qp_trie_branch __rcu **)&trie->roots[c];
+ if (!*parent)
+ goto unlock;
+
+ grand_parent = NULL;
+ parent_bitmap = bitmap = 1;
+ node = &rcu_dereference_protected(*parent, 1)->nodes[0];
+ while (is_branch_node(rcu_dereference_protected(*node, 1))) {
+ struct qp_trie_branch *br = rcu_dereference_protected(*node, 1);
+ unsigned int iip;
+
+ if (index_to_byte_index(br->index) > data_len)
+ goto unlock;
+
+ parent_bitmap = bitmap;
+ bitmap = calc_br_bitmap(br->index, data, data_len);
+ if (!(bitmap & br->bitmap))
+ goto unlock;
+
+ grand_parent = parent;
+ parent = (struct qp_trie_branch __rcu **)node;
+ iip = calc_twig_index(br->bitmap, bitmap);
+ node = &br->nodes[iip];
+ }
+
+ found = to_leaf_node(rcu_dereference_protected(*node, 1));
+ if (!is_same_key(found, data, data_len))
+ goto unlock;
+
+ nr = calc_twig_nr(rcu_dereference_protected(*parent, 1)->bitmap);
+ if (nr != 2)
+ err = qp_trie_remove_leaf(trie, parent, bitmap, found);
+ else
+ err = qp_trie_merge_node(trie, grand_parent, rcu_dereference_protected(*parent, 1),
+ parent_bitmap, bitmap);
+unlock:
+ spin_unlock(lock);
+ return err;
+}
+
+static int qp_trie_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type)
+{
+ return 0;
+}
+
+BTF_ID_LIST_SINGLE(qp_trie_map_btf_ids, struct, qp_trie)
+const struct bpf_map_ops qp_trie_map_ops = {
+ .map_alloc_check = qp_trie_alloc_check,
+ .map_alloc = qp_trie_alloc,
+ .map_free = qp_trie_free,
+ .map_get_next_key = qp_trie_get_next_key,
+ .map_lookup_elem = qp_trie_lookup_elem,
+ .map_update_elem = qp_trie_update_elem,
+ .map_delete_elem = qp_trie_delete_elem,
+ .map_meta_equal = bpf_map_meta_equal,
+ .map_check_btf = qp_trie_check_btf,
+ .map_btf_id = &qp_trie_map_btf_ids[0],
+};
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 600c3fcee37a..1591f0df46f3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -928,6 +928,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_INODE_STORAGE,
BPF_MAP_TYPE_TASK_STORAGE,
BPF_MAP_TYPE_BLOOM_FILTER,
+ BPF_MAP_TYPE_QP_TRIE,
};
/* Note that tracing related programs such as
--
2.29.2
* [PATCH bpf-next 07/10] selftests/bpf: Add two new dynptr_fail cases for map key
2022-09-17 15:31 [PATCH bpf-next 00/10] Add support for qp-trie map Hou Tao
` (5 preceding siblings ...)
2022-09-17 15:31 ` [PATCH bpf-next 06/10] bpf: Add support for qp-trie map Hou Tao
@ 2022-09-17 15:31 ` Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 08/10] selftests/bpf: Add test case for basic qp-trie map operations Hou Tao
` (2 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
To: bpf, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
From: Hou Tao <houtao1@huawei.com>
Now bpf_dynptr is also not usable as a map key, so add two test cases to
ensure that: one uses a bpf_dynptr directly as the map key and the other
uses a struct with an embedded bpf_dynptr as the map key.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
.../testing/selftests/bpf/prog_tests/dynptr.c | 2 +
.../testing/selftests/bpf/progs/dynptr_fail.c | 43 +++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c
index bcf80b9f7c27..1df50fe1a741 100644
--- a/tools/testing/selftests/bpf/prog_tests/dynptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c
@@ -20,6 +20,8 @@ static struct {
{"ringbuf_invalid_api", "type=mem expected=alloc_mem"},
{"add_dynptr_to_map1", "invalid indirect read from stack"},
{"add_dynptr_to_map2", "invalid indirect read from stack"},
+ {"add_dynptr_to_key1", "invalid indirect read from stack"},
+ {"add_dynptr_to_key2", "invalid indirect read from stack"},
{"data_slice_out_of_bounds_ringbuf", "value is outside of the allowed memory range"},
{"data_slice_out_of_bounds_map_value", "value is outside of the allowed memory range"},
{"data_slice_use_after_release1", "invalid mem access 'scalar'"},
diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c
index b0f08ff024fb..85cc32999e8e 100644
--- a/tools/testing/selftests/bpf/progs/dynptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c
@@ -35,6 +35,20 @@ struct {
__type(value, __u32);
} array_map3 SEC(".maps");
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 1);
+ __type(key, struct bpf_dynptr);
+ __type(value, __u32);
+} hash_map1 SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 1);
+ __type(key, struct test_info);
+ __type(value, __u32);
+} hash_map2 SEC(".maps");
+
struct sample {
int pid;
long value;
@@ -206,6 +220,35 @@ int add_dynptr_to_map2(void *ctx)
return 0;
}
+/* Can't use a dynptr as key of normal map */
+SEC("?raw_tp")
+int add_dynptr_to_key1(void *ctx)
+{
+ struct bpf_dynptr ptr;
+
+ get_map_val_dynptr(&ptr);
+ bpf_map_lookup_elem(&hash_map1, &ptr);
+
+ return 0;
+}
+
+/* Can't use a struct with an embedded dynptr as key of normal map */
+SEC("?raw_tp")
+int add_dynptr_to_key2(void *ctx)
+{
+ struct test_info x;
+
+ x.x = 0;
+ bpf_ringbuf_reserve_dynptr(&ringbuf, val, 0, &x.ptr);
+
+ /* this should fail */
+ bpf_map_lookup_elem(&hash_map2, &x);
+
+ bpf_ringbuf_submit_dynptr(&x.ptr, 0);
+
+ return 0;
+}
+
/* A data slice can't be accessed out of bounds */
SEC("?raw_tp")
int data_slice_out_of_bounds_ringbuf(void *ctx)
--
2.29.2
* [PATCH bpf-next 08/10] selftests/bpf: Add test case for basic qp-trie map operations
2022-09-17 15:31 [PATCH bpf-next 00/10] Add support for qp-trie map Hou Tao
` (6 preceding siblings ...)
2022-09-17 15:31 ` [PATCH bpf-next 07/10] selftests/bpf: Add two new dynptr_fail cases for map key Hou Tao
@ 2022-09-17 15:31 ` Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 09/10] selftests/bpf: Add benchmark for qp-trie map Hou Tao
2022-09-17 15:31 ` [PATCH bpf-next 10/10] selftests/bpf: Add tests for qp-trie map by using bpf syscalls Hou Tao
9 siblings, 0 replies; 13+ messages in thread
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
To: bpf, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
From: Hou Tao <houtao1@huawei.com>
First show that lookup on a qp-trie map differs from lookup on a
hash table, because a qp-trie map only uses the specified length in the
key buffer instead of the full key size. Then check that update and
deletion operations on a qp-trie map work as expected when using
bpf_dynptr. Also check that a zero-sized bpf_dynptr is rejected by map
operations.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../selftests/bpf/prog_tests/qp_trie_test.c | 91 +++++++++++++++
.../selftests/bpf/progs/qp_trie_test.c | 110 ++++++++++++++++++
3 files changed, 202 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/qp_trie_test.c
create mode 100644 tools/testing/selftests/bpf/progs/qp_trie_test.c
diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index 168c5b287b5c..e18d6f5ffeef 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -71,3 +71,4 @@ cb_refs # expected error message unexpected err
cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc)
htab_update # failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
tracing_struct # failed to auto-attach: -524 (trampoline)
+qp_trie_test # failed to attach: ERROR: strerror_r(-524)=22 (trampoline)
diff --git a/tools/testing/selftests/bpf/prog_tests/qp_trie_test.c b/tools/testing/selftests/bpf/prog_tests/qp_trie_test.c
new file mode 100644
index 000000000000..24d5719f4f7f
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/qp_trie_test.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <test_progs.h>
+#include "qp_trie_test.skel.h"
+
+static int setup_maps(struct qp_trie_test *skel, char *name, unsigned int value)
+{
+#define FILE_PATH_SIZE 64
+ struct bpf_dynptr_user dynptr;
+ char raw[FILE_PATH_SIZE];
+ char zero;
+ int fd, err;
+
+ memset(raw, 0, sizeof(raw));
+ strncpy(raw, name, sizeof(raw) - 1);
+
+ fd = bpf_map__fd(skel->maps.trie);
+ /* Full path returned from d_path includes the trailing terminator */
+ bpf_dynptr_user_init(name, strlen(name) + 1, &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+ if (!ASSERT_OK(err, "trie add name"))
+ return -EINVAL;
+
+ zero = 0;
+ bpf_dynptr_user_init(&zero, 1, &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+ if (!ASSERT_OK(err, "trie add zero"))
+ return -EINVAL;
+
+ fd = bpf_map__fd(skel->maps.htab);
+ err = bpf_map_update_elem(fd, raw, &value, BPF_NOEXIST);
+ if (!ASSERT_OK(err, "htab add"))
+ return -EINVAL;
+
+ return 0;
+}
+
+void test_qp_trie_test(void)
+{
+ char name[] = "/tmp/qp_trie_test";
+ unsigned int value, new_value;
+ struct bpf_dynptr_user dynptr;
+ struct qp_trie_test *skel;
+ int err, fd;
+ char zero;
+
+ skel = qp_trie_test__open();
+ if (!ASSERT_OK_PTR(skel, "qp_trie_test__open()"))
+ return;
+
+ err = qp_trie_test__load(skel);
+ if (!ASSERT_OK(err, "qp_trie_test__load()"))
+ goto out;
+
+ value = time(NULL);
+ if (setup_maps(skel, name, value))
+ goto out;
+
+ skel->bss->pid = getpid();
+ err = qp_trie_test__attach(skel);
+ if (!ASSERT_OK(err, "attach"))
+ goto out;
+
+ fd = open(name, O_RDONLY | O_CREAT, 0644);
+ if (!ASSERT_GE(fd, 0, "open tmp file"))
+ goto out;
+ close(fd);
+ unlink(name);
+
+ ASSERT_EQ(skel->bss->trie_value, value, "trie lookup str");
+ ASSERT_EQ(skel->bss->htab_value, -1, "htab lookup bytes");
+ ASSERT_FALSE(skel->bss->zero_sized_key_bad, "zero-sized key");
+
+ bpf_dynptr_user_init(name, strlen(name) + 1, &dynptr);
+ new_value = 0;
+ err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.trie), &dynptr, &new_value);
+ ASSERT_OK(err, "lookup elem");
+ ASSERT_EQ(new_value, value + 1, "check new value");
+
+ zero = 0;
+ bpf_dynptr_user_init(&zero, 1, &dynptr);
+ err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.trie), &dynptr, &new_value);
+ ASSERT_EQ(err, -ENOENT, "lookup deleted elem");
+
+out:
+ qp_trie_test__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/qp_trie_test.c b/tools/testing/selftests/bpf/progs/qp_trie_test.c
new file mode 100644
index 000000000000..c7cf3dfae971
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/qp_trie_test.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <stdbool.h>
+#include <errno.h>
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct path {
+} __attribute__((preserve_access_index));
+
+struct file {
+ struct path f_path;
+} __attribute__((preserve_access_index));
+
+#define FILE_PATH_SIZE 64
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(max_entries, 2);
+ __uint(key_size, 4);
+ __uint(value_size, FILE_PATH_SIZE);
+} array SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_QP_TRIE);
+ __uint(max_entries, 2);
+ __type(key, struct bpf_dynptr);
+ __type(value, unsigned int);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_KEY);
+ __uint(map_extra, FILE_PATH_SIZE);
+} trie SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 1);
+ __uint(key_size, FILE_PATH_SIZE);
+ __uint(value_size, 4);
+} htab SEC(".maps");
+
+int pid = 0;
+unsigned int trie_value = 0;
+unsigned int htab_value = 0;
+bool zero_sized_key_bad = false;
+
+SEC("fentry/filp_close")
+int BPF_PROG(filp_close, struct file *filp)
+{
+ struct bpf_dynptr str_ptr, zero_ptr, zero_sized_ptr;
+ unsigned int new_value, *value;
+ int idx, len, err;
+ struct path *p;
+ char *raw;
+
+ if (bpf_get_current_pid_tgid() >> 32 != pid)
+ return 0;
+
+ idx = 0;
+ raw = bpf_map_lookup_elem(&array, &idx);
+ if (!raw)
+ return 0;
+
+ p = &filp->f_path;
+ len = bpf_d_path(p, raw, FILE_PATH_SIZE);
+ if (len < 0 || len > FILE_PATH_SIZE)
+ return 0;
+
+ bpf_dynptr_from_mem(raw, len, 0, &str_ptr);
+ value = bpf_map_lookup_elem(&trie, &str_ptr);
+ if (value)
+ trie_value = *value;
+ else
+ trie_value = -1;
+
+ value = bpf_map_lookup_elem(&htab, raw);
+ if (value)
+ htab_value = *value;
+ else
+ htab_value = -1;
+
+ /* Update qp_trie map */
+ new_value = trie_value + 1;
+ bpf_map_update_elem(&trie, &str_ptr, &new_value, BPF_ANY);
+
+ idx = 1;
+ raw = bpf_map_lookup_elem(&array, &idx);
+ if (!raw)
+ return 0;
+ bpf_dynptr_from_mem(raw, 1, 0, &zero_ptr);
+ bpf_map_delete_elem(&trie, &zero_ptr);
+
+ /* Use zero-sized bpf_dynptr */
+ bpf_dynptr_from_mem(raw, 0, 0, &zero_sized_ptr);
+
+ value = bpf_map_lookup_elem(&trie, &zero_sized_ptr);
+ if (value)
+ zero_sized_key_bad = true;
+ err = bpf_map_update_elem(&trie, &zero_sized_ptr, &new_value, BPF_ANY);
+ if (err != -EINVAL)
+ zero_sized_key_bad = true;
+ err = bpf_map_delete_elem(&trie, &zero_sized_ptr);
+ if (err != -EINVAL)
+ zero_sized_key_bad = true;
+
+ return 0;
+}
--
2.29.2
* [PATCH bpf-next 09/10] selftests/bpf: Add benchmark for qp-trie map
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
To: bpf, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
From: Hou Tao <houtao1@huawei.com>
Add a benchmark for the qp-trie map to compare its lookup and
update/delete performance and memory usage with those of the hash table.
When the contents of keys are uniformly distributed and there are large
differences between key lengths, the qp-trie will be dense and have a low
height, while the lookup overhead of the hash table increases due to
unnecessary memory comparisons, so the lookup performance of qp-trie is
much better than that of the hash table, as shown below:
Randomly-generated binary data (key size=255, max entries=16K, key length range:[1, 255])
htab lookup (1 thread) 4.968 ± 0.009M/s (drops 0.002 ± 0.000M/s mem 8.169 MiB)
htab lookup (2 thread) 10.118 ± 0.010M/s (drops 0.007 ± 0.000M/s mem 8.169 MiB)
htab lookup (4 thread) 20.084 ± 0.022M/s (drops 0.007 ± 0.000M/s mem 8.168 MiB)
htab lookup (8 thread) 39.866 ± 0.047M/s (drops 0.010 ± 0.000M/s mem 8.168 MiB)
htab lookup (16 thread) 79.412 ± 0.065M/s (drops 0.049 ± 0.000M/s mem 8.169 MiB)
htab update (1 thread) 2.122 ± 0.021M/s (drops 0.000 ± 0.000M/s mem 8.169 MiB)
htab update (2 thread) 4.248 ± 0.197M/s (drops 0.000 ± 0.000M/s mem 8.168 MiB)
htab update (4 thread) 8.475 ± 0.348M/s (drops 0.000 ± 0.000M/s mem 8.180 MiB)
htab update (8 thread) 16.725 ± 0.633M/s (drops 0.000 ± 0.000M/s mem 8.208 MiB)
htab update (16 thread) 30.246 ± 0.611M/s (drops 0.000 ± 0.000M/s mem 8.190 MiB)
qp-trie lookup (1 thread) 10.291 ± 0.007M/s (drops 0.004 ± 0.000M/s mem 4.899 MiB)
qp-trie lookup (2 thread) 20.797 ± 0.009M/s (drops 0.006 ± 0.000M/s mem 4.879 MiB)
qp-trie lookup (4 thread) 41.943 ± 0.019M/s (drops 0.015 ± 0.000M/s mem 4.262 MiB)
qp-trie lookup (8 thread) 81.985 ± 0.032M/s (drops 0.025 ± 0.000M/s mem 4.215 MiB)
qp-trie lookup (16 thread) 164.681 ± 0.051M/s (drops 0.050 ± 0.000M/s mem 4.261 MiB)
qp-trie update (1 thread) 1.622 ± 0.016M/s (drops 0.000 ± 0.000M/s mem 4.918 MiB)
qp-trie update (2 thread) 2.688 ± 0.021M/s (drops 0.000 ± 0.000M/s mem 4.874 MiB)
qp-trie update (4 thread) 4.062 ± 0.128M/s (drops 0.000 ± 0.000M/s mem 4.218 MiB)
qp-trie update (8 thread) 7.037 ± 0.247M/s (drops 0.000 ± 0.000M/s mem 4.900 MiB)
qp-trie update (16 thread) 11.024 ± 0.295M/s (drops 0.000 ± 0.000M/s mem 4.830 MiB)
For the strings in /proc/kallsyms, single-thread lookup performance is
about 27% slower than the hash table, and as the number of threads
increases, lookup performance becomes almost the same. However, the
update and deletion performance of qp-trie is much worse than that of
the hash table, as shown below:
Strings in /proc/kallsyms (key size=83, max entries=170958)
htab lookup (1 thread) 5.686 ± 0.008M/s (drops 0.345 ± 0.002M/s mem 30.840 MiB)
htab lookup (2 thread) 10.147 ± 0.067M/s (drops 0.616 ± 0.005M/s mem 30.841 MiB)
htab lookup (4 thread) 16.503 ± 0.025M/s (drops 1.002 ± 0.004M/s mem 30.841 MiB)
htab lookup (8 thread) 33.429 ± 0.146M/s (drops 2.028 ± 0.020M/s mem 30.848 MiB)
htab lookup (16 thread) 67.249 ± 0.577M/s (drops 4.085 ± 0.032M/s mem 30.841 MiB)
htab update (1 thread) 3.135 ± 0.355M/s (drops 0.000 ± 0.000M/s mem 30.842 MiB)
htab update (2 thread) 6.269 ± 0.686M/s (drops 0.000 ± 0.000M/s mem 30.841 MiB)
htab update (4 thread) 11.607 ± 1.632M/s (drops 0.000 ± 0.000M/s mem 30.840 MiB)
htab update (8 thread) 23.041 ± 0.806M/s (drops 0.000 ± 0.000M/s mem 30.842 MiB)
htab update (16 thread) 31.393 ± 0.307M/s (drops 0.000 ± 0.000M/s mem 30.835 MiB)
qp-trie lookup (1 thread) 4.122 ± 0.010M/s (drops 0.250 ± 0.002M/s mem 30.108 MiB)
qp-trie lookup (2 thread) 9.119 ± 0.057M/s (drops 0.554 ± 0.004M/s mem 17.422 MiB)
qp-trie lookup (4 thread) 16.605 ± 0.032M/s (drops 1.008 ± 0.006M/s mem 17.203 MiB)
qp-trie lookup (8 thread) 33.461 ± 0.058M/s (drops 2.032 ± 0.004M/s mem 16.977 MiB)
qp-trie lookup (16 thread) 67.466 ± 0.145M/s (drops 4.097 ± 0.019M/s mem 17.452 MiB)
qp-trie update (1 thread) 1.191 ± 0.093M/s (drops 0.000 ± 0.000M/s mem 17.170 MiB)
qp-trie update (2 thread) 2.057 ± 0.041M/s (drops 0.000 ± 0.000M/s mem 17.058 MiB)
qp-trie update (4 thread) 2.975 ± 0.035M/s (drops 0.000 ± 0.000M/s mem 17.411 MiB)
qp-trie update (8 thread) 3.596 ± 0.031M/s (drops 0.000 ± 0.000M/s mem 17.110 MiB)
qp-trie update (16 thread) 4.200 ± 0.048M/s (drops 0.000 ± 0.000M/s mem 17.228 MiB)
For strings in the BTF string section, the results are similar:
Sorted strings in BTF string sections (key size=71, max entries=115980)
htab lookup (1 thread) 6.990 ± 0.050M/s (drops 0.000 ± 0.000M/s mem 22.227 MiB)
htab lookup (2 thread) 12.729 ± 0.059M/s (drops 0.000 ± 0.000M/s mem 22.224 MiB)
htab lookup (4 thread) 21.202 ± 0.099M/s (drops 0.000 ± 0.000M/s mem 22.218 MiB)
htab lookup (8 thread) 43.418 ± 0.169M/s (drops 0.000 ± 0.000M/s mem 22.225 MiB)
htab lookup (16 thread) 88.745 ± 0.410M/s (drops 0.000 ± 0.000M/s mem 22.224 MiB)
htab update (1 thread) 3.238 ± 0.271M/s (drops 0.000 ± 0.000M/s mem 22.228 MiB)
htab update (2 thread) 6.483 ± 0.821M/s (drops 0.000 ± 0.000M/s mem 22.227 MiB)
htab update (4 thread) 12.702 ± 0.924M/s (drops 0.000 ± 0.000M/s mem 22.226 MiB)
htab update (8 thread) 22.167 ± 1.269M/s (drops 0.000 ± 0.000M/s mem 22.229 MiB)
htab update (16 thread) 31.225 ± 0.475M/s (drops 0.000 ± 0.000M/s mem 22.239 MiB)
qp-trie lookup (1 thread) 6.729 ± 0.006M/s (drops 0.000 ± 0.000M/s mem 11.335 MiB)
qp-trie lookup (2 thread) 13.417 ± 0.010M/s (drops 0.000 ± 0.000M/s mem 11.287 MiB)
qp-trie lookup (4 thread) 26.399 ± 0.043M/s (drops 0.000 ± 0.000M/s mem 11.111 MiB)
qp-trie lookup (8 thread) 52.910 ± 0.049M/s (drops 0.000 ± 0.000M/s mem 11.131 MiB)
qp-trie lookup (16 thread) 105.444 ± 0.064M/s (drops 0.000 ± 0.000M/s mem 11.060 MiB)
qp-trie update (1 thread) 1.508 ± 0.102M/s (drops 0.000 ± 0.000M/s mem 10.979 MiB)
qp-trie update (2 thread) 2.877 ± 0.034M/s (drops 0.000 ± 0.000M/s mem 10.843 MiB)
qp-trie update (4 thread) 5.111 ± 0.083M/s (drops 0.000 ± 0.000M/s mem 10.938 MiB)
qp-trie update (8 thread) 9.229 ± 0.046M/s (drops 0.000 ± 0.000M/s mem 11.042 MiB)
qp-trie update (16 thread) 11.625 ± 0.147M/s (drops 0.000 ± 0.000M/s mem 10.930 MiB)
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
tools/testing/selftests/bpf/Makefile | 5 +-
tools/testing/selftests/bpf/bench.c | 10 +
.../selftests/bpf/benchs/bench_qp_trie.c | 511 ++++++++++++++++++
.../selftests/bpf/benchs/run_bench_qp_trie.sh | 55 ++
.../selftests/bpf/progs/qp_trie_bench.c | 236 ++++++++
5 files changed, 816 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/benchs/bench_qp_trie.c
create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh
create mode 100644 tools/testing/selftests/bpf/progs/qp_trie_bench.c
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 5f42adddbbb0..42ea2bc91d3b 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -577,11 +577,13 @@ $(OUTPUT)/bench_strncmp.o: $(OUTPUT)/strncmp_bench.skel.h
$(OUTPUT)/bench_bpf_hashmap_full_update.o: $(OUTPUT)/bpf_hashmap_full_update_bench.skel.h
$(OUTPUT)/bench_local_storage.o: $(OUTPUT)/local_storage_bench.skel.h
$(OUTPUT)/bench_local_storage_rcu_tasks_trace.o: $(OUTPUT)/local_storage_rcu_tasks_trace_bench.skel.h
+$(OUTPUT)/bench_qp_trie.o: $(OUTPUT)/qp_trie_bench.skel.h
$(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
$(OUTPUT)/bench: LDLIBS += -lm
$(OUTPUT)/bench: $(OUTPUT)/bench.o \
$(TESTING_HELPERS) \
$(TRACE_HELPERS) \
+ $(CGROUP_HELPERS) \
$(OUTPUT)/bench_count.o \
$(OUTPUT)/bench_rename.o \
$(OUTPUT)/bench_trigger.o \
@@ -591,7 +593,8 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
$(OUTPUT)/bench_strncmp.o \
$(OUTPUT)/bench_bpf_hashmap_full_update.o \
$(OUTPUT)/bench_local_storage.o \
- $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o
+ $(OUTPUT)/bench_local_storage_rcu_tasks_trace.o \
+ $(OUTPUT)/bench_qp_trie.o
$(call msg,BINARY,,$@)
$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index c1f20a147462..618f45fbe6e2 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -275,6 +275,7 @@ extern struct argp bench_bpf_loop_argp;
extern struct argp bench_local_storage_argp;
extern struct argp bench_local_storage_rcu_tasks_trace_argp;
extern struct argp bench_strncmp_argp;
+extern struct argp bench_qp_trie_argp;
static const struct argp_child bench_parsers[] = {
{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
@@ -284,6 +285,7 @@ static const struct argp_child bench_parsers[] = {
{ &bench_strncmp_argp, 0, "bpf_strncmp helper benchmark", 0 },
{ &bench_local_storage_rcu_tasks_trace_argp, 0,
"local_storage RCU Tasks Trace slowdown benchmark", 0 },
+ { &bench_qp_trie_argp, 0, "qp-trie benchmark", 0 },
{},
};
@@ -490,6 +492,10 @@ extern const struct bench bench_local_storage_cache_seq_get;
extern const struct bench bench_local_storage_cache_interleaved_get;
extern const struct bench bench_local_storage_cache_hashmap_control;
extern const struct bench bench_local_storage_tasks_trace;
+extern const struct bench bench_htab_lookup;
+extern const struct bench bench_qp_trie_lookup;
+extern const struct bench bench_htab_update;
+extern const struct bench bench_qp_trie_update;
static const struct bench *benchs[] = {
&bench_count_global,
@@ -529,6 +535,10 @@ static const struct bench *benchs[] = {
&bench_local_storage_cache_interleaved_get,
&bench_local_storage_cache_hashmap_control,
&bench_local_storage_tasks_trace,
+ &bench_htab_lookup,
+ &bench_qp_trie_lookup,
+ &bench_htab_update,
+ &bench_qp_trie_update,
};
static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/benchs/bench_qp_trie.c b/tools/testing/selftests/bpf/benchs/bench_qp_trie.c
new file mode 100644
index 000000000000..ea38e272b5e4
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/bench_qp_trie.c
@@ -0,0 +1,511 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <argp.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include "bench.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+
+#include "qp_trie_bench.skel.h"
+
+enum {
+ FOR_HTAB = 0,
+ FOR_TRIE,
+};
+
+static struct qp_trie_ctx {
+ struct qp_trie_bench *skel;
+ int cgrp_dfd;
+ u64 map_slab_mem;
+} ctx;
+
+static struct {
+ const char *file;
+ __u32 entries;
+} args;
+
+struct qp_trie_key {
+ __u32 len;
+ unsigned char data[0];
+};
+
+struct run_stat {
+ __u64 stats[2];
+};
+
+enum {
+ ARG_DATA_FILE = 8001,
+ ARG_DATA_ENTRIES = 8002,
+};
+
+static const struct argp_option opts[] = {
+ { "file", ARG_DATA_FILE, "DATA-FILE", 0, "Set data file" },
+ { "entries", ARG_DATA_ENTRIES, "DATA-ENTRIES", 0, "Set data entries" },
+ {},
+};
+
+static error_t qp_trie_parse_arg(int key, char *arg, struct argp_state *state)
+{
+ switch (key) {
+ case ARG_DATA_FILE:
+ args.file = strdup(arg);
+ break;
+ case ARG_DATA_ENTRIES:
+ args.entries = strtoul(arg, NULL, 10);
+ break;
+ default:
+ return ARGP_ERR_UNKNOWN;
+ }
+
+ return 0;
+}
+
+const struct argp bench_qp_trie_argp = {
+ .options = opts,
+ .parser = qp_trie_parse_arg,
+};
+
+static int parse_data_set(const char *name, struct qp_trie_key ***set, unsigned int *nr,
+ unsigned int *max_len)
+{
+#define INT_MAX_DATA_SIZE 1024
+ unsigned int i, nr_items, item_max_len;
+ char line[INT_MAX_DATA_SIZE + 1];
+ struct qp_trie_key **items;
+ struct qp_trie_key *cur;
+ int err = 0;
+ FILE *file;
+ char *got;
+
+ file = fopen(name, "rb");
+ if (!file) {
+ fprintf(stderr, "open %s err %s\n", name, strerror(errno));
+ return -1;
+ }
+
+ got = fgets(line, sizeof(line), file);
+ if (!got) {
+ fprintf(stderr, "empty file?\n");
+ err = -1;
+ goto out;
+ }
+ if (sscanf(line, "%u", &nr_items) != 1) {
+ fprintf(stderr, "the first line must be the number of items\n");
+ err = -1;
+ goto out;
+ }
+
+ fprintf(stdout, "item %u\n", nr_items);
+
+ items = (struct qp_trie_key **)calloc(nr_items, sizeof(*items) + sizeof(*cur) + INT_MAX_DATA_SIZE);
+ if (!items) {
+ fprintf(stderr, "no mem for items\n");
+ err = -1;
+ goto out;
+ }
+
+ i = 0;
+ item_max_len = 0;
+ cur = (void *)items + sizeof(*items) * nr_items;
+ while (true) {
+ unsigned int len;
+
+ got = fgets(line, sizeof(line), file);
+ if (!got) {
+ if (!feof(file)) {
+ fprintf(stderr, "read file %s error\n", name);
+ err = -1;
+ }
+ break;
+ }
+
+ len = strlen(got);
+ if (len && got[len - 1] == '\n') {
+ got[len - 1] = 0;
+ len -= 1;
+ }
+ if (!len) {
+ fprintf(stdout, "#%u empty line\n", i + 2);
+ continue;
+ }
+
+ if (i >= nr_items) {
+ fprintf(stderr, "too many lines in %s\n", name);
+ break;
+ }
+
+ if (len > item_max_len)
+ item_max_len = len;
+ cur->len = len;
+ memcpy(cur->data, got, len);
+ items[i++] = cur;
+ cur = (void *)cur + sizeof(*cur) + INT_MAX_DATA_SIZE;
+ }
+
+ if (!err) {
+ if (i != nr_items)
fprintf(stdout, "fewer lines than expected in %s (exp %u got %u)\n", name, nr_items, i);
+ *nr = i;
+ *set = items;
+ *max_len = item_max_len;
+ } else {
+ free(items);
+ }
+
+out:
+ fclose(file);
+ return err;
+}
+
+static int gen_data_set(struct qp_trie_key ***set, unsigned int *nr, unsigned int *max_len)
+{
+#define RND_MAX_DATA_SIZE 255
+ struct qp_trie_key **items;
+ size_t ptr_size, data_size;
+ struct qp_trie_key *cur;
+ unsigned int i, nr_items;
+ ssize_t got;
+ int err = 0;
+
+ ptr_size = *nr * sizeof(*items);
+ data_size = *nr * (sizeof(*cur) + RND_MAX_DATA_SIZE);
+ items = (struct qp_trie_key **)malloc(ptr_size + data_size);
+ if (!items) {
+ fprintf(stderr, "no mem for items\n");
+ err = -1;
+ goto out;
+ }
+
+ cur = (void *)items + ptr_size;
+ got = syscall(__NR_getrandom, cur, data_size, 0);
+ if (got != data_size) {
+ fprintf(stderr, "getrandom error %s\n", strerror(errno));
+ err = -1;
+ goto out;
+ }
+
+ nr_items = 0;
+ for (i = 0; i < *nr; i++) {
+ cur->len &= 0xff;
+ if (cur->len) {
+ items[nr_items++] = cur;
+ memset(cur->data + cur->len, 0, RND_MAX_DATA_SIZE - cur->len);
+ }
+ cur = (void *)cur + (sizeof(*cur) + RND_MAX_DATA_SIZE);
+ }
+ if (!nr_items) {
+ fprintf(stderr, "no valid key in random data\n");
+ err = -1;
+ goto out;
+ }
+ fprintf(stdout, "generate %u random keys\n", nr_items);
+
+ *nr = nr_items;
+ *set = items;
+ *max_len = RND_MAX_DATA_SIZE;
+out:
+ if (err && items)
+ free(items);
+ return err;
+}
+
+static void qp_trie_validate(void)
+{
+ if (env.consumer_cnt != 1) {
+ fprintf(stderr, "qp_trie_map benchmark doesn't support multi-consumer!\n");
+ exit(1);
+ }
+
+ if (!args.file && !args.entries) {
fprintf(stderr, "must specify entries when using a randomly-generated data set\n");
+ exit(1);
+ }
+
+ if (args.file && access(args.file, R_OK)) {
fprintf(stderr, "data file is not readable\n");
+ exit(1);
+ }
+}
+
+static void qp_trie_init_map_opts(struct qp_trie_bench *skel, unsigned int data_size,
+ unsigned int nr)
+{
+ bpf_map__set_value_size(skel->maps.htab_array, data_size);
+ bpf_map__set_max_entries(skel->maps.htab_array, nr);
+
+ bpf_map__set_key_size(skel->maps.htab, data_size);
+ bpf_map__set_max_entries(skel->maps.htab, nr);
+
+ bpf_map__set_value_size(skel->maps.trie_array, sizeof(struct qp_trie_key) + data_size);
+ bpf_map__set_max_entries(skel->maps.trie_array, nr);
+
+ bpf_map__set_map_extra(skel->maps.qp_trie, data_size);
+ bpf_map__set_max_entries(skel->maps.qp_trie, nr);
+}
+
+static void qp_trie_setup_key_map(struct bpf_map *map, unsigned int map_type,
+ struct qp_trie_key **set, unsigned int nr)
+{
+ int fd = bpf_map__fd(map);
+ unsigned int i;
+
+ for (i = 0; i < nr; i++) {
+ void *value;
+ int err;
+
+ value = (map_type != FOR_HTAB) ? (void *)set[i] : (void *)set[i]->data;
+ err = bpf_map_update_elem(fd, &i, value, 0);
+ if (err) {
+ fprintf(stderr, "add #%u key (%s) on %s error %d\n",
+ i, set[i]->data, bpf_map__name(map), err);
+ exit(1);
+ }
+ }
+}
+
+static u64 qp_trie_get_slab_mem(int dfd)
+{
+ const char *magic = "slab ";
+ const char *name = "memory.stat";
+ int fd;
+ ssize_t nr;
+ char buf[4096];
+ char *from;
+
+ fd = openat(dfd, name, 0);
+ if (fd < 0) {
+ fprintf(stderr, "no %s\n", name);
+ exit(1);
+ }
+
+ nr = read(fd, buf, sizeof(buf));
+ if (nr <= 0) {
fprintf(stderr, "empty %s?\n", name);
+ exit(1);
+ }
+ buf[nr - 1] = 0;
+
+ close(fd);
+
+ from = strstr(buf, magic);
+ if (!from) {
+ fprintf(stderr, "no slab in %s\n", name);
+ exit(1);
+ }
+
+ return strtoull(from + strlen(magic), NULL, 10);
+}
+
+static void qp_trie_setup_lookup_map(struct bpf_map *map, unsigned int map_type,
+ struct qp_trie_key **set, unsigned int nr)
+{
+ int fd = bpf_map__fd(map);
+ unsigned int i;
+
+ for (i = 0; i < nr; i++) {
+ int err;
+
+ if (map_type == FOR_HTAB) {
+ void *key;
+
+ key = set[i]->data;
+ err = bpf_map_update_elem(fd, key, &i, 0);
+ } else {
+ struct bpf_dynptr_user dynptr;
+
+ bpf_dynptr_user_init(set[i]->data, set[i]->len, &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &i, 0);
+ }
+ if (err) {
+ fprintf(stderr, "add #%u key (%s) on %s error %d\n",
+ i, set[i]->data, bpf_map__name(map), err);
+ exit(1);
+ }
+ }
+}
+
+static void qp_trie_setup(unsigned int map_type)
+{
+ struct qp_trie_key **set = NULL;
+ struct qp_trie_bench *skel;
+ unsigned int nr = 0, max_len = 0;
+ struct bpf_map *map;
+ u64 before, after;
+ int dfd;
+ int err;
+
+ if (!args.file) {
+ nr = args.entries;
+ err = gen_data_set(&set, &nr, &max_len);
+ } else {
+ err = parse_data_set(args.file, &set, &nr, &max_len);
+ }
+ if (err < 0)
+ exit(1);
+
+ if (args.entries && args.entries < nr)
+ nr = args.entries;
+
+ dfd = cgroup_setup_and_join("/qp_trie");
+ if (dfd < 0) {
+ fprintf(stderr, "failed to setup cgroup env\n");
+ exit(1);
+ }
+
+ setup_libbpf();
+
+ before = qp_trie_get_slab_mem(dfd);
+
+ skel = qp_trie_bench__open();
+ if (!skel) {
+ fprintf(stderr, "failed to open skeleton\n");
+ exit(1);
+ }
+
+ qp_trie_init_map_opts(skel, max_len, nr);
+
+ skel->rodata->qp_trie_key_size = max_len;
+ skel->bss->update_nr = nr;
+ skel->bss->update_chunk = nr / env.producer_cnt;
+
+ err = qp_trie_bench__load(skel);
+ if (err) {
+ fprintf(stderr, "failed to load skeleton\n");
+ exit(1);
+ }
+
+ map = (map_type == FOR_HTAB) ? skel->maps.htab_array : skel->maps.trie_array;
+ qp_trie_setup_key_map(map, map_type, set, nr);
+
+ map = (map_type == FOR_HTAB) ? skel->maps.htab : skel->maps.qp_trie;
+ qp_trie_setup_lookup_map(map, map_type, set, nr);
+
+ after = qp_trie_get_slab_mem(dfd);
+
+ ctx.skel = skel;
+ ctx.cgrp_dfd = dfd;
+ ctx.map_slab_mem = after - before;
+}
+
+static void qp_trie_attach_prog(struct bpf_program *prog)
+{
+ struct bpf_link *link;
+
+ link = bpf_program__attach(prog);
+ if (!link) {
+ fprintf(stderr, "failed to attach program!\n");
+ exit(1);
+ }
+}
+
+static void htab_lookup_setup(void)
+{
+ qp_trie_setup(FOR_HTAB);
+ qp_trie_attach_prog(ctx.skel->progs.htab_lookup);
+}
+
+static void qp_trie_lookup_setup(void)
+{
+ qp_trie_setup(FOR_TRIE);
+ qp_trie_attach_prog(ctx.skel->progs.qp_trie_lookup);
+}
+
+static void htab_update_setup(void)
+{
+ qp_trie_setup(FOR_HTAB);
+ qp_trie_attach_prog(ctx.skel->progs.htab_update);
+}
+
+static void qp_trie_update_setup(void)
+{
+ qp_trie_setup(FOR_TRIE);
+ qp_trie_attach_prog(ctx.skel->progs.qp_trie_update);
+}
+
+static void *qp_trie_producer(void *ctx)
+{
+ while (true)
+ (void)syscall(__NR_getpgid);
+ return NULL;
+}
+
+static void *qp_trie_consumer(void *ctx)
+{
+ return NULL;
+}
+
+static void qp_trie_measure(struct bench_res *res)
+{
+ static __u64 last_hits, last_drops;
+ __u64 total_hits = 0, total_drops = 0;
+ unsigned int i, nr_cpus;
+
+ nr_cpus = bpf_num_possible_cpus();
+ for (i = 0; i < nr_cpus; i++) {
+ struct run_stat *s = (void *)&ctx.skel->bss->percpu_stats[i & 255];
+
+ total_hits += s->stats[0];
+ total_drops += s->stats[1];
+ }
+
+ res->hits = total_hits - last_hits;
+ res->drops = total_drops - last_drops;
+
+ last_hits = total_hits;
+ last_drops = total_drops;
+}
+
+static void qp_trie_report_final(struct bench_res res[], int res_cnt)
+{
+ close(ctx.cgrp_dfd);
+ cleanup_cgroup_environment();
+
+ fprintf(stdout, "Slab: %.3f MiB\n", (float)ctx.map_slab_mem / 1024 / 1024);
+ hits_drops_report_final(res, res_cnt);
+}
+
+const struct bench bench_htab_lookup = {
+ .name = "htab-lookup",
+ .validate = qp_trie_validate,
+ .setup = htab_lookup_setup,
+ .producer_thread = qp_trie_producer,
+ .consumer_thread = qp_trie_consumer,
+ .measure = qp_trie_measure,
+ .report_progress = hits_drops_report_progress,
+ .report_final = qp_trie_report_final,
+};
+
+const struct bench bench_qp_trie_lookup = {
+ .name = "qp-trie-lookup",
+ .validate = qp_trie_validate,
+ .setup = qp_trie_lookup_setup,
+ .producer_thread = qp_trie_producer,
+ .consumer_thread = qp_trie_consumer,
+ .measure = qp_trie_measure,
+ .report_progress = hits_drops_report_progress,
+ .report_final = qp_trie_report_final,
+};
+
+const struct bench bench_htab_update = {
+ .name = "htab-update",
+ .validate = qp_trie_validate,
+ .setup = htab_update_setup,
+ .producer_thread = qp_trie_producer,
+ .consumer_thread = qp_trie_consumer,
+ .measure = qp_trie_measure,
+ .report_progress = hits_drops_report_progress,
+ .report_final = qp_trie_report_final,
+};
+
+const struct bench bench_qp_trie_update = {
+ .name = "qp-trie-update",
+ .validate = qp_trie_validate,
+ .setup = qp_trie_update_setup,
+ .producer_thread = qp_trie_producer,
+ .consumer_thread = qp_trie_consumer,
+ .measure = qp_trie_measure,
+ .report_progress = hits_drops_report_progress,
+ .report_final = qp_trie_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh b/tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh
new file mode 100755
index 000000000000..0cbcb5bc9292
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_bench_qp_trie.sh
@@ -0,0 +1,55 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2022. Huawei Technologies Co., Ltd
+
+source ./benchs/run_common.sh
+
+set -eufo pipefail
+
+mem()
+{
+ echo "$*" | sed -E "s/.*Slab: ([0-9]+\.[0-9]+ MiB).*/\1/"
+}
+
+run_qp_trie_bench()
+{
+ local title=$1
+ local summary
+
+ shift 1
+ summary=$($RUN_BENCH "$@" | grep "Summary\|Slab:")
+ printf "%s %20s (drops %-16s mem %s)\n" "$title" "$(hits $summary)" \
+ "$(drops $summary)" "$(mem $summary)"
+}
+
+run_qp_trie_benchs()
+{
+ local p
+ local m
+ local b
+ local title
+
+ for m in htab qp-trie
+ do
+ for b in lookup update
+ do
+ for p in 1 2 4 8 16
+ do
+ title=$(printf "%-16s (%-2d thread)" "$m $b" $p)
+ run_qp_trie_bench "$title" ${m}-${b} -p $p "$@"
+ done
+ done
+ done
+ echo
+}
+
+echo "Randomly-generated binary data (16K)"
+run_qp_trie_benchs --entries 16384
+
+echo "Strings in /proc/kallsyms"
+TMP_FILE=/tmp/kallsyms.txt
+SRC_FILE=/proc/kallsyms
+trap 'rm -f $TMP_FILE' EXIT
+wc -l $SRC_FILE | awk '{ print $1}' > $TMP_FILE
+awk '{ print $3 }' $SRC_FILE >> $TMP_FILE
+run_qp_trie_benchs --file $TMP_FILE
diff --git a/tools/testing/selftests/bpf/progs/qp_trie_bench.c b/tools/testing/selftests/bpf/progs/qp_trie_bench.c
new file mode 100644
index 000000000000..b60acb7b9f94
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/qp_trie_bench.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <linux/errno.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct bpf_map;
+
+struct qp_trie_key {
+ __u32 len;
+ unsigned char data[0];
+};
+
+/* value_size will be set by benchmark */
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(key_size, 4);
+} htab_array SEC(".maps");
+
+/* value_size will be set by benchmark */
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(key_size, 4);
+} trie_array SEC(".maps");
+
+/* key_size will be set by benchmark */
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(value_size, 4);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+} htab SEC(".maps");
+
+/* map_extra will be set by benchmark */
+struct {
+ __uint(type, BPF_MAP_TYPE_QP_TRIE);
+ __type(key, struct bpf_dynptr);
+ __type(value, unsigned int);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_DYNPTR_KEY);
+} qp_trie SEC(".maps");
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+ __u64 stats[2];
+} __attribute__((__aligned__(128))) percpu_stats[256];
+
+struct update_ctx {
+ unsigned int max;
+ unsigned int from;
+};
+
+volatile const unsigned int qp_trie_key_size;
+
+unsigned int update_nr;
+unsigned int update_chunk;
+
+static __always_inline void update_stats(int idx)
+{
+ __u32 cpu = bpf_get_smp_processor_id();
+
+ percpu_stats[cpu & 255].stats[idx]++;
+}
+
+static int lookup_htab(struct bpf_map *map, __u32 *key, void *value, void *data)
+{
+ __u32 *index;
+
+ index = bpf_map_lookup_elem(&htab, value);
+ if (index && *index == *key)
+ update_stats(0);
+ else
+ update_stats(1);
+ return 0;
+}
+
+static int update_htab_loop(unsigned int i, void *ctx)
+{
+ struct update_ctx *update = ctx;
+ void *value;
+ int err;
+
+ if (update->from >= update->max)
+ update->from = 0;
+ value = bpf_map_lookup_elem(&htab_array, &update->from);
+ if (!value)
+ return 1;
+
+ err = bpf_map_update_elem(&htab, value, &update->from, 0);
+ if (!err)
+ update_stats(0);
+ else
+ update_stats(1);
+ update->from++;
+
+ return 0;
+}
+
+static int delete_htab_loop(unsigned int i, void *ctx)
+{
+ struct update_ctx *update = ctx;
+ void *value;
+ int err;
+
+ if (update->from >= update->max)
+ update->from = 0;
+ value = bpf_map_lookup_elem(&htab_array, &update->from);
+ if (!value)
+ return 1;
+
+ err = bpf_map_delete_elem(&htab, value);
+ if (!err)
+ update_stats(0);
+ update->from++;
+
+ return 0;
+}
+
+static int lookup_qp_trie(struct bpf_map *map, __u32 *key, void *value, void *data)
+{
+ struct qp_trie_key *qp_trie_key = value;
+ struct bpf_dynptr dynptr;
+ __u32 *index;
+
+ if (qp_trie_key->len > qp_trie_key_size)
+ return 0;
+
+ bpf_dynptr_from_mem(qp_trie_key->data, qp_trie_key->len, 0, &dynptr);
+ index = bpf_map_lookup_elem(&qp_trie, &dynptr);
+ if (index && *index == *key)
+ update_stats(0);
+ else
+ update_stats(1);
+ return 0;
+}
+
+static int update_qp_trie_loop(unsigned int i, void *ctx)
+{
+ struct update_ctx *update = ctx;
+ struct qp_trie_key *value;
+ struct bpf_dynptr dynptr;
+ int err;
+
+ if (update->from >= update->max)
+ update->from = 0;
+ value = bpf_map_lookup_elem(&trie_array, &update->from);
+ if (!value || value->len > qp_trie_key_size)
+ return 1;
+
+ bpf_dynptr_from_mem(value->data, value->len, 0, &dynptr);
+ err = bpf_map_update_elem(&qp_trie, &dynptr, &update->from, 0);
+ if (!err)
+ update_stats(0);
+ else
+ update_stats(1);
+ update->from++;
+
+ return 0;
+}
+
+static int delete_qp_trie_loop(unsigned int i, void *ctx)
+{
+ struct update_ctx *update = ctx;
+ struct qp_trie_key *value;
+ struct bpf_dynptr dynptr;
+ int err;
+
+ if (update->from >= update->max)
+ update->from = 0;
+ value = bpf_map_lookup_elem(&trie_array, &update->from);
+ if (!value || value->len > qp_trie_key_size)
+ return 1;
+
+ bpf_dynptr_from_mem(value->data, value->len, 0, &dynptr);
+ err = bpf_map_delete_elem(&qp_trie, &dynptr);
+ if (!err)
+ update_stats(0);
+ update->from++;
+
+ return 0;
+}
+
+SEC("tp/syscalls/sys_enter_getpgid")
+int htab_lookup(void *ctx)
+{
+ bpf_for_each_map_elem(&htab_array, lookup_htab, NULL, 0);
+ return 0;
+}
+
+SEC("tp/syscalls/sys_enter_getpgid")
+int qp_trie_lookup(void *ctx)
+{
+ bpf_for_each_map_elem(&trie_array, lookup_qp_trie, NULL, 0);
+ return 0;
+}
+
+SEC("tp/syscalls/sys_enter_getpgid")
+int htab_update(void *ctx)
+{
+ unsigned int index = bpf_get_smp_processor_id() * update_chunk;
+ struct update_ctx update;
+
+ update.max = update_nr;
+ if (update.max && index >= update.max)
+ index %= update.max;
+
+ /* Only operate on a subset of the keys according to the CPU id */
+ update.from = index;
+ bpf_loop(update_chunk, update_htab_loop, &update, 0);
+
+ update.from = index;
+ bpf_loop(update_chunk, delete_htab_loop, &update, 0);
+
+ return 0;
+}
+
+SEC("tp/syscalls/sys_enter_getpgid")
+int qp_trie_update(void *ctx)
+{
+ unsigned int index = bpf_get_smp_processor_id() * update_chunk;
+ struct update_ctx update;
+
+ update.max = update_nr;
+ if (update.max && index >= update.max)
+ index %= update.max;
+
+ /* Only operate on a subset of the keys according to the CPU id */
+ update.from = index;
+ bpf_loop(update_chunk, update_qp_trie_loop, &update, 0);
+
+ update.from = index;
+ bpf_loop(update_chunk, delete_qp_trie_loop, &update, 0);
+
+ return 0;
+}
--
2.29.2
* [PATCH bpf-next 10/10] selftests/bpf: Add tests for qp-trie map by using bpf syscalls
2022-09-17 15:31 [PATCH bpf-next 00/10] Add support for qp-trie map Hou Tao
` (8 preceding siblings ...)
2022-09-17 15:31 ` [PATCH bpf-next 09/10] selftests/bpf: Add benchmark for qp-trie map Hou Tao
@ 2022-09-17 15:31 ` Hou Tao
9 siblings, 0 replies; 13+ messages in thread
From: Hou Tao @ 2022-09-17 15:31 UTC (permalink / raw)
To: bpf, Andrii Nakryiko
Cc: Song Liu, Hao Luo, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, KP Singh, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, Paul E . McKenney, houtao1
From: Hou Tao <houtao1@huawei.com>
Add test cases for creation, lookup, update, deletion and iteration on
qp-trie. Also check that the keys returned during iteration are ordered.
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
.../selftests/bpf/map_tests/qp_trie_map.c | 1165 +++++++++++++++++
1 file changed, 1165 insertions(+)
create mode 100644 tools/testing/selftests/bpf/map_tests/qp_trie_map.c
diff --git a/tools/testing/selftests/bpf/map_tests/qp_trie_map.c b/tools/testing/selftests/bpf/map_tests/qp_trie_map.c
new file mode 100644
index 000000000000..27d0805c3049
--- /dev/null
+++ b/tools/testing/selftests/bpf/map_tests/qp_trie_map.c
@@ -0,0 +1,1165 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <unistd.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <endian.h>
+#include <limits.h>
+#include <time.h>
+#include <pthread.h>
+#include <linux/btf.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include <test_btf.h>
+#include <test_maps.h>
+
+#include "bpf_util.h"
+
+#define QP_TRIE_KEY_SIZE sizeof(struct bpf_dynptr)
+#define QP_TRIE_DFT_MAX_KEY_LEN 4
+#define QP_TRIE_DFT_VAL_SIZE 4
+#define QP_TRIE_DFT_MAP_FLAGS (BPF_F_NO_PREALLOC | BPF_F_DYNPTR_KEY)
+
+#define QP_TRIE_DFT_BTF_KEY_ID 1
+#define QP_TRIE_DFT_BTF_VAL_ID 2
+
+struct qp_trie_create_case {
+ const char *name;
+ int error;
+ unsigned int map_flags;
+ unsigned int max_key_len;
+ unsigned int value_size;
+ unsigned int max_entries;
+ unsigned int btf_key_type_id;
+ unsigned int btf_value_type_id;
+};
+
+struct qp_trie_bytes_key {
+ unsigned int len;
+ unsigned char data[4];
+};
+
+struct qp_trie_int_key {
+ unsigned int len;
+ unsigned int data;
+};
+
+enum {
+ UPDATE_OP = 0,
+ DELETE_OP,
+ LOOKUP_OP,
+ ITERATE_OP,
+ MAX_OP,
+};
+
+struct stress_conf {
+ unsigned int threads[MAX_OP];
+ unsigned int max_key_len;
+ unsigned int loop;
+ unsigned int nr;
+};
+
+struct qp_trie_rw_ctx {
+ unsigned int nr;
+ unsigned int max_key_len;
+ int fd;
+ struct bpf_dynptr_user *set;
+ unsigned int loop;
+ unsigned int nr_delete;
+};
+
+static int qp_trie_load_btf(void)
+{
+ const char btf_str_sec[] = "\0bpf_dynptr\0qp_test_key";
+ __u32 btf_raw_types[] = {
+ /* struct bpf_dynptr */ /* [1] */
+ BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 0), 16),
+ /* unsigned int */ /* [2] */
+ BTF_TYPE_INT_ENC(0, 0, 0, 32, 4),
+ /* struct qp_test_key */ /* [3] */
+ BTF_TYPE_ENC(12, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, 0), 16),
+ };
+ struct btf_header btf_hdr = {
+ .magic = BTF_MAGIC,
+ .version = BTF_VERSION,
+ .hdr_len = sizeof(struct btf_header),
+ .type_len = sizeof(btf_raw_types),
+ .str_off = sizeof(btf_raw_types),
+ .str_len = sizeof(btf_str_sec),
+ };
+ __u8 raw_btf[sizeof(struct btf_header) + sizeof(btf_raw_types) +
+ sizeof(btf_str_sec)];
+
+ memcpy(raw_btf, &btf_hdr, sizeof(btf_hdr));
+ memcpy(raw_btf + sizeof(btf_hdr), btf_raw_types, sizeof(btf_raw_types));
+ memcpy(raw_btf + sizeof(btf_hdr) + sizeof(btf_raw_types),
+ btf_str_sec, sizeof(btf_str_sec));
+
+ return bpf_btf_load(raw_btf, sizeof(raw_btf), NULL);
+}
+
+struct qp_trie_create_case create_cases[] = {
+ {
+ .name = "tiny qp-trie",
+ .error = 0,
+ .map_flags = QP_TRIE_DFT_MAP_FLAGS,
+ .max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+ .value_size = QP_TRIE_DFT_VAL_SIZE,
+ .max_entries = 1,
+ .btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+ .btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+ },
+ {
+ .name = "empty qp-trie",
+ .error = -EINVAL,
+ .map_flags = QP_TRIE_DFT_MAP_FLAGS,
+ .max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+ .value_size = QP_TRIE_DFT_VAL_SIZE,
+ .max_entries = 0,
+ .btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+ .btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+ },
+ {
+ .name = "preallocated qp-trie",
+ .error = -EINVAL,
+ .map_flags = BPF_F_DYNPTR_KEY,
+ .max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+ .value_size = QP_TRIE_DFT_VAL_SIZE,
+ .max_entries = 1,
+ .btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+ .btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+ },
+ {
+ .name = "fixed-size key qp-trie",
+ .error = -EINVAL,
+ .map_flags = BPF_F_NO_PREALLOC,
+ .max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+ .value_size = QP_TRIE_DFT_VAL_SIZE,
+ .max_entries = 1,
+ .btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+ .btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+ },
+ {
+ .name = "mmapable qp-trie",
+ .error = -EINVAL,
+ .map_flags = QP_TRIE_DFT_MAP_FLAGS | BPF_F_MMAPABLE,
+ .max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+ .value_size = QP_TRIE_DFT_VAL_SIZE,
+ .max_entries = 1,
+ .btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+ .btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+ },
+ {
+ .name = "no btf qp-trie",
+ .error = -EINVAL,
+ .map_flags = QP_TRIE_DFT_MAP_FLAGS,
+ .max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+ .value_size = QP_TRIE_DFT_VAL_SIZE,
+ .max_entries = 1,
+ .btf_key_type_id = 0,
+ .btf_value_type_id = 0,
+ },
+ {
+ .name = "qp_test_key qp-trie",
+ .error = -EINVAL,
+ .map_flags = QP_TRIE_DFT_MAP_FLAGS,
+ .max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+ .value_size = QP_TRIE_DFT_VAL_SIZE,
+ .max_entries = 1,
+ .btf_key_type_id = 3,
+ .btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+ },
+ {
+ .name = "zero max key len qp-trie",
+ .error = -EINVAL,
+ .map_flags = QP_TRIE_DFT_MAP_FLAGS,
+ .max_key_len = 0,
+ .value_size = QP_TRIE_DFT_VAL_SIZE,
+ .max_entries = 1,
+ .btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+ .btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+ },
+ {
+ .name = "big k-v size qp-trie",
+ .error = -E2BIG,
+ .map_flags = QP_TRIE_DFT_MAP_FLAGS,
+ .max_key_len = QP_TRIE_DFT_MAX_KEY_LEN,
+ .value_size = 128 << 20,
+ .max_entries = 1,
+ .btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID,
+ .btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID,
+ },
+};
+
+static void test_qp_trie_create(void)
+{
+ unsigned int i;
+ int btf_fd;
+
+ btf_fd = qp_trie_load_btf();
+ CHECK(btf_fd < 0, "load btf", "error %d\n", btf_fd);
+
+ for (i = 0; i < ARRAY_SIZE(create_cases); i++) {
+ LIBBPF_OPTS(bpf_map_create_opts, opts);
+ int fd;
+
+ opts.map_flags = create_cases[i].map_flags;
+ opts.btf_fd = btf_fd;
+ opts.btf_key_type_id = create_cases[i].btf_key_type_id;
+ opts.btf_value_type_id = create_cases[i].btf_value_type_id;
+ opts.map_extra = create_cases[i].max_key_len;
+ fd = bpf_map_create(BPF_MAP_TYPE_QP_TRIE, "qp_trie", QP_TRIE_KEY_SIZE,
+ create_cases[i].value_size, create_cases[i].max_entries, &opts);
+ if (!create_cases[i].error) {
+ CHECK(fd < 0, create_cases[i].name, "error %d\n", fd);
+ close(fd);
+ } else {
+ CHECK(fd != create_cases[i].error, create_cases[i].name,
+ "expect error %d got %d\n", create_cases[i].error, fd);
+ }
+ }
+
+ close(btf_fd);
+}
+
+static int qp_trie_create(unsigned int max_key_len, unsigned int value_size, unsigned int max_entries)
+{
+ LIBBPF_OPTS(bpf_map_create_opts, opts);
+ int btf_fd, map_fd;
+
+ btf_fd = qp_trie_load_btf();
+ CHECK(btf_fd < 0, "load btf", "error %d\n", btf_fd);
+
+ opts.map_flags = QP_TRIE_DFT_MAP_FLAGS;
+ opts.btf_fd = btf_fd;
+ opts.btf_key_type_id = QP_TRIE_DFT_BTF_KEY_ID;
+ opts.btf_value_type_id = QP_TRIE_DFT_BTF_VAL_ID;
+ opts.map_extra = max_key_len;
+ map_fd = bpf_map_create(BPF_MAP_TYPE_QP_TRIE, "qp_trie", QP_TRIE_KEY_SIZE, value_size,
+ max_entries, &opts);
+ CHECK(map_fd < 0, "bpf_map_create", "error %d\n", map_fd);
+
+ close(btf_fd);
+
+ return map_fd;
+}
+
+static void test_qp_trie_bad_update(void)
+{
+ struct bpf_dynptr_user dynptr;
+ unsigned int key, value;
+ u64 big_key;
+ int fd, err;
+
+ fd = qp_trie_create(sizeof(key), sizeof(value), 1);
+
+ /* Invalid flags (Error) */
+ key = 0;
+ value = 0;
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST | BPF_EXIST);
+ CHECK(err != -EINVAL, "invalid update flag", "error %d\n", err);
+
+ /* Invalid key len (Error) */
+ big_key = 1;
+ value = 1;
+ bpf_dynptr_user_init(&big_key, sizeof(big_key), &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, 0);
+ CHECK(err != -EINVAL, "invalid data len", "error %d\n", err);
+
+ /* Iterate an empty qp-trie (Error) */
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ err = bpf_map_get_next_key(fd, NULL, &dynptr);
+ CHECK(err != -ENOENT, "non-empty qp-trie", "error %d\n", err);
+
+ /* Overwrite an empty qp-trie (Error) */
+ key = 2;
+ value = 2;
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_EXIST);
+ CHECK(err != -ENOENT, "overwrite empty qp-trie", "error %d\n", err);
+
+ /* Iterate an empty qp-trie (Error) */
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ err = bpf_map_get_next_key(fd, NULL, &dynptr);
+ CHECK(err != -ENOENT, "non-empty qp-trie", "error %d\n", err);
+
+ close(fd);
+}
+
+static void test_qp_trie_bad_lookup_delete(void)
+{
+ struct bpf_dynptr_user dynptr;
+ unsigned int key, value;
+ int fd, err;
+
+ fd = qp_trie_create(sizeof(key), sizeof(value), 2);
+
+ /* Lookup/Delete non-existent key (Error) */
+ key = 0;
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ err = bpf_map_delete_elem(fd, &dynptr);
+ CHECK(err != -ENOENT, "del non-existent key", "error %d\n", err);
+ err = bpf_map_lookup_elem(fd, &dynptr, &value);
+ CHECK(err != -ENOENT, "lookup non-existent key", "error %d\n", err);
+
+ key = 0;
+ value = 2;
+ bpf_dynptr_user_init(&key, 2, &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+ CHECK(err, "add elem", "error %d\n", err);
+
+ key = 0;
+ value = 4;
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+ CHECK(err, "add elem", "error %d\n", err);
+
+ /*
+ * Lookup/Delete non-existent key, although it is the prefix of
+ * existent keys (Error)
+ */
+ key = 0;
+ bpf_dynptr_user_init(&key, 1, &dynptr);
+ err = bpf_map_delete_elem(fd, &dynptr);
+ CHECK(err != -ENOENT, "del non-existent key", "error %d\n", err);
+ err = bpf_map_lookup_elem(fd, &dynptr, &value);
+ CHECK(err != -ENOENT, "lookup non-existent key", "error %d\n", err);
+
+ /* Lookup/Delete non-existent key, although its prefix exists (Error) */
+ key = 0;
+ bpf_dynptr_user_init(&key, 3, &dynptr);
+ err = bpf_map_delete_elem(fd, &dynptr);
+ CHECK(err != -ENOENT, "del non-existent key", "error %d\n", err);
+ err = bpf_map_lookup_elem(fd, &dynptr, &value);
+ CHECK(err != -ENOENT, "lookup non-existent key", "error %d\n", err);
+
+ close(fd);
+}
+
+static int cmp_str(const void *a, const void *b)
+{
+ const char *str_a = *(const char **)a, *str_b = *(const char **)b;
+
+ return strcmp(str_a, str_b);
+}
+
+static void test_qp_trie_one_subtree_update(void)
+{
+ const char *keys[] = {
+ "ab", "abc", "abo", "abS", "abcd",
+ };
+ const char *sorted_keys[ARRAY_SIZE(keys)];
+ unsigned int value, got, i, j;
+ struct bpf_dynptr_user dynptr;
+ struct bpf_dynptr_user *cur;
+ char data[4];
+ int fd, err;
+
+ fd = qp_trie_create(4, sizeof(value), ARRAY_SIZE(keys));
+
+ for (i = 0; i < ARRAY_SIZE(keys); i++) {
+ unsigned int flags;
+
+ /* Add i-th element */
+ flags = i % 2 ? BPF_NOEXIST : 0;
+ bpf_dynptr_user_init((void *)keys[i], strlen(keys[i]), &dynptr);
+ value = i + 100;
+ err = bpf_map_update_elem(fd, &dynptr, &value, flags);
+ CHECK(err, "add elem", "#%u error %d\n", i, err);
+
+ err = bpf_map_lookup_elem(fd, &dynptr, &got);
+ CHECK(err, "lookup elem", "#%u error %d\n", i, err);
+ CHECK(got != value, "lookup elem", "#%u expect %u got %u\n", i, value, got);
+
+ /* Re-add i-th element (Error) */
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+ CHECK(err != -EEXIST, "re-add elem", "#%u error %d\n", i, err);
+
+ /* Overwrite i-th element */
+ flags = i % 2 ? 0 : BPF_EXIST;
+ value = i;
+ err = bpf_map_update_elem(fd, &dynptr, &value, flags);
+ CHECK(err, "update elem", "error %d\n", err);
+
+ /* Lookup #[0~i] elements */
+ for (j = 0; j <= i; j++) {
+ bpf_dynptr_user_init((void *)keys[j], strlen(keys[j]), &dynptr);
+ err = bpf_map_lookup_elem(fd, &dynptr, &got);
+ CHECK(err, "lookup elem", "#%u/%u error %d\n", i, j, err);
+ CHECK(got != j, "lookup elem", "#%u/%u expect %u got %u\n", i, j, j, got);
+ }
+ }
+
+ /* Add element to a full qp-trie (Error) */
+ memset(data, 0, sizeof(data));
+ bpf_dynptr_user_init(&data, sizeof(data), &dynptr);
+ value = 0;
+ err = bpf_map_update_elem(fd, &dynptr, &value, 0);
+ CHECK(err != -ENOSPC, "add to full qp-trie", "error %d\n", err);
+
+ /* Iterate sorted elements */
+ cur = NULL;
+ memcpy(sorted_keys, keys, sizeof(keys));
+ qsort(sorted_keys, ARRAY_SIZE(sorted_keys), sizeof(sorted_keys[0]), cmp_str);
+ bpf_dynptr_user_init(data, sizeof(data), &dynptr);
+ for (i = 0; i < ARRAY_SIZE(sorted_keys); i++) {
+ unsigned int len;
+ char *got;
+
+ len = strlen(sorted_keys[i]);
+ err = bpf_map_get_next_key(fd, cur, &dynptr);
+ CHECK(err, "iterate", "#%u error %d\n", i, err);
+ CHECK(bpf_dynptr_user_get_size(&dynptr) != len, "iterate",
+ "#%u invalid len %u expect %u\n",
+ i, bpf_dynptr_user_get_size(&dynptr), len);
+ got = bpf_dynptr_user_get_data(&dynptr);
+ CHECK(memcmp(sorted_keys[i], got, len), "iterate",
+ "#%u got %.*s exp %.*s\n", i, len, got, len, sorted_keys[i]);
+
+ if (!cur)
+ cur = &dynptr;
+ }
+ err = bpf_map_get_next_key(fd, cur, &dynptr);
+ CHECK(err != -ENOENT, "more element", "error %d\n", err);
+
+ /* Delete all elements */
+ for (i = 0; i < ARRAY_SIZE(keys); i++) {
+ bpf_dynptr_user_init((void *)keys[i], strlen(keys[i]), &dynptr);
+ err = bpf_map_delete_elem(fd, &dynptr);
+ CHECK(err, "del elem", "#%u elem error %d\n", i, err);
+
+ /* Lookup deleted element (Error) */
+ err = bpf_map_lookup_elem(fd, &dynptr, &got);
+ CHECK(err != -ENOENT, "lookup elem", "#%u error %d\n", i, err);
+
+ /* Lookup #(i~N] elements */
+ for (j = i + 1; j < ARRAY_SIZE(keys); j++) {
+ bpf_dynptr_user_init((void *)keys[j], strlen(keys[j]), &dynptr);
+ err = bpf_map_lookup_elem(fd, &dynptr, &got);
+ CHECK(err, "lookup elem", "#%u/%u error %d\n", i, j, err);
+ CHECK(got != j, "lookup elem", "#%u/%u expect %u got %u\n", i, j, j, got);
+ }
+ }
+
+ memset(data, 0, sizeof(data));
+ bpf_dynptr_user_init(&data, sizeof(data), &dynptr);
+ err = bpf_map_get_next_key(fd, NULL, &dynptr);
+ CHECK(err != -ENOENT, "non-empty qp-trie", "error %d\n", err);
+
+ close(fd);
+}
+
+static void test_qp_trie_all_subtree_update(void)
+{
+ unsigned int i, max_entries, key, value, got;
+ struct bpf_dynptr_user dynptr;
+ struct bpf_dynptr_user *cur;
+ int fd, err;
+
+ /* 16 elements per subtree */
+ max_entries = 256 * 16;
+ fd = qp_trie_create(sizeof(key), sizeof(value), max_entries);
+
+ for (i = 0; i < max_entries; i++) {
+ key = htole32(i);
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ value = i;
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+ CHECK(err, "add elem", "#%u error %d\n", i, err);
+
+ err = bpf_map_lookup_elem(fd, &dynptr, &got);
+ CHECK(err, "lookup elem", "#%u elem error %d\n", i, err);
+ CHECK(got != value, "lookup elem", "#%u expect %u got %u\n", i, value, got);
+ }
+
+ /* Add element to a full qp-trie (Error) */
+ key = htole32(max_entries + 1);
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ value = 0;
+ err = bpf_map_update_elem(fd, &dynptr, &value, 0);
+ CHECK(err != -ENOSPC, "add to full qp-trie", "error %d\n", err);
+
+ /* Iterate all elements */
+ cur = NULL;
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ for (i = 0; i < max_entries; i++) {
+ unsigned int *data;
+ unsigned int exp;
+
+ exp = htole32((i / 16) | ((i & 0xf) << 8));
+ err = bpf_map_get_next_key(fd, cur, &dynptr);
+ CHECK(err, "iterate", "#%u error %d\n", i, err);
+ CHECK(bpf_dynptr_user_get_size(&dynptr) != 4, "iterate",
+ "#%u invalid len %u\n", i, bpf_dynptr_user_get_size(&dynptr));
+ data = bpf_dynptr_user_get_data(&dynptr);
+ CHECK(data != &key, "dynptr data", "#%u got %p exp %p\n", i, data, &key);
+ CHECK(key != exp, "iterate", "#%u got %u exp %u\n", i, key, exp);
+
+ if (!cur)
+ cur = &dynptr;
+ }
+ err = bpf_map_get_next_key(fd, cur, &dynptr);
+ CHECK(err != -ENOENT, "more element", "error %d\n", err);
+
+ /* Delete all elements */
+ i = max_entries;
+ while (i-- > 0) {
+ key = i;
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ err = bpf_map_delete_elem(fd, &dynptr);
+ CHECK(err, "del elem", "#%u error %d\n", i, err);
+
+ /* Lookup deleted element (Error) */
+ err = bpf_map_lookup_elem(fd, &dynptr, &got);
+ CHECK(err != -ENOENT, "lookup elem", "#%u error %d\n", i, err);
+ }
+
+ bpf_dynptr_user_init(&key, sizeof(key), &dynptr);
+ err = bpf_map_get_next_key(fd, NULL, &dynptr);
+ CHECK(err != -ENOENT, "non-empty qp-trie", "error %d\n", err);
+
+ close(fd);
+}
+
+static int binary_insert_data(unsigned int *set, unsigned int nr, unsigned int data)
+{
+ int begin = 0, end = nr - 1, mid, i;
+
+ while (begin <= end) {
+ mid = begin + (end - begin) / 2;
+ if (data == set[mid])
+ return -1;
+ if (data > set[mid])
+ begin = mid + 1;
+ else
+ end = mid - 1;
+ }
+
+ /* Move [begin, nr) backwards and insert new item at begin */
+ i = nr - 1;
+ while (i >= begin) {
+ set[i + 1] = set[i];
+ i--;
+ }
+ set[begin] = data;
+
+ return 0;
+}
+
+/* UINT_MAX will not be in the returned data set */
+static unsigned int *gen_random_unique_data_set(unsigned int max_entries)
+{
+ unsigned int *data_set;
+ unsigned int i, data;
+
+ data_set = malloc(sizeof(*data_set) * max_entries);
+ CHECK(!data_set, "malloc", "no mem\n");
+
+ for (i = 0; i < max_entries; i++) {
+ while (true) {
+ data = random() % UINT_MAX;
+ if (!binary_insert_data(data_set, i, data))
+ break;
+ }
+ }
+
+ return data_set;
+}
+
+static int cmp_be32(const void *l, const void *r)
+{
+ unsigned int a = htobe32(*(unsigned int *)l), b = htobe32(*(unsigned int *)r);
+
+ if (a < b)
+ return -1;
+ if (a > b)
+ return 1;
+ return 0;
+}
+
+static void test_qp_trie_rdonly_iterate(void)
+{
+ unsigned int i, max_entries, value, data, len;
+ struct bpf_dynptr_user dynptr;
+ struct bpf_dynptr_user *cur;
+ unsigned int *data_set;
+ int fd, err;
+
+ max_entries = 4096;
+ data_set = gen_random_unique_data_set(max_entries);
+ qsort(data_set, max_entries, sizeof(*data_set), cmp_be32);
+
+ fd = qp_trie_create(sizeof(*data_set), sizeof(value), max_entries);
+ value = 1;
+ for (i = 0; i < max_entries; i++) {
+ bpf_dynptr_user_init(&data_set[i], sizeof(data_set[i]), &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, 0);
+ CHECK(err, "add elem", "#%u error %d\n", i, err);
+ }
+
+ /* Iteration results are big-endian ordered */
+ cur = NULL;
+ bpf_dynptr_user_init(&data, sizeof(data), &dynptr);
+ for (i = 0; i < max_entries; i++) {
+ unsigned int *got;
+
+ err = bpf_map_get_next_key(fd, cur, &dynptr);
+ CHECK(err, "iterate", "#%u error %d\n", i, err);
+
+ got = bpf_dynptr_user_get_data(&dynptr);
+ len = bpf_dynptr_user_get_size(&dynptr);
+ CHECK(len != 4, "iterate", "#%u invalid len %u\n", i, len);
+ CHECK(got != &data, "iterate", "#%u invalid dynptr got %p exp %p\n", i, got, &data);
+ CHECK(*got != data_set[i], "iterate", "#%u got 0x%x exp 0x%x\n",
+ i, *got, data_set[i]);
+ cur = &dynptr;
+ }
+ err = bpf_map_get_next_key(fd, cur, &dynptr);
+ CHECK(err != -ENOENT, "more element", "error %d\n", err);
+
+ /* Iterate from non-existent key */
+ data = htobe32(UINT_MAX);
+ bpf_dynptr_user_init(&data, sizeof(data), &dynptr);
+ err = bpf_map_get_next_key(fd, &dynptr, &dynptr);
+ CHECK(err, "iterate from non-existent", "error %d\n", err);
+ len = bpf_dynptr_user_get_size(&dynptr);
+ CHECK(len != 4, "iterate", "invalid len %u\n", len);
+ CHECK(data != data_set[0], "iterate", "got 0x%x exp 0x%x\n",
+ data, data_set[0]);
+
+ free(data_set);
+
+ close(fd);
+}
+
+/*
+ * Delete current key (also the smallest key) after iteration, the next
+ * iteration will return the second smallest key, so the iteration result
+ * is still ordered.
+ */
+static void test_qp_trie_iterate_then_delete(void)
+{
+ unsigned int i, max_entries, value, data, len;
+ struct bpf_dynptr_user dynptr;
+ struct bpf_dynptr_user *cur;
+ unsigned int *data_set;
+ int fd, err;
+
+ max_entries = 4096;
+ data_set = gen_random_unique_data_set(max_entries);
+ qsort(data_set, max_entries, sizeof(*data_set), cmp_be32);
+
+ fd = qp_trie_create(sizeof(*data_set), sizeof(value), max_entries);
+ value = 1;
+ for (i = 0; i < max_entries; i++) {
+ bpf_dynptr_user_init(&data_set[i], sizeof(data_set[i]), &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+ CHECK(err, "add elem", "#%u error %d\n", i, err);
+ }
+
+ /* Iteration results are big-endian ordered */
+ cur = NULL;
+ bpf_dynptr_user_init(&data, sizeof(data), &dynptr);
+ for (i = 0; i < max_entries; i++) {
+ err = bpf_map_get_next_key(fd, cur, &dynptr);
+ CHECK(err, "iterate", "#%u error %d\n", i, err);
+
+ len = bpf_dynptr_user_get_size(&dynptr);
+ CHECK(len != 4, "iterate", "#%u invalid len %u\n", i, len);
+ CHECK(data != data_set[i], "iterate", "#%u got 0x%x exp 0x%x\n",
+ i, data, data_set[i]);
+ cur = &dynptr;
+
+ /*
+ * Delete the minimal key; the next call of bpf_map_get_next_key()
+ * will return the second minimal key.
+ */
+ err = bpf_map_delete_elem(fd, &dynptr);
+ CHECK(err, "del elem", "#%u elem error %d\n", i, err);
+ }
+ err = bpf_map_get_next_key(fd, cur, &dynptr);
+ CHECK(err != -ENOENT, "more element", "error %d\n", err);
+
+ err = bpf_map_get_next_key(fd, NULL, &dynptr);
+ CHECK(err != -ENOENT, "non-empty qp-trie", "error %d\n", err);
+
+ free(data_set);
+
+ close(fd);
+}
+
+/* The range is half-closed: [from, to) */
+static void delete_random_keys_in_range(int fd, unsigned int *data_set,
+ unsigned int from, unsigned int to)
+{
+ unsigned int del_from, del_to;
+
+ if (from >= to)
+ return;
+
+ del_from = random() % (to - from) + from;
+ del_to = random() % (to - del_from) + del_from;
+ for (; del_from <= del_to; del_from++) {
+ struct bpf_dynptr_user dynptr;
+ int err;
+
+ /* Skip deleted keys */
+ if (data_set[del_from] == UINT_MAX)
+ continue;
+
+ bpf_dynptr_user_init(&data_set[del_from], sizeof(data_set[del_from]), &dynptr);
+ err = bpf_map_delete_elem(fd, &dynptr);
+ CHECK(err, "del elem", "#%u range %u-%u error %d\n", del_from, from, to, err);
+ data_set[del_from] = UINT_MAX;
+ }
+}
+
+/* Delete keys randomly and ensure the iteration returns the expected data */
+static void test_qp_trie_iterate_then_batch_delete(void)
+{
+ unsigned int i, max_entries, value, data, len;
+ struct bpf_dynptr_user dynptr;
+ struct bpf_dynptr_user *cur;
+ unsigned int *data_set;
+ int fd, err;
+
+ max_entries = 8192;
+ data_set = gen_random_unique_data_set(max_entries);
+ qsort(data_set, max_entries, sizeof(*data_set), cmp_be32);
+
+ fd = qp_trie_create(sizeof(*data_set), sizeof(value), max_entries);
+ value = 1;
+ for (i = 0; i < max_entries; i++) {
+ bpf_dynptr_user_init(&data_set[i], sizeof(data_set[i]), &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+ CHECK(err, "add elem", "#%u error %d\n", i, err);
+ }
+
+ cur = NULL;
+ bpf_dynptr_user_init(&data, sizeof(data), &dynptr);
+ for (i = 0; i < max_entries; i++) {
+ err = bpf_map_get_next_key(fd, cur, &dynptr);
+ CHECK(err, "iterate", "#%u error %d\n", i, err);
+
+ len = bpf_dynptr_user_get_size(&dynptr);
+ CHECK(len != 4, "iterate", "#%u invalid len %u\n", i, len);
+ CHECK(data != data_set[i], "iterate", "#%u got 0x%x exp 0x%x\n",
+ i, data, data_set[i]);
+ cur = &dynptr;
+
+ /* Delete some keys from iterated keys */
+ delete_random_keys_in_range(fd, data_set, 0, i);
+
+ /* Skip deleted keys */
+ while (i + 1 < max_entries) {
+ if (data_set[i + 1] != UINT_MAX)
+ break;
+ i++;
+ }
+
+ /* Delete some keys from to-iterate keys */
+ delete_random_keys_in_range(fd, data_set, i + 1, max_entries);
+
+ /* Skip deleted keys */
+ while (i + 1 < max_entries) {
+ if (data_set[i + 1] != UINT_MAX)
+ break;
+ i++;
+ }
+ }
+ err = bpf_map_get_next_key(fd, cur, &dynptr);
+ CHECK(err != -ENOENT, "more element", "error %d\n", err);
+
+ free(data_set);
+
+ close(fd);
+}
+
+/*
+ * Add keys with even index first and add keys with odd index during
+ * iteration. Check whether or not the whole key set is returned by the
+ * iteration procedure.
+ */
+static void test_qp_trie_iterate_then_add(void)
+{
+ unsigned int i, max_entries, value, data, len;
+ struct bpf_dynptr_user dynptr, next_key;
+ struct bpf_dynptr_user *cur;
+ unsigned int *data_set;
+ int fd, err;
+
+ max_entries = 8192;
+ data_set = gen_random_unique_data_set(max_entries);
+ qsort(data_set, max_entries, sizeof(*data_set), cmp_be32);
+
+ fd = qp_trie_create(sizeof(*data_set), sizeof(value), max_entries);
+ value = 1;
+ for (i = 0; i < max_entries; i++) {
+ if (i & 1)
+ continue;
+
+ bpf_dynptr_user_init(&data_set[i], sizeof(data_set[i]), &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+ CHECK(err, "add elem", "#%u error %d\n", i, err);
+ }
+
+ /* Iteration results are big-endian ordered */
+ cur = NULL;
+ bpf_dynptr_user_init(&data, sizeof(data), &next_key);
+ for (i = 0; i < max_entries; i++) {
+ err = bpf_map_get_next_key(fd, cur, &next_key);
+ CHECK(err, "iterate", "#%u error %d\n", i, err);
+
+ len = bpf_dynptr_user_get_size(&next_key);
+ CHECK(len != 4, "iterate", "#%u invalid len %u\n", i, len);
+ CHECK(data != data_set[i], "iterate", "#%u got 0x%x exp 0x%x\n",
+ i, data, data_set[i]);
+ cur = &next_key;
+
+ if ((i & 1) || i + 1 >= max_entries)
+ continue;
+
+ /* Add the key with odd index which will be returned in the next iteration */
+ bpf_dynptr_user_init(&data_set[i + 1], sizeof(data_set[i + 1]), &dynptr);
+ err = bpf_map_update_elem(fd, &dynptr, &value, BPF_NOEXIST);
+ CHECK(err, "add elem", "#%u error %d\n", i + 1, err);
+ }
+ err = bpf_map_get_next_key(fd, cur, &next_key);
+ CHECK(err != -ENOENT, "more element", "error %d\n", err);
+
+ free(data_set);
+
+ close(fd);
+}
+
+static int get_int_from_env(const char *key, int dft)
+{
+ const char *value = getenv(key);
+
+ if (!value)
+ return dft;
+ return atoi(value);
+}
+
+static void free_bytes_set(struct bpf_dynptr_user *set, unsigned int nr)
+{
+ unsigned int i;
+
+ for (i = 0; i < nr; i++)
+ free(bpf_dynptr_user_get_data(&set[i]));
+ free(set);
+}
+
+static struct bpf_dynptr_user *generate_random_bytes_set(unsigned int max_key_len, unsigned int nr)
+{
+ struct bpf_dynptr_user *set;
+ unsigned int i;
+
+ set = malloc(nr * sizeof(*set));
+ CHECK(!set, "malloc", "no mem for set\n");
+
+ for (i = 0; i < nr; i++) {
+ unsigned char *data;
+ unsigned int len, j;
+
+ len = random() % max_key_len + 1;
+ data = malloc(len);
+ CHECK(!data, "malloc", "no mem for data\n");
+
+ j = 0;
+ while (j + 4 <= len) {
+ unsigned int rnd = random();
+
+ memcpy(&data[j], &rnd, sizeof(rnd));
+ j += 4;
+ }
+ while (j < len)
+ data[j++] = random();
+
+ bpf_dynptr_user_init(data, len, &set[i]);
+ }
+
+ return set;
+}
+
+static struct bpf_dynptr_user *alloc_dynptr_user(unsigned int len)
+{
+ struct bpf_dynptr_user *dynptr;
+
+ dynptr = malloc(sizeof(*dynptr) + len);
+ if (!dynptr)
+ return NULL;
+
+ bpf_dynptr_user_init(&dynptr[1], len, dynptr);
+
+ return dynptr;
+}
+
+static int cmp_dynptr_user(const struct bpf_dynptr_user *a, const struct bpf_dynptr_user *b)
+{
+ unsigned int a_len = bpf_dynptr_user_get_size(a), b_len = bpf_dynptr_user_get_size(b);
+ unsigned int cmp = a_len < b_len ? a_len : b_len;
+ int ret;
+
+ ret = memcmp(bpf_dynptr_user_get_data(a), bpf_dynptr_user_get_data(b), cmp);
+ if (ret)
+ return ret;
+ return a_len - b_len;
+}
+
+static void dump_dynptr_user(const char *name, const struct bpf_dynptr_user *ptr)
+{
+ unsigned char *data = bpf_dynptr_user_get_data(ptr);
+ unsigned int i, len = bpf_dynptr_user_get_size(ptr);
+
+ fprintf(stderr, "%s dynptr len %u data %p\n", name, len, data);
+
+ for (i = 0; i < len; i++) {
+ fprintf(stderr, "%02x ", data[i]);
+ if (i % 16 == 15)
+ fprintf(stderr, "\n");
+ }
+ fprintf(stderr, "\n");
+}
+
+static void copy_and_reset_dynptr_user(struct bpf_dynptr_user *dst_ptr,
+ struct bpf_dynptr_user *src_ptr, unsigned int reset_len)
+{
+ unsigned char *dst = bpf_dynptr_user_get_data(dst_ptr);
+ unsigned char *src = bpf_dynptr_user_get_data(src_ptr);
+ unsigned int src_len = bpf_dynptr_user_get_size(src_ptr);
+
+ memcpy(dst, src, src_len);
+ bpf_dynptr_user_init(dst, src_len, dst_ptr);
+ bpf_dynptr_user_init(src, reset_len, src_ptr);
+}
+
+static void *update_fn(void *arg)
+{
+ const struct qp_trie_rw_ctx *ctx = arg;
+ unsigned int i, j;
+
+ for (i = 0; i < ctx->loop; i++) {
+ for (j = 0; j < ctx->nr; j++) {
+ unsigned int value;
+ int err;
+
+ value = bpf_dynptr_user_get_size(&ctx->set[j]);
+ err = bpf_map_update_elem(ctx->fd, &ctx->set[j], &value, BPF_ANY);
+ if (err) {
+ fprintf(stderr, "update #%u element error %d\n", j, err);
+ return (void *)(long)err;
+ }
+ }
+ }
+
+ return NULL;
+}
+
+static void *delete_fn(void *arg)
+{
+ const struct qp_trie_rw_ctx *ctx = arg;
+ unsigned int i, j;
+
+ for (i = 0; i < ctx->loop; i++) {
+ for (j = 0; j < ctx->nr; j++) {
+ int err;
+
+ err = bpf_map_delete_elem(ctx->fd, &ctx->set[j]);
+ if (err && err != -ENOENT) {
+ fprintf(stderr, "delete #%u element error %d\n", j, err);
+ return (void *)(long)err;
+ }
+ }
+ }
+
+ return NULL;
+}
+
+static void *lookup_fn(void *arg)
+{
+ const struct qp_trie_rw_ctx *ctx = arg;
+ unsigned int i, j;
+
+ for (i = 0; i < ctx->loop; i++) {
+ for (j = 0; j < ctx->nr; j++) {
+ unsigned int got, value;
+ int err;
+
+ got = 0;
+ value = bpf_dynptr_user_get_size(&ctx->set[j]);
+ err = bpf_map_lookup_elem(ctx->fd, &ctx->set[j], &got);
+ if (!err && got != value) {
+ fprintf(stderr, "lookup #%u element got %u expected %u\n", j, got, value);
+ return (void *)(long)err;
+ } else if (err && err != -ENOENT) {
+ fprintf(stderr, "lookup #%u element error %d\n", j, err);
+ return (void *)(long)err;
+ }
+ }
+ }
+
+ return NULL;
+}
+
+static void *iterate_fn(void *arg)
+{
+ const struct qp_trie_rw_ctx *ctx = arg;
+ struct bpf_dynptr_user *key, *next_key;
+ unsigned int i;
+ int err;
+
+ key = NULL;
+ next_key = alloc_dynptr_user(ctx->max_key_len);
+ if (!next_key)
+ return (void *)(long)-ENOMEM;
+
+ err = 0;
+ for (i = 0; i < ctx->loop; i++) {
+ while (true) {
+ err = bpf_map_get_next_key(ctx->fd, key, next_key);
+ if (err < 0) {
+ if (err != -ENOENT) {
+ fprintf(stderr, "get key error %d\n", err);
+ goto out;
+ }
+ err = 0;
+ break;
+ }
+
+ /* If no deletion, next key should be greater than key */
+ if (!ctx->nr_delete && key && cmp_dynptr_user(key, next_key) >= 0) {
+ fprintf(stderr, "unordered iteration result\n");
+ dump_dynptr_user("previous key", key);
+ dump_dynptr_user("cur key", next_key);
+ err = -EINVAL;
+ goto out;
+ }
+
+ if (!key) {
+ key = alloc_dynptr_user(ctx->max_key_len);
+ if (!key) {
+ err = -ENOMEM;
+ goto out;
+ }
+ }
+
+ /* Copy next_key to key, and reset next_key */
+ copy_and_reset_dynptr_user(key, next_key, ctx->max_key_len);
+ }
+
+ free(key);
+ key = NULL;
+ }
+
+out:
+ free(key);
+ free(next_key);
+ return (void *)(long)err;
+}
+
+static void do_qp_trie_stress_test(const struct stress_conf *conf)
+{
+ void *(*fns[MAX_OP])(void *arg) = {
+ update_fn, delete_fn, lookup_fn, iterate_fn,
+ };
+ unsigned int created[MAX_OP];
+ struct qp_trie_rw_ctx ctx;
+ pthread_t *tids[MAX_OP];
+ unsigned int op, i, err;
+
+ ctx.nr = conf->nr;
+ ctx.max_key_len = conf->max_key_len;
+ ctx.fd = qp_trie_create(ctx.max_key_len, sizeof(unsigned int), ctx.nr);
+ ctx.set = generate_random_bytes_set(ctx.max_key_len, ctx.nr);
+ ctx.loop = conf->loop;
+ ctx.nr_delete = conf->threads[DELETE_OP];
+
+ /* Create threads */
+ for (op = 0; op < ARRAY_SIZE(tids); op++) {
+ if (!conf->threads[op]) {
+ tids[op] = NULL;
+ continue;
+ }
+
+ tids[op] = malloc(conf->threads[op] * sizeof(*tids[op]));
+ CHECK(!tids[op], "malloc", "no mem for op %u threads %u\n", op, conf->threads[op]);
+ }
+
+ for (op = 0; op < ARRAY_SIZE(tids); op++) {
+ for (i = 0; i < conf->threads[op]; i++) {
+ err = pthread_create(&tids[op][i], NULL, fns[op], &ctx);
+ if (err) {
+ fprintf(stderr, "create #%u thread for op %u error %d\n", i, op, err);
+ break;
+ }
+ }
+ created[op] = i;
+ }
+
+ err = 0;
+ for (op = 0; op < ARRAY_SIZE(tids); op++) {
+ for (i = 0; i < created[op]; i++) {
+ void *thread_err = NULL;
+
+ pthread_join(tids[op][i], &thread_err);
+ if (thread_err)
+ err |= 1 << op;
+ }
+ }
+ CHECK(err, "stress operation", "err %u\n", err);
+
+ for (op = 0; op < ARRAY_SIZE(tids); op++)
+ free(tids[op]);
+ free_bytes_set(ctx.set, ctx.nr);
+ close(ctx.fd);
+}
+
+static void test_qp_trie_stress(void)
+{
+ struct stress_conf conf;
+
+ memset(&conf, 0, sizeof(conf));
+
+ /* Test concurrent update, lookup and iterate operations. There is
+ * no deletion, so iteration can check the order of returned keys.
+ */
+ conf.threads[UPDATE_OP] = get_int_from_env("QP_TRIE_NR_UPDATE", 8);
+ conf.threads[LOOKUP_OP] = get_int_from_env("QP_TRIE_NR_LOOKUP", 8);
+ conf.threads[ITERATE_OP] = get_int_from_env("QP_TRIE_NR_ITERATE", 8);
+ conf.max_key_len = get_int_from_env("QP_TRIE_MAX_KEY_LEN", 256);
+ conf.loop = get_int_from_env("QP_TRIE_NR_LOOP", 32);
+ conf.nr = get_int_from_env("QP_TRIE_NR_DATA", 8192);
+ do_qp_trie_stress_test(&conf);
+
+ /* Add delete operation */
+ conf.threads[DELETE_OP] = get_int_from_env("QP_TRIE_NR_DELETE", 8);
+ do_qp_trie_stress_test(&conf);
+}
+
+void test_qp_trie_map(void)
+{
+ test_qp_trie_create();
+
+ test_qp_trie_bad_update();
+
+ test_qp_trie_bad_lookup_delete();
+
+ test_qp_trie_one_subtree_update();
+
+ test_qp_trie_all_subtree_update();
+
+ test_qp_trie_rdonly_iterate();
+
+ test_qp_trie_iterate_then_delete();
+
+ test_qp_trie_iterate_then_batch_delete();
+
+ test_qp_trie_iterate_then_add();
+
+ test_qp_trie_stress();
+
+ printf("%s:PASS\n", __func__);
+}
--
2.29.2