bpf.vger.kernel.org archive mirror
* [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab
@ 2022-02-14 11:13 Hou Tao
  2022-02-14 11:13 ` [RFC PATCH bpf-next v2 1/3] bpf: add support for string in hash table key Hou Tao
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Hou Tao @ 2022-02-14 11:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, John Fastabend, netdev, bpf, houtao1

Hi,

In order to use a string as a hash-table key, key_size must be the
storage size of the longest string. If string lengths vary widely, the
hash distribution will be sub-optimal due to the unused zero bytes in
shorter strings, and lookups will be inefficient due to unnecessary
memcmp().

Also, the unused part of a string key returned from a bpf helper
(e.g. bpf_d_path) may not be zeroed, so using it directly as a lookup
key makes the lookup fail with -ENOENT (as reported in [1]).

The patchset tries to address these inefficiencies by adding support
for string keys. v1 had an extensibility problem because the string key
and its optimization were only available for string-only keys. To make
it extensible, v2 introduces bpf_str_key_stor and bpf_str_key_desc and
enforces the following layout of the hash key struct through BTF:

	>the start of hash key
	...
	[struct bpf_str_key_desc m;]
	...
	[struct bpf_str_key_desc n;]
	...
	struct bpf_str_key_stor z;
	unsigned char raw[N];
	>the end of hash key

So for a string-only key, the hash key struct will be:

	struct key {
		struct bpf_str_key_stor comm;
		unsigned char raw[128];
	};

And if there are other fields in hash key, the struct will be:

	struct key {
		int pid;
		struct bpf_str_key_stor comm;
		unsigned char raw[128];
	};

And if there are multiple strings in the hash key, the struct will become:

	struct key {
		int pid;
		struct bpf_str_key_desc path;
		struct bpf_str_key_desc comm;
		struct bpf_str_key_stor stor;
		unsigned char raw[128 + 128];
	};

See patches #1 and #3 for more details on how these keys are constructed
and used. Patch #2 adds a simple test to demonstrate how a string key
solves the reported problem ([1]) caused by the unused part of the hash key.
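
As an illustration, the multi-string key above could be filled as below
before an update or lookup from user space. This is only a sketch based
on how the benchmark in patch #3 fills the same kind of key; the
function and variable names are made up:

	int lookup_key(int map_fd, int pid, const char *path,
		       const char *comm, unsigned int *value)
	{
		struct key k = {};
		unsigned int plen = strlen(path) + 1;	/* lengths include '\0' */
		unsigned int clen = strlen(comm) + 1;

		if (plen + clen > sizeof(k.raw))
			return -EINVAL;

		k.pid = pid;
		/* each desc records the offset of its string relative to itself */
		k.path.offset = (char *)k.raw - (char *)&k.path;
		k.path.len = plen;
		memcpy(k.raw, path, plen);
		k.comm.offset = (char *)k.raw + plen - (char *)&k.comm;
		k.comm.len = clen;
		memcpy(k.raw + plen, comm, clen);
		/* stor covers the used part of raw[]; hash == 0 lets the
		 * kernel compute it with full_name_hash() on first use
		 */
		k.stor.len = plen + clen;
		k.stor.hash = 0;
		return bpf_map_lookup_elem(map_fd, &k, value);
	}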

In the benchmark, there is about a 180% improvement under x86-64 and
170% under arm64 when key_size is 252, rising to about 280% and 270%
respectively when key_size is greater than 512.

The performance improvement was also tested using all file paths under
the Linux kernel source tree as string key input. There are about 74k
files and the maximum string length is 101. When key_size is 108, the
lookup performance win is about 71% under x86-64 and 39% under arm64,
and when key_size is 252, the win increases to 150% and 94% respectively.

The patchset is still in an early stage of development, so any comments
and suggestions are welcome.

Regards,
Tao

Change Log
v2:
  * make the string key extensible for hash keys that are not string-only

v1: https://lore.kernel.org/bpf/20211219052245.791605-1-houtao1@huawei.com/

[1]: https://lore.kernel.org/bpf/20211120051839.28212-2-yunbo.xufeng@linux.alibaba.com

Hou Tao (3):
  bpf: add support for string in hash table key
  selftests/bpf: add a simple test for htab str key
  selftests/bpf: add benchmark for string-key hash table

 include/linux/btf.h                           |   3 +
 include/uapi/linux/bpf.h                      |  19 +
 kernel/bpf/btf.c                              |  39 ++
 kernel/bpf/hashtab.c                          | 162 ++++++-
 tools/include/uapi/linux/bpf.h                |  19 +
 tools/testing/selftests/bpf/Makefile          |   4 +-
 tools/testing/selftests/bpf/bench.c           |  14 +
 .../selftests/bpf/benchs/bench_str_htab.c     | 449 ++++++++++++++++++
 .../testing/selftests/bpf/benchs/run_htab.sh  |  11 +
 .../selftests/bpf/prog_tests/str_key.c        |  71 +++
 .../selftests/bpf/progs/str_htab_bench.c      | 224 +++++++++
 tools/testing/selftests/bpf/progs/str_key.c   |  75 +++
 12 files changed, 1064 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_str_htab.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_htab.sh
 create mode 100644 tools/testing/selftests/bpf/prog_tests/str_key.c
 create mode 100644 tools/testing/selftests/bpf/progs/str_htab_bench.c
 create mode 100644 tools/testing/selftests/bpf/progs/str_key.c

-- 
2.25.4


* [RFC PATCH bpf-next v2 1/3] bpf: add support for string in hash table key
  2022-02-14 11:13 [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Hou Tao
@ 2022-02-14 11:13 ` Hou Tao
  2022-02-14 11:13 ` [RFC PATCH bpf-next v2 2/3] selftests/bpf: add a simple test for htab str key Hou Tao
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2022-02-14 11:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, John Fastabend, netdev, bpf, houtao1

In order to use a string as a hash-table key, key_size must be the
storage size of the longest string. If string lengths vary widely, the
hash distribution will be sub-optimal due to the unused zero bytes in
shorter strings, and lookups will be inefficient due to unnecessary
memcmp().

Also, the unused part of a string key returned from a bpf helper
(e.g. bpf_d_path) may not be zeroed, so using it directly as a lookup
key makes the lookup fail with -ENOENT.

Three changes are made to support strings in the key. The first is to
enforce the layout of the hash key struct through BTF, so that strings
are placed together and the string length and hash can be compared
before the string content. The layout of the hash key struct needs to
be as follows:

	>the start of hash key
	...
	[struct bpf_str_key_desc m;]
	...
	[struct bpf_str_key_desc n;]
	...
	struct bpf_str_key_stor z;
	unsigned char raw[N];
	>the end of hash key

The string, or the concatenation of strings, is placed in the trailing
char array. bpf_str_key_stor saves the length and hash of the string,
and bpf_str_key_desc describes the offset and length of each string
when there are multiple strings in the hash key.

The second change is to switch the hash algorithm for the string content
from jhash() to full_name_hash() to reduce hash collisions; the string
hash saved in bpf_str_key_stor is folded into the hash of the whole key
when the string is not the only field in the hash key. The last change
is to compare only the used part of the trailing char array when
comparing the string content.
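
Taken together, the lookup path with a string key effectively becomes
the following (a simplified sketch of the hashtab.c changes; the real
code is spread across several helpers):

	str = key + str_stor_off;	/* the bpf_str_key_stor member */
	if (!str->hash)
		str->hash = full_name_hash(hashrnd, str->raw, str->len);
	/* a string-only key reuses the string hash directly; otherwise
	 * jhash() covers the fixed part, which already embeds str->hash
	 */
	hash = str_stor_off ? jhash(key, hash_key_size, hashrnd) : str->hash;
	/* compare the fixed part plus only the used bytes of raw[] */
	found = !memcmp(elem_key, key, hash_key_size + str->len);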

A new flag BPF_F_STR_IN_KEY is added to enable the above three changes
and thus support strings in the hash table key.
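
For example, a BPF program would declare a map with a string-only key
like this (mirroring the selftest added in the next patch):

	struct str_htab_key {
		struct bpf_str_key_stor name;
		char raw[64];
	};

	struct {
		__uint(type, BPF_MAP_TYPE_HASH);
		__uint(max_entries, 1);
		__type(key, struct str_htab_key);
		__type(value, __u32);
		__uint(map_flags, BPF_F_STR_IN_KEY);
	} str_htab SEC(".maps");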

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 include/linux/btf.h            |   3 +
 include/uapi/linux/bpf.h       |  19 ++++
 kernel/bpf/btf.c               |  39 ++++++++
 kernel/bpf/hashtab.c           | 162 ++++++++++++++++++++++++++++-----
 tools/include/uapi/linux/bpf.h |  19 ++++
 5 files changed, 217 insertions(+), 25 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 36bc09b8e890..270f5e3329bd 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -123,6 +123,9 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   u32 expected_offset, u32 expected_size);
 int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t);
 int btf_find_timer(const struct btf *btf, const struct btf_type *t);
+int btf_find_str_key_stor(const struct btf *btf, const struct btf_type *t);
+int btf_find_array_at(const struct btf *btf, const struct btf_type *t,
+		      unsigned int at, unsigned int nelems);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index afe3d0d7f5f2..c764a82d79cd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1218,6 +1218,9 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP		= (1U << 12),
+
+/* Flag for hash map: there is a string in the hash key */
+	BPF_F_STR_IN_KEY	= (1U << 13),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -6565,4 +6568,20 @@ struct bpf_core_relo {
 	enum bpf_core_relo_kind kind;
 };
 
+struct bpf_str_key_desc {
+	/* the offset of the string, relative to this struct */
+	__u32 offset;
+	/* the length of the string (including the trailing zero) */
+	__u32 len;
+};
+
+struct bpf_str_key_stor {
+	/* the total length of string */
+	__u32 len;
+	/* the hash of string */
+	__u32 hash;
+	/* the content of string */
+	unsigned char raw[0];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 11740b300de9..c57eb6c12ef2 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3187,6 +3187,16 @@ static int btf_find_field(const struct btf *btf, const struct btf_type *t,
 	return -EINVAL;
 }
 
+int btf_find_str_key_stor(const struct btf *btf, const struct btf_type *t)
+{
+	if (!__btf_type_is_struct(t))
+		return -EINVAL;
+
+	return btf_find_struct_field(btf, t, "bpf_str_key_stor",
+				     sizeof(struct bpf_str_key_stor),
+				     __alignof__(struct bpf_str_key_stor));
+}
+
 /* find 'struct bpf_spin_lock' in map value.
  * return >= 0 offset if found
  * and < 0 in case of error
@@ -7281,3 +7291,32 @@ int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo,
 	}
 	return err;
 }
+
+int btf_find_array_at(const struct btf *btf, const struct btf_type *t,
+		      unsigned int at, unsigned int nelems)
+{
+	const struct btf_member *member;
+	u32 i, off;
+
+	for_each_member(i, t, member) {
+		const struct btf_type *member_type = btf_type_by_id(btf, member->type);
+		const struct btf_array *array;
+
+		if (!btf_type_is_array(member_type))
+			continue;
+
+		off = __btf_member_bit_offset(t, member);
+		if (off % 8)
+			return -EINVAL;
+		off /= 8;
+		if (off != at)
+			continue;
+
+		array = btf_type_array(member_type);
+		if (array->nelems == nelems)
+			return off;
+		break;
+	}
+
+	return -ENOENT;
+}
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d29af9988f37..ab2f95212a9c 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -16,7 +16,7 @@
 
 #define HTAB_CREATE_FLAG_MASK						\
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |	\
-	 BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
+	 BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED | BPF_F_STR_IN_KEY)
 
 #define BATCH_OPS(_name)			\
 	.map_lookup_batch =			\
@@ -93,6 +93,8 @@ struct bpf_htab {
 	struct bpf_map map;
 	struct bucket *buckets;
 	void *elems;
+	u32 hash_key_size;
+	int str_stor_off;
 	union {
 		struct pcpu_freelist freelist;
 		struct bpf_lru lru;
@@ -137,6 +139,61 @@ static inline bool htab_use_raw_lock(const struct bpf_htab *htab)
 	return (!IS_ENABLED(CONFIG_PREEMPT_RT) || htab_is_prealloc(htab));
 }
 
+static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
+{
+	return jhash(key, key_len, hashrnd);
+}
+
+static inline bool htab_str_in_key(const struct bpf_htab *htab)
+{
+	return htab->map.map_flags & BPF_F_STR_IN_KEY;
+}
+
+static inline bool htab_valid_str_in_key(const struct bpf_htab *htab, void *key)
+{
+	struct bpf_str_key_stor *str;
+
+	if (!htab_str_in_key(htab))
+		return true;
+
+	str = key + htab->str_stor_off;
+	return str->len > 0 &&
+	       str->len <= htab->map.key_size - htab->hash_key_size;
+}
+
+static inline u32 htab_compared_key_size(const struct bpf_htab *htab, void *key)
+{
+	struct bpf_str_key_stor *str;
+
+	if (!htab_str_in_key(htab))
+		return htab->map.key_size;
+
+	str = key + htab->str_stor_off;
+	return htab->hash_key_size + str->len;
+}
+
+static inline u32 htab_map_calc_hash(const struct bpf_htab *htab, void *key)
+{
+	struct bpf_str_key_stor *str;
+
+	if (!htab_str_in_key(htab))
+		return htab_map_hash(key, htab->map.key_size, htab->hashrnd);
+
+	str = key + htab->str_stor_off;
+	if (!str->hash) {
+		str->hash = full_name_hash((void *)(unsigned long)htab->hashrnd,
+					   str->raw, str->len);
+		if (!str->hash)
+			str->hash = htab->n_buckets > 1 ? htab->n_buckets - 1 : 1;
+	}
+
+	/* String key only */
+	if (!htab->str_stor_off)
+		return str->hash;
+
+	return htab_map_hash(key, htab->hash_key_size, htab->hashrnd);
+}
+
 static void htab_init_buckets(struct bpf_htab *htab)
 {
 	unsigned i;
@@ -410,6 +467,7 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	bool percpu_lru = (attr->map_flags & BPF_F_NO_COMMON_LRU);
 	bool prealloc = !(attr->map_flags & BPF_F_NO_PREALLOC);
 	bool zero_seed = (attr->map_flags & BPF_F_ZERO_SEED);
+	bool str_in_key = (attr->map_flags & BPF_F_STR_IN_KEY);
 	int numa_node = bpf_map_attr_numa_node(attr);
 
 	BUILD_BUG_ON(offsetof(struct htab_elem, htab) !=
@@ -440,6 +498,10 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	if (numa_node != NUMA_NO_NODE && (percpu || percpu_lru))
 		return -EINVAL;
 
+	/* string in key needs key btf info */
+	if (str_in_key && !attr->btf_key_type_id)
+		return -EINVAL;
+
 	/* check sanity of attributes.
 	 * value_size == 0 may be allowed in the future to use map as a set
 	 */
@@ -563,11 +625,6 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 	return ERR_PTR(err);
 }
 
-static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
-{
-	return jhash(key, key_len, hashrnd);
-}
-
 static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
 {
 	return &htab->buckets[hash & (htab->n_buckets - 1)];
@@ -624,18 +681,20 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 	struct hlist_nulls_head *head;
 	struct htab_elem *l;
-	u32 hash, key_size;
+	u32 hash, cmp_size;
 
 	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
 		     !rcu_read_lock_bh_held());
 
-	key_size = map->key_size;
+	if (!htab_valid_str_in_key(htab, key))
+		return NULL;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = htab_map_calc_hash(htab, key);
 
 	head = select_bucket(htab, hash);
 
-	l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets);
+	cmp_size = htab_compared_key_size(htab, key);
+	l = lookup_nulls_elem_raw(head, hash, key, cmp_size, htab->n_buckets);
 
 	return l;
 }
@@ -676,6 +735,45 @@ static int htab_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 	return insn - insn_buf;
 }
 
+static int htab_check_btf(const struct bpf_map *map, const struct btf *btf,
+			  const struct btf_type *key_type,
+			  const struct btf_type *value_type)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	int offset;
+	int next;
+
+	if (!htab_str_in_key(htab))
+		return 0;
+
+	/*
+	 * The layout of string in hash key must be as follows:
+	 *
+	 *   >the start of hash key
+	 *   ...
+	 *   [struct bpf_str_key_desc m;]
+	 *   ...
+	 *   [struct bpf_str_key_desc n;]
+	 *   ...
+	 *   struct bpf_str_key_stor z;
+	 *   unsigned char s[N];
+	 *   >the end of hash key
+	 */
+	offset = btf_find_str_key_stor(btf, key_type);
+	if (offset < 0)
+		return offset;
+
+	next = offset + sizeof(struct bpf_str_key_stor);
+	if (next >= map->key_size ||
+	    btf_find_array_at(btf, key_type, next, map->key_size - next) < 0)
+		return -EINVAL;
+
+	htab->hash_key_size = next;
+	htab->str_stor_off = offset;
+
+	return 0;
+}
+
 static __always_inline void *__htab_lru_map_lookup_elem(struct bpf_map *map,
 							void *key, const bool mark)
 {
@@ -772,7 +870,7 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 	struct hlist_nulls_head *head;
 	struct htab_elem *l, *next_l;
-	u32 hash, key_size;
+	u32 hash, key_size, cmp_size;
 	int i = 0;
 
 	WARN_ON_ONCE(!rcu_read_lock_held());
@@ -782,12 +880,16 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	if (!key)
 		goto find_first_elem;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	if (!htab_valid_str_in_key(htab, key))
+		return -EINVAL;
+
+	hash = htab_map_calc_hash(htab, key);
 
 	head = select_bucket(htab, hash);
 
 	/* lookup the key */
-	l = lookup_nulls_elem_raw(head, hash, key, key_size, htab->n_buckets);
+	cmp_size = htab_compared_key_size(htab, key);
+	l = lookup_nulls_elem_raw(head, hash, key, cmp_size, htab->n_buckets);
 
 	if (!l)
 		goto find_first_elem;
@@ -1024,19 +1126,22 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	struct hlist_nulls_head *head;
 	unsigned long flags;
 	struct bucket *b;
-	u32 key_size, hash;
+	u32 key_size, hash, cmp_size;
 	int ret;
 
 	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
 		/* unknown flags */
 		return -EINVAL;
+	if (!htab_valid_str_in_key(htab, key))
+		return -EINVAL;
 
 	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
 		     !rcu_read_lock_bh_held());
 
 	key_size = map->key_size;
+	cmp_size = htab_compared_key_size(htab, key);
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = htab_map_calc_hash(htab, key);
 
 	b = __select_bucket(htab, hash);
 	head = &b->head;
@@ -1045,7 +1150,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		if (unlikely(!map_value_has_spin_lock(map)))
 			return -EINVAL;
 		/* find an element without taking the bucket lock */
-		l_old = lookup_nulls_elem_raw(head, hash, key, key_size,
+		l_old = lookup_nulls_elem_raw(head, hash, key, cmp_size,
 					      htab->n_buckets);
 		ret = check_flags(htab, l_old, map_flags);
 		if (ret)
@@ -1067,7 +1172,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	if (ret)
 		return ret;
 
-	l_old = lookup_elem_raw(head, hash, key, key_size);
+	l_old = lookup_elem_raw(head, hash, key, cmp_size);
 
 	ret = check_flags(htab, l_old, map_flags);
 	if (ret)
@@ -1328,15 +1433,16 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
 	struct bucket *b;
 	struct htab_elem *l;
 	unsigned long flags;
-	u32 hash, key_size;
+	u32 hash, cmp_size;
 	int ret;
 
+	if (!htab_valid_str_in_key(htab, key))
+		return -EINVAL;
+
 	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
 		     !rcu_read_lock_bh_held());
 
-	key_size = map->key_size;
-
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = htab_map_calc_hash(htab, key);
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
@@ -1344,7 +1450,8 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
 	if (ret)
 		return ret;
 
-	l = lookup_elem_raw(head, hash, key, key_size);
+	cmp_size = htab_compared_key_size(htab, key);
+	l = lookup_elem_raw(head, hash, key, cmp_size);
 
 	if (l) {
 		hlist_nulls_del_rcu(&l->hash_node);
@@ -1495,13 +1602,16 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 	struct hlist_nulls_head *head;
 	unsigned long bflags;
 	struct htab_elem *l;
-	u32 hash, key_size;
 	struct bucket *b;
+	u32 hash, key_size, cmp_size;
 	int ret;
 
+	if (!htab_valid_str_in_key(htab, key))
+		return -EINVAL;
+
 	key_size = map->key_size;
 
-	hash = htab_map_hash(key, key_size, htab->hashrnd);
+	hash = htab_map_calc_hash(htab, key);
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
@@ -1509,7 +1619,8 @@ static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
 	if (ret)
 		return ret;
 
-	l = lookup_elem_raw(head, hash, key, key_size);
+	cmp_size = htab_compared_key_size(htab, key);
+	l = lookup_elem_raw(head, hash, key, cmp_size);
 	if (!l) {
 		ret = -ENOENT;
 	} else {
@@ -2118,6 +2229,7 @@ const struct bpf_map_ops htab_map_ops = {
 	.map_update_elem = htab_map_update_elem,
 	.map_delete_elem = htab_map_delete_elem,
 	.map_gen_lookup = htab_map_gen_lookup,
+	.map_check_btf = htab_check_btf,
 	.map_seq_show_elem = htab_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index afe3d0d7f5f2..c764a82d79cd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1218,6 +1218,9 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP		= (1U << 12),
+
+/* Flag for hash map: there is a string in the hash key */
+	BPF_F_STR_IN_KEY	= (1U << 13),
 };
 
 /* Flags for BPF_PROG_QUERY. */
@@ -6565,4 +6568,20 @@ struct bpf_core_relo {
 	enum bpf_core_relo_kind kind;
 };
 
+struct bpf_str_key_desc {
+	/* the offset of the string, relative to this struct */
+	__u32 offset;
+	/* the length of the string (including the trailing zero) */
+	__u32 len;
+};
+
+struct bpf_str_key_stor {
+	/* the total length of string */
+	__u32 len;
+	/* the hash of string */
+	__u32 hash;
+	/* the content of string */
+	unsigned char raw[0];
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.25.4


* [RFC PATCH bpf-next v2 2/3] selftests/bpf: add a simple test for htab str key
  2022-02-14 11:13 [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Hou Tao
  2022-02-14 11:13 ` [RFC PATCH bpf-next v2 1/3] bpf: add support for string in hash table key Hou Tao
@ 2022-02-14 11:13 ` Hou Tao
  2022-02-14 11:13 ` [RFC PATCH bpf-next v2 3/3] selftests/bpf: add benchmark for string-key hash table Hou Tao
  2022-02-17  3:50 ` [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Alexei Starovoitov
  3 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2022-02-14 11:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, John Fastabend, netdev, bpf, houtao1

Add a test to demonstrate that a string key doesn't care about the
content of the unused part of the hash-table key.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 .../selftests/bpf/prog_tests/str_key.c        | 71 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/str_key.c   | 75 +++++++++++++++++++
 2 files changed, 146 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/str_key.c
 create mode 100644 tools/testing/selftests/bpf/progs/str_key.c

diff --git a/tools/testing/selftests/bpf/prog_tests/str_key.c b/tools/testing/selftests/bpf/prog_tests/str_key.c
new file mode 100644
index 000000000000..998f12f8b919
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/str_key.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <test_progs.h>
+#include "str_key.skel.h"
+
+#define HTAB_NAME_SIZE 64
+
+struct str_htab_key {
+	struct bpf_str_key_stor name;
+	char raw[HTAB_NAME_SIZE];
+};
+
+static int setup_maps(struct str_key *skel, const char *name, unsigned int value)
+{
+	struct str_htab_key key;
+	int fd, err;
+
+	memset(&key, 0, sizeof(key));
+	strncpy(key.raw, name, sizeof(key.raw) - 1);
+	key.name.len = strlen(name) + 1;
+
+	fd = bpf_map__fd(skel->maps.str_htab);
+	err = bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "str htab add"))
+		return -EINVAL;
+
+	fd = bpf_map__fd(skel->maps.byte_htab);
+	err = bpf_map_update_elem(fd, key.raw, &value, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "byte htab add"))
+		return -EINVAL;
+
+	return 0;
+}
+
+void test_str_key(void)
+{
+	const char *name = "/tmp/str_key_test";
+	struct str_key *skel;
+	unsigned int value;
+	int err, fd;
+
+	skel = str_key__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_load str key"))
+		return;
+
+	srandom(time(NULL));
+	value = random();
+	if (setup_maps(skel, name, value))
+		goto out;
+
+	skel->bss->pid = getpid();
+	err = str_key__attach(skel);
+	if (!ASSERT_OK(err, "attach"))
+		goto out;
+
+	fd = open(name, O_RDONLY | O_CREAT, 0644);
+	if (!ASSERT_GE(fd, 0, "open tmp file"))
+		goto out;
+	close(fd);
+	unlink(name);
+
+	ASSERT_EQ(skel->bss->str_htab_value, value, "str htab find");
+	ASSERT_EQ(skel->bss->byte_htab_value, -1, "byte htab find");
+
+out:
+	str_key__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/str_key.c b/tools/testing/selftests/bpf/progs/str_key.c
new file mode 100644
index 000000000000..4d5a4b7cf183
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/str_key.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct path {
+} __attribute__((preserve_access_index));
+
+struct file {
+	struct path f_path;
+} __attribute__((preserve_access_index));
+
+#define HTAB_NAME_SIZE 64
+
+struct str_htab_key {
+	struct bpf_str_key_stor name;
+	char raw[HTAB_NAME_SIZE];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 1);
+	__type(key, struct str_htab_key);
+	__type(value, __u32);
+	__uint(map_flags, BPF_F_STR_IN_KEY);
+} str_htab SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 1);
+	__uint(key_size, HTAB_NAME_SIZE);
+	__uint(value_size, sizeof(__u32));
+} byte_htab SEC(".maps");
+
+int pid = 0;
+unsigned int str_htab_value = 0;
+unsigned int byte_htab_value = 0;
+
+SEC("fentry/filp_close")
+int BPF_PROG(filp_close, struct file *filp)
+{
+	struct path *p = &filp->f_path;
+	struct str_htab_key key;
+	unsigned int *value;
+	int len;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	__builtin_memset(key.raw, 0, sizeof(key.raw));
+	len = bpf_d_path(p, key.raw, sizeof(key.raw));
+	if (len < 0 || len > sizeof(key.raw))
+		return 0;
+
+	key.name.hash = 0;
+	key.name.len = len;
+	value = bpf_map_lookup_elem(&str_htab, &key);
+	if (value)
+		str_htab_value = *value;
+	else
+		str_htab_value = -1;
+
+	value = bpf_map_lookup_elem(&byte_htab, key.raw);
+	if (value)
+		byte_htab_value = *value;
+	else
+		byte_htab_value = -1;
+
+	return 0;
+}
-- 
2.25.4


* [RFC PATCH bpf-next v2 3/3] selftests/bpf: add benchmark for string-key hash table
  2022-02-14 11:13 [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Hou Tao
  2022-02-14 11:13 ` [RFC PATCH bpf-next v2 1/3] bpf: add support for string in hash table key Hou Tao
  2022-02-14 11:13 ` [RFC PATCH bpf-next v2 2/3] selftests/bpf: add a simple test for htab str key Hou Tao
@ 2022-02-14 11:13 ` Hou Tao
  2022-02-17  3:50 ` [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Alexei Starovoitov
  3 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2022-02-14 11:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, John Fastabend, netdev, bpf, houtao1

Three different key types are defined to represent the typical
use cases for a string-key hash table:

(1) string-only key (str/byte)
(2) integer + string key (int-str/int-byte)
(3) integer + two strings key (int-strs/int-bytes)

The benchmark implements these three key types with both a normal hash
table and a string-key hash table, and compares the lookup performance
between the two.

Because the string-key hash table depends on BTF, and a way to set up
the BTF for the map key type dynamically has not been figured out yet,
the max string size in the benchmark is fixed at 252 (a multiple of 12
for jhash()).

The performance win comes from two aspects: (1) using full_name_hash()
instead of jhash() for the string content; the former calculates the
hash at long-word granularity and has a better hash distribution for
strings. (2) comparing the string hash and length before comparing the
string content.

The win is about 180% under x86-64 and 170% under arm64 when key_size
is 252, and increases to 280% and 270% when key_size is greater than 512.

The detailed results follow.

  N-htab-{byte|int-byte|int-bytes}-lookup: lookups on a normal hash table
  	with the three different key types and N bytes as the max string size.
  N-htab-{str|int-str|int-strs}-lookup: lookups on a string-key hash table
  	with the three different key types and N bytes as the max string size.

Under x86-64:

132-htab-byte-lookup       11.946 ± 0.005M/s (drops 0.000 ± 0.000M/s)
132-htab-str-lookup        23.762 ± 0.028M/s (drops 0.000 ± 0.000M/s)
132-htab-int-byte-lookup   11.510 ± 0.012M/s (drops 0.000 ± 0.000M/s)
132-htab-int-str-lookup    19.885 ± 0.020M/s (drops 0.000 ± 0.000M/s)
132-htab-int-bytes-lookup  11.464 ± 0.009M/s (drops 0.000 ± 0.000M/s)
132-htab-int-strs-lookup   18.183 ± 0.020M/s (drops 0.000 ± 0.000M/s)

252-htab-byte-lookup       7.092 ± 0.003M/s (drops 0.000 ± 0.000M/s)
252-htab-str-lookup        20.126 ± 0.017M/s (drops 0.000 ± 0.000M/s)
252-htab-int-byte-lookup   6.949 ± 0.002M/s (drops 0.000 ± 0.000M/s)
252-htab-int-str-lookup    17.173 ± 0.019M/s (drops 0.000 ± 0.000M/s)
252-htab-int-bytes-lookup  6.878 ± 0.001M/s (drops 0.000 ± 0.000M/s)
252-htab-int-strs-lookup   15.153 ± 0.006M/s (drops 0.000 ± 0.000M/s)

516-htab-byte-lookup       3.719 ± 0.001M/s (drops 0.000 ± 0.000M/s)
516-htab-str-lookup        14.185 ± 0.004M/s (drops 0.000 ± 0.000M/s)
516-htab-int-byte-lookup   3.675 ± 0.002M/s (drops 0.000 ± 0.000M/s)
516-htab-int-str-lookup    12.354 ± 0.005M/s (drops 0.000 ± 0.000M/s)
516-htab-int-bytes-lookup  3.673 ± 0.001M/s (drops 0.000 ± 0.000M/s)
516-htab-int-strs-lookup   11.328 ± 0.004M/s (drops 0.000 ± 0.000M/s)

Under arm64:

132-htab-byte-lookup       8.680 ± 0.003M/s (drops 0.009 ± 0.000M/s)
132-htab-str-lookup        18.736 ± 0.010M/s (drops 0.000 ± 0.000M/s)
132-htab-int-byte-lookup   8.239 ± 0.004M/s (drops 0.000 ± 0.000M/s)
132-htab-int-str-lookup    16.395 ± 0.049M/s (drops 0.000 ± 0.000M/s)
132-htab-int-bytes-lookup  8.199 ± 0.008M/s (drops 0.000 ± 0.000M/s)
132-htab-int-strs-lookup   13.047 ± 0.006M/s (drops 0.000 ± 0.000M/s)

252-htab-byte-lookup       5.092 ± 0.008M/s (drops 0.000 ± 0.000M/s)
252-htab-str-lookup        14.065 ± 0.179M/s (drops 0.000 ± 0.000M/s)
252-htab-int-byte-lookup   4.939 ± 0.002M/s (drops 0.000 ± 0.000M/s)
252-htab-int-str-lookup    13.612 ± 0.009M/s (drops 0.000 ± 0.000M/s)
252-htab-int-bytes-lookup  4.953 ± 0.007M/s (drops 0.000 ± 0.000M/s)
252-htab-int-strs-lookup   11.228 ± 0.004M/s (drops 0.000 ± 0.000M/s)

516-htab-byte-lookup       2.661 ± 0.002M/s (drops 0.000 ± 0.000M/s)
516-htab-str-lookup        9.857 ± 0.012M/s (drops 0.000 ± 0.000M/s)
516-htab-int-byte-lookup   2.616 ± 0.001M/s (drops 0.000 ± 0.000M/s)
516-htab-int-str-lookup    9.143 ± 0.001M/s (drops 0.000 ± 0.000M/s)
516-htab-int-bytes-lookup  2.615 ± 0.000M/s (drops 0.000 ± 0.000M/s)
516-htab-int-strs-lookup   7.456 ± 0.002M/s (drops 0.000 ± 0.000M/s)

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 tools/testing/selftests/bpf/Makefile          |   4 +-
 tools/testing/selftests/bpf/bench.c           |  14 +
 .../selftests/bpf/benchs/bench_str_htab.c     | 449 ++++++++++++++++++
 .../testing/selftests/bpf/benchs/run_htab.sh  |  11 +
 .../selftests/bpf/progs/str_htab_bench.c      | 224 +++++++++
 5 files changed, 701 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_str_htab.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_htab.sh
 create mode 100644 tools/testing/selftests/bpf/progs/str_htab_bench.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 91ea729990da..df174cfad8dd 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -538,6 +538,7 @@ $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
 $(OUTPUT)/bench_bloom_filter_map.o: $(OUTPUT)/bloom_filter_bench.skel.h
 $(OUTPUT)/bench_bpf_loop.o: $(OUTPUT)/bpf_loop_bench.skel.h
 $(OUTPUT)/bench_strncmp.o: $(OUTPUT)/strncmp_bench.skel.h
+$(OUTPUT)/bench_str_htab.o: $(OUTPUT)/str_htab_bench.skel.h
 $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
 $(OUTPUT)/bench: LDLIBS += -lm
 $(OUTPUT)/bench: $(OUTPUT)/bench.o \
@@ -549,7 +550,8 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
 		 $(OUTPUT)/bench_ringbufs.o \
 		 $(OUTPUT)/bench_bloom_filter_map.o \
 		 $(OUTPUT)/bench_bpf_loop.o \
-		 $(OUTPUT)/bench_strncmp.o
+		 $(OUTPUT)/bench_strncmp.o \
+		 $(OUTPUT)/bench_str_htab.o
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(CFLAGS) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
 
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index f973320e6dbf..ba241d34b0ec 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -190,12 +190,14 @@ extern struct argp bench_ringbufs_argp;
 extern struct argp bench_bloom_map_argp;
 extern struct argp bench_bpf_loop_argp;
 extern struct argp bench_strncmp_argp;
+extern struct argp bench_str_htab_argp;
 
 static const struct argp_child bench_parsers[] = {
 	{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
 	{ &bench_bloom_map_argp, 0, "Bloom filter map benchmark", 0 },
 	{ &bench_bpf_loop_argp, 0, "bpf_loop helper benchmark", 0 },
 	{ &bench_strncmp_argp, 0, "bpf_strncmp helper benchmark", 0 },
+	{ &bench_str_htab_argp, 0, "string htab benchmark", 0 },
 	{},
 };
 
@@ -397,6 +399,12 @@ extern const struct bench bench_hashmap_with_bloom;
 extern const struct bench bench_bpf_loop;
 extern const struct bench bench_strncmp_no_helper;
 extern const struct bench bench_strncmp_helper;
+extern const struct bench bench_htab_byte_lookup;
+extern const struct bench bench_htab_str_lookup;
+extern const struct bench bench_htab_int_byte_lookup;
+extern const struct bench bench_htab_int_str_lookup;
+extern const struct bench bench_htab_int_bytes_lookup;
+extern const struct bench bench_htab_int_strs_lookup;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
@@ -431,6 +439,12 @@ static const struct bench *benchs[] = {
 	&bench_bpf_loop,
 	&bench_strncmp_no_helper,
 	&bench_strncmp_helper,
+	&bench_htab_byte_lookup,
+	&bench_htab_str_lookup,
+	&bench_htab_int_byte_lookup,
+	&bench_htab_int_str_lookup,
+	&bench_htab_int_bytes_lookup,
+	&bench_htab_int_strs_lookup,
 };
 
 static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/benchs/bench_str_htab.c b/tools/testing/selftests/bpf/benchs/bench_str_htab.c
new file mode 100644
index 000000000000..6fee5701821b
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/bench_str_htab.c
@@ -0,0 +1,449 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <argp.h>
+#include "bench.h"
+#include "bpf_util.h"
+
+#define DFT_STR_KEY_SIZE 252
+
+struct htab_byte_key {
+	char name[DFT_STR_KEY_SIZE];
+};
+
+struct htab_str_key {
+	struct bpf_str_key_stor name;
+	char raw[DFT_STR_KEY_SIZE];
+};
+
+struct htab_int_byte_key {
+	int cookie;
+	char name[DFT_STR_KEY_SIZE];
+};
+
+struct htab_int_str_key {
+	int cookie;
+	struct bpf_str_key_stor name;
+	char raw[DFT_STR_KEY_SIZE];
+};
+
+struct htab_int_bytes_key {
+	int cookie;
+	char name[DFT_STR_KEY_SIZE / 2];
+	char addr[DFT_STR_KEY_SIZE / 2];
+};
+
+struct htab_int_strs_key {
+	int cookie;
+	struct bpf_str_key_desc name;
+	struct bpf_str_key_desc addr;
+	struct bpf_str_key_stor stor;
+	char raw[DFT_STR_KEY_SIZE];
+};
+
+#include "str_htab_bench.skel.h"
+
+typedef void *(*get_nth_key)(struct str_htab_bench *skel, unsigned int i);
+typedef void (*set_key)(void *key);
+
+static const char strs[] =
+"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?@[]^_`{|}~ ";
+
+static struct str_htab_ctx {
+	struct str_htab_bench *skel;
+} ctx;
+
+static struct {
+	bool same_len;
+	__u32 max_entries;
+} args = {
+	.same_len = false,
+	.max_entries = 1000,
+};
+
+enum {
+	ARG_SAME_LEN = 6000,
+	ARG_MAX_ENTRIES = 6001,
+};
+
+static const struct argp_option opts[] = {
+	{ "same-len", ARG_SAME_LEN, NULL, 0,
+	  "Use the same length for string keys" },
+	{ "max-entries", ARG_MAX_ENTRIES, "MAX_ENTRIES", 0,
+	  "Set the max entries" },
+	{},
+};
+
+static error_t str_htab_parse_arg(int key, char *arg, struct argp_state *state)
+{
+	switch (key) {
+	case ARG_SAME_LEN:
+		args.same_len = true;
+		break;
+	case ARG_MAX_ENTRIES:
+		args.max_entries = strtoul(arg, NULL, 10);
+		break;
+	default:
+		return ARGP_ERR_UNKNOWN;
+	}
+
+	return 0;
+}
+
+const struct argp bench_str_htab_argp = {
+	.options = opts,
+	.parser = str_htab_parse_arg,
+};
+
+static void str_htab_validate(void)
+{
+	if (env.consumer_cnt != 1) {
+		fprintf(stderr, "str_htab benchmark doesn't support multi-consumer!\n");
+		exit(1);
+	}
+
+	if (!args.max_entries ||
+	    args.max_entries > ARRAY_SIZE(ctx.skel->bss->byte_keys)) {
+		fprintf(stderr, "invalid max entries (max %zu)\n",
+			ARRAY_SIZE(ctx.skel->bss->byte_keys));
+		exit(1);
+	}
+}
+
+static void setup_max_entries(struct str_htab_bench *skel, unsigned int nr)
+{
+	bpf_map__set_max_entries(skel->maps.byte_htab, nr);
+	bpf_map__set_max_entries(skel->maps.str_htab, nr);
+	bpf_map__set_max_entries(skel->maps.int_byte_htab, nr);
+	bpf_map__set_max_entries(skel->maps.int_str_htab, nr);
+	bpf_map__set_max_entries(skel->maps.int_bytes_htab, nr);
+	bpf_map__set_max_entries(skel->maps.int_strs_htab, nr);
+}
+
+static void random_fill_str(char *str, unsigned int max_sz, unsigned int *sz)
+{
+	unsigned int len;
+	unsigned int i;
+
+	if (args.same_len)
+		len = max_sz - 1;
+	else
+		len = random() % (max_sz - 1) + 1;
+	if (sz)
+		*sz = len + 1;
+
+	/* Generate in byte-granularity to avoid zero byte */
+	for (i = 0; i < len; i++)
+		str[i] = strs[random() % (sizeof(strs) - 1)];
+	str[i] = 0;
+}
+
+static void setup_keys(struct str_htab_bench *skel, get_nth_key getter, set_key setter)
+{
+	unsigned int i;
+
+	for (i = 0; i < args.max_entries; i++) {
+		void *key = getter(skel, i);
+
+		setter(key);
+	}
+}
+
+static void setup_htab(struct str_htab_bench *skel, struct bpf_map *map,
+		       get_nth_key getter)
+{
+	int fd = bpf_map__fd(map);
+	unsigned int value;
+	unsigned int i;
+	void *key;
+
+	for (i = 0; i < args.max_entries; i++) {
+		int err;
+
+		key = getter(skel, i);
+		value = i;
+		err = bpf_map_update_elem(fd, key, &value, 0);
+		if (err) {
+			fprintf(stderr, "add #%u key on %s error %d\n",
+				i, bpf_map__name(map), err);
+			exit(1);
+		}
+	}
+}
+
+static void *byte_get_nth_key(struct str_htab_bench *skel, unsigned int i)
+{
+	return &skel->bss->byte_keys[i];
+}
+
+static void byte_set_key(void *key)
+{
+	struct htab_byte_key *cur = key;
+
+	random_fill_str(cur->name, sizeof(cur->name), NULL);
+}
+
+static void *int_byte_get_nth_key(struct str_htab_bench *skel, unsigned int i)
+{
+	return &skel->bss->int_byte_keys[i];
+}
+
+static void int_byte_set_key(void *key)
+{
+	struct htab_int_byte_key *cur = key;
+
+	cur->cookie = random();
+	random_fill_str(cur->name, sizeof(cur->name), NULL);
+}
+
+static void *int_bytes_get_nth_key(struct str_htab_bench *skel, unsigned int i)
+{
+	return &skel->bss->int_bytes_keys[i];
+}
+
+static void int_bytes_set_key(void *key)
+{
+	struct htab_int_bytes_key *cur = key;
+
+	cur->cookie = random();
+	random_fill_str(cur->name, sizeof(cur->name), NULL);
+	random_fill_str(cur->addr, sizeof(cur->addr), NULL);
+}
+
+static void *str_get_nth_key(struct str_htab_bench *skel, unsigned int i)
+{
+	return &skel->bss->str_keys[i];
+}
+
+static void str_set_key(void *key)
+{
+	struct htab_str_key *cur = key;
+
+	cur->name.hash = 0;
+	random_fill_str(cur->raw, sizeof(cur->raw), &cur->name.len);
+}
+
+static void *int_str_get_nth_key(struct str_htab_bench *skel, unsigned int i)
+{
+	return &skel->bss->int_str_keys[i];
+}
+
+static void int_str_set_key(void *key)
+{
+	struct htab_int_str_key *cur = key;
+
+	cur->cookie = random();
+	cur->name.hash = 0;
+	random_fill_str(cur->raw, sizeof(cur->raw), &cur->name.len);
+}
+
+static void *int_strs_get_nth_key(struct str_htab_bench *skel, unsigned int i)
+{
+	return &skel->bss->int_strs_keys[i];
+}
+
+static void int_strs_set_key(void *key)
+{
+	struct htab_int_strs_key *cur = key;
+	unsigned int max_sz = sizeof(cur->raw) / 2;
+
+	cur->cookie = random();
+
+	cur->name.offset = cur->raw - (char *)&cur->name;
+	random_fill_str(cur->raw, max_sz, &cur->name.len);
+
+	cur->addr.offset = cur->raw + cur->name.len - (char *)&cur->addr;
+	random_fill_str(cur->raw + cur->name.len, max_sz, &cur->addr.len);
+
+	cur->stor.hash = 0;
+	cur->stor.len = cur->name.len + cur->addr.len;
+}
+
+static void str_htab_common_setup(void)
+{
+	struct str_htab_bench *skel;
+	int err;
+
+	srandom(time(NULL));
+
+	setup_libbpf();
+
+	skel = str_htab_bench__open();
+	if (!skel) {
+		fprintf(stderr, "failed to open skeleton\n");
+		exit(1);
+	}
+
+	setup_max_entries(skel, args.max_entries);
+
+	skel->bss->loops = args.max_entries;
+
+	err = str_htab_bench__load(skel);
+	if (err) {
+		fprintf(stderr, "failed to load skeleton\n");
+		str_htab_bench__destroy(skel);
+		exit(1);
+	}
+
+	ctx.skel = skel;
+}
+
+static void str_htab_attach_prog(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach(prog);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
+static void str_htab_byte_lookup_setup(void)
+{
+	str_htab_common_setup();
+
+	setup_keys(ctx.skel, byte_get_nth_key, byte_set_key);
+	setup_htab(ctx.skel, ctx.skel->maps.byte_htab, byte_get_nth_key);
+
+	ctx.skel->bss->key_type = 0;
+	str_htab_attach_prog(ctx.skel->progs.htab_byte_lookup);
+}
+
+static void str_htab_int_byte_lookup_setup(void)
+{
+	str_htab_common_setup();
+
+	setup_keys(ctx.skel, int_byte_get_nth_key, int_byte_set_key);
+	setup_htab(ctx.skel, ctx.skel->maps.int_byte_htab, int_byte_get_nth_key);
+
+	ctx.skel->bss->key_type = 1;
+	str_htab_attach_prog(ctx.skel->progs.htab_byte_lookup);
+}
+
+static void str_htab_int_bytes_lookup_setup(void)
+{
+	str_htab_common_setup();
+
+	setup_keys(ctx.skel, int_bytes_get_nth_key, int_bytes_set_key);
+	setup_htab(ctx.skel, ctx.skel->maps.int_bytes_htab, int_bytes_get_nth_key);
+
+	ctx.skel->bss->key_type = 2;
+	str_htab_attach_prog(ctx.skel->progs.htab_byte_lookup);
+}
+
+static void str_htab_str_lookup_setup(void)
+{
+	str_htab_common_setup();
+
+	setup_keys(ctx.skel, str_get_nth_key, str_set_key);
+	setup_htab(ctx.skel, ctx.skel->maps.str_htab, str_get_nth_key);
+
+	ctx.skel->bss->key_type = 0;
+	str_htab_attach_prog(ctx.skel->progs.htab_str_lookup);
+}
+
+static void str_htab_int_str_lookup_setup(void)
+{
+	str_htab_common_setup();
+
+	setup_keys(ctx.skel, int_str_get_nth_key, int_str_set_key);
+	setup_htab(ctx.skel, ctx.skel->maps.int_str_htab, int_str_get_nth_key);
+
+	ctx.skel->bss->key_type = 1;
+	str_htab_attach_prog(ctx.skel->progs.htab_str_lookup);
+}
+
+static void str_htab_int_strs_lookup_setup(void)
+{
+	str_htab_common_setup();
+
+	setup_keys(ctx.skel, int_strs_get_nth_key, int_strs_set_key);
+	setup_htab(ctx.skel, ctx.skel->maps.int_strs_htab, int_strs_get_nth_key);
+
+	ctx.skel->bss->key_type = 2;
+	str_htab_attach_prog(ctx.skel->progs.htab_str_lookup);
+}
+
+static void *str_htab_producer(void *ctx)
+{
+	while (true)
+		(void)syscall(__NR_getpgid);
+	return NULL;
+}
+
+static void *str_htab_consumer(void *ctx)
+{
+	return NULL;
+}
+
+static void str_htab_measure(struct bench_res *res)
+{
+	res->hits = atomic_swap(&ctx.skel->bss->hits, 0);
+	res->drops = atomic_swap(&ctx.skel->bss->drops, 0);
+}
+
+const struct bench bench_htab_byte_lookup = {
+	.name = "htab-byte-lookup",
+	.validate = str_htab_validate,
+	.setup = str_htab_byte_lookup_setup,
+	.producer_thread = str_htab_producer,
+	.consumer_thread = str_htab_consumer,
+	.measure = str_htab_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_htab_int_byte_lookup = {
+	.name = "htab-int-byte-lookup",
+	.validate = str_htab_validate,
+	.setup = str_htab_int_byte_lookup_setup,
+	.producer_thread = str_htab_producer,
+	.consumer_thread = str_htab_consumer,
+	.measure = str_htab_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_htab_int_bytes_lookup = {
+	.name = "htab-int-bytes-lookup",
+	.validate = str_htab_validate,
+	.setup = str_htab_int_bytes_lookup_setup,
+	.producer_thread = str_htab_producer,
+	.consumer_thread = str_htab_consumer,
+	.measure = str_htab_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_htab_str_lookup = {
+	.name = "htab-str-lookup",
+	.validate = str_htab_validate,
+	.setup = str_htab_str_lookup_setup,
+	.producer_thread = str_htab_producer,
+	.consumer_thread = str_htab_consumer,
+	.measure = str_htab_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_htab_int_str_lookup = {
+	.name = "htab-int-str-lookup",
+	.validate = str_htab_validate,
+	.setup = str_htab_int_str_lookup_setup,
+	.producer_thread = str_htab_producer,
+	.consumer_thread = str_htab_consumer,
+	.measure = str_htab_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_htab_int_strs_lookup = {
+	.name = "htab-int-strs-lookup",
+	.validate = str_htab_validate,
+	.setup = str_htab_int_strs_lookup_setup,
+	.producer_thread = str_htab_producer,
+	.consumer_thread = str_htab_consumer,
+	.measure = str_htab_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_htab.sh b/tools/testing/selftests/bpf/benchs/run_htab.sh
new file mode 100755
index 000000000000..c848324dd256
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_htab.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source ./benchs/run_common.sh
+
+set -eufo pipefail
+
+for tp in byte str int-byte int-str int-bytes int-strs; do
+	name=htab-${tp}-lookup
+	summarize ${name} "$($RUN_BENCH ${name})"
+done
diff --git a/tools/testing/selftests/bpf/progs/str_htab_bench.c b/tools/testing/selftests/bpf/progs/str_htab_bench.c
new file mode 100644
index 000000000000..008c654e85cb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/str_htab_bench.c
@@ -0,0 +1,224 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#define MAX_ENTRY_NR 1000
+#define DFT_STR_KEY_SIZE 252
+
+struct htab_byte_key {
+	char name[DFT_STR_KEY_SIZE];
+};
+
+struct htab_str_key {
+	struct bpf_str_key_stor name;
+	char raw[DFT_STR_KEY_SIZE];
+};
+
+struct htab_int_byte_key {
+	int cookie;
+	char name[DFT_STR_KEY_SIZE];
+};
+
+struct htab_int_str_key {
+	int cookie;
+	struct bpf_str_key_stor name;
+	char raw[DFT_STR_KEY_SIZE];
+};
+
+struct htab_int_bytes_key {
+	int cookie;
+	char name[DFT_STR_KEY_SIZE / 2];
+	char addr[DFT_STR_KEY_SIZE / 2];
+};
+
+struct htab_int_strs_key {
+	int cookie;
+	struct bpf_str_key_desc name;
+	struct bpf_str_key_desc addr;
+	struct bpf_str_key_stor stor;
+	char raw[DFT_STR_KEY_SIZE];
+};
+
+/* max_entries will be set by htab benchmark */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, struct htab_byte_key);
+	__uint(value_size, 4);
+	__type(value, __u32);
+} byte_htab SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, struct htab_int_byte_key);
+	__type(value, __u32);
+} int_byte_htab SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, struct htab_int_bytes_key);
+	__type(value, __u32);
+} int_bytes_htab SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, struct htab_str_key);
+	__type(value, __u32);
+	__uint(map_flags, BPF_F_STR_IN_KEY);
+} str_htab SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, struct htab_int_str_key);
+	__type(value, __u32);
+	__uint(map_flags, BPF_F_STR_IN_KEY);
+} int_str_htab SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, struct htab_int_strs_key);
+	__type(value, __u32);
+	__uint(map_flags, BPF_F_STR_IN_KEY);
+} int_strs_htab SEC(".maps");
+
+char _license[] SEC("license") = "GPL";
+
+struct htab_byte_key byte_keys[MAX_ENTRY_NR];
+struct htab_str_key str_keys[MAX_ENTRY_NR];
+struct htab_int_byte_key int_byte_keys[MAX_ENTRY_NR];
+struct htab_int_str_key int_str_keys[MAX_ENTRY_NR];
+struct htab_int_bytes_key int_bytes_keys[MAX_ENTRY_NR];
+struct htab_int_strs_key int_strs_keys[MAX_ENTRY_NR];
+
+unsigned int loops = 0;
+unsigned int key_type = 0;
+long hits = 0;
+long drops = 0;
+
+static int lookup_byte(__u32 index, void *data)
+{
+	unsigned int *value;
+
+	if (index >= MAX_ENTRY_NR)
+		return 1;
+
+	value = bpf_map_lookup_elem(&byte_htab, &byte_keys[index]);
+	if (value && *value == index)
+		__sync_add_and_fetch(&hits, 1);
+	else
+		__sync_add_and_fetch(&drops, 1);
+
+	return 0;
+}
+
+static int lookup_int_byte(__u32 index, void *data)
+{
+	unsigned int *value;
+
+	if (index >= MAX_ENTRY_NR)
+		return 1;
+
+	value = bpf_map_lookup_elem(&int_byte_htab, &int_byte_keys[index]);
+	if (value && *value == index)
+		__sync_add_and_fetch(&hits, 1);
+	else
+		__sync_add_and_fetch(&drops, 1);
+
+	return 0;
+}
+
+static int lookup_int_bytes(__u32 index, void *data)
+{
+	unsigned int *value;
+
+	if (index >= MAX_ENTRY_NR)
+		return 1;
+
+	value = bpf_map_lookup_elem(&int_bytes_htab, &int_bytes_keys[index]);
+	if (value && *value == index)
+		__sync_add_and_fetch(&hits, 1);
+	else
+		__sync_add_and_fetch(&drops, 1);
+
+	return 0;
+}
+
+static int lookup_str(__u32 index, void *data)
+{
+	unsigned int *value;
+
+	if (index >= MAX_ENTRY_NR)
+		return 1;
+
+	/* Clear the hash value from previous lookup */
+	str_keys[index].name.hash = 0;
+	value = bpf_map_lookup_elem(&str_htab, &str_keys[index]);
+	if (value && *value == index)
+		__sync_add_and_fetch(&hits, 1);
+	else
+		__sync_add_and_fetch(&drops, 1);
+
+	return 0;
+}
+
+static int lookup_int_str(__u32 index, void *data)
+{
+	unsigned int *value;
+
+	if (index >= MAX_ENTRY_NR)
+		return 1;
+
+	int_str_keys[index].name.hash = 0;
+	value = bpf_map_lookup_elem(&int_str_htab, &int_str_keys[index]);
+	if (value && *value == index)
+		__sync_add_and_fetch(&hits, 1);
+	else
+		__sync_add_and_fetch(&drops, 1);
+
+	return 0;
+}
+
+static int lookup_int_strs(__u32 index, void *data)
+{
+	unsigned int *value;
+
+	if (index >= MAX_ENTRY_NR)
+		return 1;
+
+	int_strs_keys[index].stor.hash = 0;
+	value = bpf_map_lookup_elem(&int_strs_htab, &int_strs_keys[index]);
+	if (value && *value == index)
+		__sync_add_and_fetch(&hits, 1);
+	else
+		__sync_add_and_fetch(&drops, 1);
+
+	return 0;
+}
+
+SEC("tp/syscalls/sys_enter_getpgid")
+int htab_byte_lookup(void *ctx)
+{
+	if (!key_type)
+		bpf_loop(loops, lookup_byte, NULL, 0);
+	else if (key_type == 1)
+		bpf_loop(loops, lookup_int_byte, NULL, 0);
+	else
+		bpf_loop(loops, lookup_int_bytes, NULL, 0);
+
+	return 0;
+}
+
+SEC("tp/syscalls/sys_enter_getpgid")
+int htab_str_lookup(void *ctx)
+{
+	if (!key_type)
+		bpf_loop(loops, lookup_str, NULL, 0);
+	else if (key_type == 1)
+		bpf_loop(loops, lookup_int_str, NULL, 0);
+	else
+		bpf_loop(loops, lookup_int_strs, NULL, 0);
+
+	return 0;
+}
-- 
2.25.4


* Re: [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab
  2022-02-14 11:13 [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Hou Tao
                   ` (2 preceding siblings ...)
  2022-02-14 11:13 ` [RFC PATCH bpf-next v2 3/3] selftests/bpf: add benchmark for string-key hash table Hou Tao
@ 2022-02-17  3:50 ` Alexei Starovoitov
  2022-02-18 13:53   ` Hou Tao
  3 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2022-02-17  3:50 UTC (permalink / raw)
  To: Hou Tao
  Cc: Alexei Starovoitov, Martin KaFai Lau, Yonghong Song,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, John Fastabend,
	netdev, bpf

On Mon, Feb 14, 2022 at 07:13:34PM +0800, Hou Tao wrote:
> Hi,
> 
> In order to use a string as a hash-table key, key_size must be the
> storage size of the longest string. If string lengths vary widely, the
> hash distribution will be sub-optimal due to the unused zero bytes in
> shorter strings, and lookups will be inefficient due to unnecessary
> memcmp().
> 
> Also, the unused part of a string key returned from a bpf helper
> (e.g. bpf_d_path) may not be zeroed, so using it directly as a lookup
> key makes the lookup fail with -ENOENT (as reported in [1]).
> 
> The patchset tries to address these inefficiencies by adding support
> for string keys. v1 had an extensibility problem because the string key
> and its optimization were only available for string-only keys. To make
> it extensible, v2 introduces bpf_str_key_stor and bpf_str_key_desc and
> enforces the following layout of the hash key struct through BTF:
> 
> 	>the start of hash key
> 	...
> 	[struct bpf_str_key_desc m;]
> 	...
> 	[struct bpf_str_key_desc n;]
> 	...
> 	struct bpf_str_key_stor z;
> 	unsigned char raw[N];
> 	>the end of hash key

Sorry, but this is dead end.
The v2 didn't fundamentally change the design.
The bpf_str_key_desc has an offset field, but it's unused.
The len field is dynamically checked at run-time and all hash map
users will be paying that penalty though the majority will not be
using this feature.
This patch set is an incredibly specific solution to one task.
It's far from being generic. We cannot extend bpf this way.
All building blocks must be as generic as possible.
If you want strings as a key then the key must be variable size.
This patch doesn't make them so. It still requires some
predefined fixed size for largest string. This is no go.
Variable sized key means truly variable to megabytes long.
The key would need to contain an object (a pointer wrap) to
a variable sized object. And this object can be arbitrary
blob of bytes. Not just null terminated string.
We've been thinking about "dynamic pointer" concept where
pointer + length will be represented as an object.
The program will be able to allocate it and persist into a map value
and potentially into a map key.
For storing a bunch of strings one can use a strong hash and store
that hash in the key while keeping full string as a variable sized
object inside the value.
Another approach is to introduce a trie to store strings, or dfa,
or aho-corasick, etc. There are plenty of data structures that are
more suitable for storing and searching strings depending on the use case.
Using a hash table for strings has its downsides.
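
For illustration only, the "strong hash in the key, full string in the
value" idea could look roughly like this (names and sizes are made up,
and a lookup still has to compare the stored string against the input
to rule out collisions):

	struct str_value {
		__u32 len;
		char str[256];	/* eventually a variable sized object */
	};

	struct {
		__uint(type, BPF_MAP_TYPE_HASH);
		__uint(max_entries, 1000);
		__type(key, __u64);	/* strong hash of the string */
		__type(value, struct str_value);
	} str_map SEC(".maps");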

* Re: [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab
  2022-02-17  3:50 ` [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Alexei Starovoitov
@ 2022-02-18 13:53   ` Hou Tao
  2022-02-19 18:44     ` Alexei Starovoitov
  0 siblings, 1 reply; 9+ messages in thread
From: Hou Tao @ 2022-02-18 13:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, John Fastabend, netdev, bpf,
	Joanne Koong

Hi,

On 2/17/2022 11:50 AM, Alexei Starovoitov wrote:
> On Mon, Feb 14, 2022 at 07:13:34PM +0800, Hou Tao wrote:
>> Hi,
>>
>> In order to use a string as a hash-table key, key_size must be the
>> storage size of the longest string. If string lengths vary widely, the
>> hash distribution will be sub-optimal due to the unused zero bytes in
>> shorter strings, and lookups will be inefficient due to unnecessary
>> memcmp().
>>
>> Also, the unused part of a string key returned from a bpf helper
>> (e.g. bpf_d_path) may not be zeroed, so using it directly as a lookup
>> key makes the lookup fail with -ENOENT (as reported in [1]).
>>
>> The patchset tries to address these inefficiencies by adding support
>> for string keys. v1 had an extensibility problem because the string key
>> and its optimization were only available for string-only keys. To make
>> it extensible, v2 introduces bpf_str_key_stor and bpf_str_key_desc and
>> enforces the following layout of the hash key struct through BTF:
>>
>> 	>the start of hash key
>> 	...
>> 	[struct bpf_str_key_desc m;]
>> 	...
>> 	[struct bpf_str_key_desc n;]
>> 	...
>> 	struct bpf_str_key_stor z;
>> 	unsigned char raw[N];
>> 	>the end of hash key
> Sorry, but this is dead end.
> The v2 didn't fundamentally change the design.
> The bpf_str_key_desc has an offset field, but it's unused.
It is used during key comparison to ensure the offsets and lengths of
the sub-strings are the same, and it can also be used to locate a
sub-string within the raw array.
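
For example (a sketch with made up names), a sub-string can be located
like this:

	struct bpf_str_key_desc *d = &key->path;
	/* offset is recorded relative to the desc itself */
	char *s = (char *)d + d->offset;
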
> The len field is dynamically checked at run-time, and all hash map
> users will be paying that penalty even though the majority will not be
> using this feature.
> This patch set is an incredibly specific solution to one task.
> It's far from being generic. We cannot extend bpf this way.
> All building blocks must be as generic as possible.
Yes, you are right. I worked in the wrong direction. I tried to keep
the change set small while keeping good performance, but forgot that
it is not generic enough.
> If you want strings as a key then the key must be variable size.
> This patch doesn't make them so. It still requires some
> predefined fixed size for the largest string. This is a no-go.
> A variable sized key means truly variable: up to megabytes long.
> The key would need to contain an object (a pointer wrapper) to
> a variable sized object. And this object can be an arbitrary
> blob of bytes, not just a null terminated string.
> We've been thinking about a "dynamic pointer" concept where
> a pointer + length pair will be represented as an object.
I have seen the proposal from Joanne Koong on "dynamic pointers".
It can solve the storage problem for strings, but string lookup is
still a problem. Hashing is an option, but could we support two
dynamic pointers pointing to the same internal object and use the
id of the internal object to represent the string?
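In other words, something like string interning. A rough user-space
analogy (intern() is hypothetical):

	#include <stdint.h>

	typedef uint32_t str_id_t;

	/* hypothetical: returns the same id for equal byte sequences,
	 * while the bytes themselves are stored only once
	 */
	str_id_t intern(const void *bytes, uint32_t len);

	void example(void)
	{
		str_id_t a = intern("/etc/passwd", 11);
		str_id_t b = intern("/etc/passwd", 11);
		/* a == b, so the fixed-size id could be used as the
		 * hash-table key in place of the string itself
		 */
	}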
> The program will be able to allocate it and persist it into a map value
> and potentially into a map key.
> For storing a bunch of strings one can use a strong hash and store
> that hash in the key while keeping the full string as a variable sized
> object inside the value.
Will using a strong hash function impact the lookup performance, since
each lookup will need to calculate the hash?

> Another approach is to introduce a trie to store strings, or a DFA,
> or Aho-Corasick, etc. There are plenty of data structures that are
> more suitable for storing and searching strings, depending on the use case.
> Using a hash table for strings has its downsides.
Before adding support for string keys in the hash table, we had
implemented tries, a ternary search tree and a hash table in user-space
for string lookup. The hash table showed better performance and memory
usage, so we decided to add string support to the hash table. We will
revisit our tests and investigate new string data structures.

Regards,
Tao


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab
  2022-02-18 13:53   ` Hou Tao
@ 2022-02-19 18:44     ` Alexei Starovoitov
       [not found]       ` <ecc04a70-0b57-62ef-ab52-e7169845d789@huawei.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2022-02-19 18:44 UTC (permalink / raw)
  To: Hou Tao
  Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, John Fastabend, Network Development,
	bpf, Joanne Koong

On Fri, Feb 18, 2022 at 5:54 AM Hou Tao <houtao1@huawei.com> wrote:
>
> > We've been thinking about a "dynamic pointer" concept where
> > a pointer + length pair will be represented as an object.
> I have seen the proposal from Joanne Koong on "dynamic pointers".
> It can solve the storage problem for strings, but string lookup is
> still a problem. Hashing is an option, but could we support two
> dynamic pointers pointing to the same internal object and use the
> id of the internal object to represent the string?

Let's have a discussion about dynptr in that thread.

> > The program will be able to allocate it and persist it into a map value
> > and potentially into a map key.
> > For storing a bunch of strings one can use a strong hash and store
> > that hash in the key while keeping the full string as a variable sized
> > object inside the value.
> Will using a strong hash function impact the lookup performance, since
> each lookup will need to calculate the hash?
>
> > Another approach is to introduce a trie to store strings, or a DFA,
> > or Aho-Corasick, etc. There are plenty of data structures that are
> > more suitable for storing and searching strings, depending on the use case.
> > Using a hash table for strings has its downsides.
> Before adding support for string keys in the hash table, we had
> implemented tries, a ternary search tree and a hash table in user-space
> for string lookup. The hash table showed better performance and memory
> usage, so we decided to add string support to the hash table. We will
> revisit our tests and investigate new string data structures.

What performance characteristics did you consider?
Just the speed of the lookup?
How many elements are there in the map?
Is it 80% or 10% full?
Did you compare memory usage?
Did you consider the speed of update/delete?
Preallocated or dynamic alloc?

With dynamic pointers and further verifier improvements, bpf programs
will be able to implement a hash table on their own.
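
As a flavor of what that could look like, here is a toy open-addressing
table built on an array map (a sketch; the sizes, probe bound and hash
are arbitrary, and with dynptr the key could become variable sized):

	struct slot {
		__u64 hash;
		__u32 in_use;
		char key[64];
	};

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 65536);
		__type(key, __u32);
		__type(value, struct slot);
	} buckets SEC(".maps");

	static struct slot *toy_lookup(__u64 hash)
	{
		struct slot *s;
		__u32 i, idx;

		for (i = 0; i < 8; i++) {	/* bounded probing for the verifier */
			idx = (hash + i) & 65535;
			s = bpf_map_lookup_elem(&buckets, &idx);
			if (!s || !s->in_use)
				return NULL;	/* empty slot: not present */
			if (s->hash == hash)
				return s;
		}
		return NULL;
	}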

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab
       [not found]       ` <ecc04a70-0b57-62ef-ab52-e7169845d789@huawei.com>
@ 2022-02-27  3:08         ` Alexei Starovoitov
  2022-03-09 11:47           ` Hou Tao
  0 siblings, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2022-02-27  3:08 UTC (permalink / raw)
  To: Hou Tao
  Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, John Fastabend, Network Development,
	bpf, Joanne Koong

On Sat, Feb 26, 2022 at 4:16 AM Hou Tao <houtao1@huawei.com> wrote:
>
> For now, our case is a write-once case, so only lookup is considered.
> When the data set is 128K elements or bigger, the hash table has better
> lookup performance, as shown below:
>
> | lookup all elem (ms) | 4K  | 16K  | 64K  | 128K  | 256K  | 512K  | 1M     | 2M     | 4M      | 7M      |
> | -------------------- | --- | ---- | ---- | ----- | ----- | ----- | ------ | ------ | ------- | ------- |
> | hash                 | 3.3 | 12.7 | 47   | 90.6  | 185.9 | 382.3 | 788.5  | 1622.4 | 3296    | 6248.7  |
> | tries                | 2   | 10   | 45.9 | 111.6 | 274.6 | 688.9 | 1747.2 | 4394.5 | 11229.8 | 27148.8 |
> | tst                  | 3.8 | 16.4 | 61.3 | 139.1 | 313.9 | 707.3 | 1641.3 | 3856.1 | 9002.3  | 19793.8 |

Yeah. It's hard to beat hash lookup when it's hitting a good case of O(1),
but what are the chances that it stays this way?
Are you saying you can size up the table and populate it to a good % just once?

If so it's probably better to replace all strings with something
like a good hash.
7M elements is not a lot. A hash producing 8 or 16 bytes will have close
to zero false positives.
And in case of "populate the table once" the hash seed can be
precomputed and adjusted, so you can guarantee zero collisions
for 7M strings. While the lookup part can still have a 0.001% chance
of a false positive, there could be a way to deal with it after lookup.
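
In user-space pseudo-C the seed search amounts to something like this
(a sketch; hash64() is a placeholder for whatever seeded 64-bit hash is
actually used, e.g. SipHash):

	#include <stdint.h>
	#include <stdlib.h>

	/* placeholder for a seeded 64-bit hash */
	uint64_t hash64(const char *s, uint64_t seed);

	static int cmp_u64(const void *a, const void *b)
	{
		uint64_t x = *(const uint64_t *)a;
		uint64_t y = *(const uint64_t *)b;

		return x < y ? -1 : x > y;
	}

	/* try seeds until all n strings hash to distinct values;
	 * h[] is scratch space for the n hashes
	 */
	uint64_t pick_seed(char **strs, uint64_t *h, size_t n)
	{
		for (uint64_t seed = 1; ; seed++) {
			size_t i;

			for (i = 0; i < n; i++)
				h[i] = hash64(strs[i], seed);
			qsort(h, n, sizeof(h[0]), cmp_u64);
			for (i = 1; i < n && h[i] != h[i - 1]; i++)
				;
			if (i >= n)
				return seed;	/* no duplicate hashes */
		}
	}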

> Ternary search tree always has better memory usage:
>
> | memory usage (MB) | 4K  | 16K | 64K  | 128K | 256K | 512K | 1M   | 2M    | 4M    | 7M     |
> | ----------------- | --- | --- | ---- | ---- | ---- | ---- | ---- | ----- | ----- | ------ |
> | hash              | 2.2 | 8.9 | 35.5 | 71   | 142  | 284  | 568  | 1136  | 2272  | 4302.5 |
> | tries             | 2.1 | 8.5 | 34   | 68   | 136  | 272  | 544  | 1088  | 2176  | 4106.9 |
> | tst               | 0.5 | 1.6 | 5.6  | 10.6 | 20.3 | 38.6 | 73.1 | 138.6 | 264.6 | 479.5  |
>

Ternary search tree looks amazing.
Since you have a prototype can you wrap it into a new type of bpf map
and post the patches?
I wonder what data structures look like to achieve such memory efficiency.
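
For reference, a textbook ternary search tree node is roughly the
following (a user-space sketch, not necessarily the prototype being
discussed):

	/* one node per character: lt/gt point to siblings for smaller
	 * or larger characters, eq descends to the next character;
	 * strings with a common prefix share the chain of eq nodes
	 */
	struct tst_node {
		char c;
		struct tst_node *lt, *eq, *gt;
		void *value;	/* non-NULL marks the end of a stored string */
	};

	static void *tst_lookup(struct tst_node *n, const char *s)
	{
		while (n) {
			if (*s < n->c) {
				n = n->lt;
			} else if (*s > n->c) {
				n = n->gt;
			} else {
				s++;
				if (*s == '\0')
					return n->value;
				n = n->eq;
			}
		}
		return NULL;
	}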

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab
  2022-02-27  3:08         ` Alexei Starovoitov
@ 2022-03-09 11:47           ` Hou Tao
  0 siblings, 0 replies; 9+ messages in thread
From: Hou Tao @ 2022-03-09 11:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, John Fastabend, Network Development,
	bpf, Joanne Koong

Hi,

On 2/27/2022 11:08 AM, Alexei Starovoitov wrote:
> On Sat, Feb 26, 2022 at 4:16 AM Hou Tao <houtao1@huawei.com> wrote:
>>
>> For now, our case is a write-once case, so only lookup is considered.
>> When the data set is 128K elements or bigger, the hash table has better
>> lookup performance, as shown below:
>>
>> | lookup all elem (ms) | 4K  | 16K  | 64K  | 128K  | 256K  | 512K  | 1M     | 2M     | 4M      | 7M      |
>> | -------------------- | --- | ---- | ---- | ----- | ----- | ----- | ------ | ------ | ------- | ------- |
>> | hash                 | 3.3 | 12.7 | 47   | 90.6  | 185.9 | 382.3 | 788.5  | 1622.4 | 3296    | 6248.7  |
>> | tries                | 2   | 10   | 45.9 | 111.6 | 274.6 | 688.9 | 1747.2 | 4394.5 | 11229.8 | 27148.8 |
>> | tst                  | 3.8 | 16.4 | 61.3 | 139.1 | 313.9 | 707.3 | 1641.3 | 3856.1 | 9002.3  | 19793.8 |
> 
> Yeah. It's hard to beat hash lookup when it's hitting a good case of O(1),
> but what are the chances that it stays this way?
> Are you saying you can size up the table and populate it to a good % just once?
>
Yes, for our case the hash table is populated only once. During these
tests the hash table is populated first by inserting all strings into
the table, and then the lookups are done. The strings for all tests
come from the same string set.

> If so it's probably better to replace all strings with something
> like a good hash
A strong one like SHA-1, with the string stored as the hash-table value,
just as proposed in the previous email?

> 7M elements is not a lot. A hash producing 8 or 16 bytes will have close
> to zero false positives.
> And in case of "populate the table once" the hash seed can be
> precomputed and adjusted, so you can guarantee zero collisions
> for 7M strings. While the lookup part can still have a 0.001% chance
> of a false positive, there could be a way to deal with it after lookup.
>
I can try the above method, but the lookup procedure will be slowed down
by calculating a good hash, and the memory usage will not be reduced.

>> Ternary search tree always has better memory usage:
>>
>> | memory usage (MB) | 4K  | 16K | 64K  | 128K | 256K | 512K | 1M   | 2M    | 4M    | 7M     |
>> | ----------------- | --- | --- | ---- | ---- | ---- | ---- | ---- | ----- | ----- | ------ |
>> | hash              | 2.2 | 8.9 | 35.5 | 71   | 142  | 284  | 568  | 1136  | 2272  | 4302.5 |
>> | tries             | 2.1 | 8.5 | 34   | 68   | 136  | 272  | 544  | 1088  | 2176  | 4106.9 |
>> | tst               | 0.5 | 1.6 | 5.6  | 10.6 | 20.3 | 38.6 | 73.1 | 138.6 | 264.6 | 479.5  |
>>
> 
> Ternary search tree looks amazing.
> Since you have a prototype can you wrap it into a new type of bpf map
> and post the patches?
Will do.

> I wonder what data structures look like to achieve such memory efficiency.
The lower memory usage is partly because the test string set consists of
full file paths, and these paths share common prefixes. The ternary
search tree reduces memory usage by sharing those common prefixes.
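
As a tiny illustration, storing "/usr/a" and "/usr/b" creates the
eq-chain for the shared prefix only once; the two strings differ only
in the last node:

	'/'-'u'-'s'-'r'-'/'-'a'*
	                     \
	                     'b'*	('b' hangs off the gt link of 'a')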

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-03-09 11:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-14 11:13 [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Hou Tao
2022-02-14 11:13 ` [RFC PATCH bpf-next v2 1/3] bpf: add support for string in hash table key Hou Tao
2022-02-14 11:13 ` [RFC PATCH bpf-next v2 2/3] selftests/bpf: add a simple test for htab str key Hou Tao
2022-02-14 11:13 ` [RFC PATCH bpf-next v2 3/3] selftests/bpf: add benchmark for string-key hash table Hou Tao
2022-02-17  3:50 ` [RFC PATCH bpf-next v2 0/3] bpf: support string key in htab Alexei Starovoitov
2022-02-18 13:53   ` Hou Tao
2022-02-19 18:44     ` Alexei Starovoitov
     [not found]       ` <ecc04a70-0b57-62ef-ab52-e7169845d789@huawei.com>
2022-02-27  3:08         ` Alexei Starovoitov
2022-03-09 11:47           ` Hou Tao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).