bpf.vger.kernel.org archive mirror
* [PATCH v2 bpf-next 0/4] Implement bloom filter map
@ 2021-09-14  4:04 Joanne Koong
  2021-09-14  4:04 ` [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation Joanne Koong
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Joanne Koong @ 2021-09-14  4:04 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

This patchset adds a new kind of bpf map: the bloom filter map.
Bloom filters are a space-efficient probabilistic data structure
used to quickly test whether an element exists in a set.
For a brief overview about how bloom filters work,
https://en.wikipedia.org/wiki/Bloom_filter
may be helpful.

One example use-case is an application leveraging a bloom filter
map to determine whether a computationally expensive hashmap
lookup can be avoided. If the element is not found in the bloom
filter map, the hashmap lookup can be skipped, as in the sketch
below.
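
As a minimal illustrative sketch (not part of the patchset; the map
definitions, names, and attach point here are hypothetical), a bpf
program using this pattern might look like:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
	__uint(key_size, 0);
	__uint(value_size, sizeof(__u64));
	__uint(max_entries, 1000);
} guard SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1000);
	__type(key, __u64);
	__type(value, __u64);
} expensive_map SEC(".maps");

SEC("fentry/__x64_sys_getpgid")
int lookup_with_guard(void *ctx)
{
	__u64 val = 42; /* element to query; illustrative */

	/* A -ENOENT return from bpf_map_peek_elem means the element is
	 * definitely absent, so the hashmap lookup can be skipped.
	 */
	if (bpf_map_peek_elem(&guard, &val) == 0)
		bpf_map_lookup_elem(&expensive_map, &val);
	return 0;
}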

This patchset includes benchmarks for testing the performance of
the bloom filter for different entry sizes and different numbers
of hash functions, as well as comparisons of hashmap lookups
with vs. without the bloom filter.

v1 -> v2:
* Remove libbpf changes, and pass the number of hash functions through
map_flags instead (see the encoding sketch after this list).
* Default to using 5 hash functions if no number of hash functions
is specified.
* Use set_bit instead of spinlocks in the bloom filter bitmap. This
improved the speed significantly. For example, using 5 hash functions
with 100k entries, there was roughly a 35% speed increase.
* Use jhash2 (instead of jhash) for u32-aligned value sizes. This
increased the speed by roughly 5 to 15%. When using jhash2 on
non-u32-aligned value sizes (truncating any remainder bits), there
was not a noticeable difference.
* Add test for using the bloom filter as an inner map.
* Reran the benchmarks, updated the commit messages to correspond to
the new results.
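
For reference, the number of hash functions lives in bits 13-16 of
map_flags (BPF_F_BLOOM_FILTER_HASH_BIT_1 in patch 1 is (1U << 13)).
A minimal encoding sketch (the helper name is illustrative):

#include <linux/types.h>

/* Encode nr_hashes (1-15) into a bloom filter map's map_flags.
 * Leaving these bits zeroed selects the kernel default of 5 hashes.
 */
static __u32 bloom_flags_with_nr_hashes(__u32 map_flags, __u8 nr_hashes)
{
	return map_flags | ((__u32)nr_hashes << 13);
}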

Joanne Koong (4):
  bpf: Add bloom filter map implementation
  selftests/bpf: Add bloom filter map test cases
  bpf/benchs: Add benchmark test for bloom filter maps
  bpf/benchs: Add benchmarks for comparing hashmap lookups with vs.
    without bloom filter

 include/linux/bpf_types.h                     |   1 +
 include/uapi/linux/bpf.h                      |  10 +
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/bloom_filter.c                     | 205 +++++++++
 kernel/bpf/syscall.c                          |  14 +-
 kernel/bpf/verifier.c                         |  19 +-
 tools/include/uapi/linux/bpf.h                |  10 +
 tools/testing/selftests/bpf/Makefile          |   4 +-
 tools/testing/selftests/bpf/bench.c           |  57 ++-
 tools/testing/selftests/bpf/bench.h           |   3 +
 .../bpf/benchs/bench_bloom_filter_map.c       | 393 ++++++++++++++++++
 .../bpf/benchs/run_bench_bloom_filter_map.sh  |  43 ++
 .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
 .../selftests/bpf/benchs/run_common.sh        |  60 +++
 .../bpf/prog_tests/bloom_filter_map.c         | 177 ++++++++
 .../selftests/bpf/progs/bloom_filter_map.c    | 156 +++++++
 16 files changed, 1144 insertions(+), 40 deletions(-)
 create mode 100644 kernel/bpf/bloom_filter.c
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
 create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_map.c

-- 
2.30.2


* [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation
  2021-09-14  4:04 [PATCH v2 bpf-next 0/4] Implement bloom filter map Joanne Koong
@ 2021-09-14  4:04 ` Joanne Koong
  2021-09-17 17:01   ` Alexei Starovoitov
  2021-09-17 21:48   ` Andrii Nakryiko
  2021-09-14  4:04 ` [PATCH v2 bpf-next 2/4] selftests/bpf: Add bloom filter map test cases Joanne Koong
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Joanne Koong @ 2021-09-14  4:04 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

Bloom filters are a space-efficient probabilistic data structure
used to quickly test whether an element exists in a set.
In a bloom filter, false positives are possible, but false
negatives never are.

This patch adds a bloom filter map for bpf programs.
The bloom filter map supports peek (determining whether an element
is present in the map) and push (adding an element to the map)
operations. These operations are exposed to userspace applications
through the already existing syscalls in the following way:

BPF_MAP_LOOKUP_ELEM -> peek
BPF_MAP_UPDATE_ELEM -> push

The bloom filter map does not have keys, only values. In light of
this, the bloom filter map's API matches that of queue/stack maps:
user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM,
which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
APIs to query or add an element to the bloom filter map. When the
bloom filter map is created, it must be created with a key_size of 0.

For updates, the user will pass in the element to add to the map
as the value, with a NULL key. For lookups, the user will pass in the
element to query in the map as the value. In the verifier layer, this
requires us to modify the argument type of a bloom filter's
BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
the syscall layer, we need to copy over the user value so that in
bpf_map_peek_elem, we know which specific value to query.

A few things to note:
 * If there are any concurrent lookups + updates, the user is
responsible for synchronizing this to ensure no false negative lookups
occur.
 * The number of hashes to use for the bloom filter is configurable from
userspace through map_flags. If no number is specified, the default used
will be 5 hash functions. The benchmarks later in this patchset can help
compare the performance of using different numbers of hashes on different
entry sizes. In general, using more hashes decreases the speed of a lookup
but also lowers the false positive rate of an element being detected in
the bloom filter (see the usage sketch after this list).
 * Deleting an element in the bloom filter map is not supported.
 * The bloom filter map may be used as an inner map.
 * The "max_entries" size that is specified at map creation time is used to
approximate a reasonable bitmap size for the bloom filter, and is not
otherwise strictly enforced. If the user wishes to insert more entries into
the bloom filter than "max_entries", they may do so but they should be
aware that this may lead to a higher false positive rate.
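
To make the userspace semantics concrete, here is a minimal usage
sketch (error handling elided; variable names are illustrative, the
wrappers are libbpf's low-level calls from <bpf/bpf.h>, and the number
of hashes is encoded into map_flags bits 13-16 via
BPF_F_BLOOM_FILTER_HASH_BIT_1 through _4 from the uapi header below):

	__u64 val = 1234;
	int map_fd, err;

	/* key_size must be 0; request 5 hashes via map_flags (5 is
	 * also the default if the hash bits are left as zero)
	 */
	map_fd = bpf_create_map(BPF_MAP_TYPE_BLOOM_FILTER,
				0 /* key_size */, sizeof(val),
				10000 /* max_entries */, 5U << 13);

	/* push: add "val" to the bloom filter; the key is NULL */
	err = bpf_map_update_elem(map_fd, NULL, &val, BPF_ANY);

	/* peek: 0 means "val" is possibly present, -ENOENT means it
	 * is definitely absent
	 */
	err = bpf_map_lookup_elem(map_fd, NULL, &val);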

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 include/linux/bpf_types.h      |   1 +
 include/uapi/linux/bpf.h       |  10 ++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/bloom_filter.c      | 205 +++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |  14 ++-
 kernel/bpf/verifier.c          |  19 ++-
 tools/include/uapi/linux/bpf.h |  10 ++
 7 files changed, 255 insertions(+), 6 deletions(-)
 create mode 100644 kernel/bpf/bloom_filter.c

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 9c81724e4b98..c4424ac2fa02 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -125,6 +125,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
 
 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 791f31dd0abe..1d82860fd98e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -906,6 +906,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_RINGBUF,
 	BPF_MAP_TYPE_INODE_STORAGE,
 	BPF_MAP_TYPE_TASK_STORAGE,
+	BPF_MAP_TYPE_BLOOM_FILTER,
 };
 
 /* Note that tracing related programs such as
@@ -1210,6 +1211,15 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP		= (1U << 12),
+
+/* For bloom filter maps, the next 4 bits represent how many hashes to use.
+ * The maximum number of hash functions supported is 15. If this is not set,
+ * the default number of hash functions used will be 5.
+ */
+	BPF_F_BLOOM_FILTER_HASH_BIT_1 = (1U << 13),
+	BPF_F_BLOOM_FILTER_HASH_BIT_2 = (1U << 14),
+	BPF_F_BLOOM_FILTER_HASH_BIT_3 = (1U << 15),
+	BPF_F_BLOOM_FILTER_HASH_BIT_4 = (1U << 16),
 };
 
 /* Flags for BPF_PROG_QUERY. */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 7f33098ca63f..cf6ca339f3cd 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -7,7 +7,7 @@ endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
-obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
+obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
new file mode 100644
index 000000000000..43a17c5b35ac
--- /dev/null
+++ b/kernel/bpf/bloom_filter.c
@@ -0,0 +1,205 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <linux/bitmap.h>
+#include <linux/bpf.h>
+#include <linux/err.h>
+#include <linux/jhash.h>
+#include <linux/random.h>
+
+#define BLOOM_FILTER_HASH_BITMASK \
+	 (BPF_F_BLOOM_FILTER_HASH_BIT_1 | BPF_F_BLOOM_FILTER_HASH_BIT_2 | \
+	 BPF_F_BLOOM_FILTER_HASH_BIT_3 | BPF_F_BLOOM_FILTER_HASH_BIT_4)
+
+#define BLOOM_FILTER_CREATE_FLAG_MASK \
+	(BPF_F_NUMA_NODE | BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK | \
+	 BLOOM_FILTER_HASH_BITMASK)
+
+struct bpf_bloom_filter {
+	struct bpf_map map;
+	u32 bit_array_mask;
+	u32 hash_seed;
+	/* If the size of the values in the bloom filter is u32 aligned,
+	 * then it is more performant to use jhash2 as the underlying hash
+	 * function, else we use jhash. This tracks the number of u32s
+	 * in an u32-aligned value size. If the value size is not u32 aligned,
+	 * this will be 0.
+	 */
+	u32 aligned_u32_count;
+	u8 nr_hashes;
+	unsigned long bit_array[];
+};
+
+static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
+{
+	struct bpf_bloom_filter *bloom_filter =
+		container_of(map, struct bpf_bloom_filter, map);
+	u32 hash;
+	u8 i;
+
+	for (i = 0; i < bloom_filter->nr_hashes; i++) {
+		if (bloom_filter->aligned_u32_count)
+			hash = jhash2(value, bloom_filter->aligned_u32_count,
+				      bloom_filter->hash_seed + i) &
+				bloom_filter->bit_array_mask;
+		else
+			hash = jhash(value, map->value_size,
+				     bloom_filter->hash_seed + i) &
+				bloom_filter->bit_array_mask;
+
+		if (!test_bit(hash, bloom_filter->bit_array))
+			return -ENOENT;
+	}
+
+	return 0;
+}
+
+static u8 get_nr_hashes(u32 map_flags)
+{
+	u8 nr_hashes = (map_flags & BLOOM_FILTER_HASH_BITMASK) >>
+		ilog2(BPF_F_BLOOM_FILTER_HASH_BIT_1);
+
+	/* Default to 5 if no number of hashes was specified */
+	return nr_hashes == 0 ? 5 : nr_hashes;
+}
+
+static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
+{
+	u32 nr_bits, bit_array_bytes, bit_array_mask;
+	int numa_node = bpf_map_attr_numa_node(attr);
+	struct bpf_bloom_filter *bloom_filter;
+	u8 nr_hashes;
+
+	if (!bpf_capable())
+		return ERR_PTR(-EPERM);
+
+	if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
+	    attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags))
+		return ERR_PTR(-EINVAL);
+
+	nr_hashes = get_nr_hashes(attr->map_flags);
+
+	/* For the bloom filter, the optimal bit array size that minimizes the
+	 * false positive probability is n * k / ln(2) where n is the number of
+	 * expected entries in the bloom filter and k is the number of hash
+	 * functions. We use 7 / 5 to approximate 1 / ln(2).
+	 *
+	 * We round this up to the nearest power of two to enable more efficient
+	 * hashing using bitmasks. The bitmask will be the bit array size - 1.
+	 *
+	 * If this overflows a u32, the bit array size will have 2^32 (4
+	 * GB) bits.
+	 */
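+	/* Worked example, for illustration: with max_entries = 10000 and
+	 * nr_hashes = 5, nr_bits = 10000 * 5 = 50000, then 50000 / 5 * 7
+	 * = 70000, which rounds up to 2^17 = 131072 bits: a 16 KB bit
+	 * array with bit_array_mask = 0x1ffff.
+	 */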
+	if (check_mul_overflow(attr->max_entries, (u32)nr_hashes, &nr_bits) ||
+	    check_mul_overflow(nr_bits / 5, (u32)7, &nr_bits) ||
+	    nr_bits > (1UL << 31)) {
+		/* The bit array size is 2^32 bits but to avoid overflowing the
+		 * u32, we use BITS_TO_BYTES(U32_MAX), which will round up to the
+		 * equivalent number of bytes
+		 */
+		bit_array_bytes = BITS_TO_BYTES(U32_MAX);
+		bit_array_mask = U32_MAX;
+	} else {
+		if (nr_bits <= BITS_PER_LONG)
+			nr_bits = BITS_PER_LONG;
+		else
+			nr_bits = roundup_pow_of_two(nr_bits);
+		bit_array_bytes = BITS_TO_BYTES(nr_bits);
+		bit_array_mask = nr_bits - 1;
+	}
+
+	bit_array_bytes = roundup(bit_array_bytes, sizeof(unsigned long));
+	bloom_filter = bpf_map_area_alloc(sizeof(*bloom_filter) + bit_array_bytes,
+					  numa_node);
+
+	if (!bloom_filter)
+		return ERR_PTR(-ENOMEM);
+
+	bpf_map_init_from_attr(&bloom_filter->map, attr);
+
+	bloom_filter->nr_hashes = nr_hashes;
+	bloom_filter->bit_array_mask = bit_array_mask;
+	if ((attr->value_size & (sizeof(u32) - 1)) == 0)
+		bloom_filter->aligned_u32_count = attr->value_size / sizeof(u32);
+
+	if (!(attr->map_flags & BPF_F_ZERO_SEED))
+		bloom_filter->hash_seed = get_random_int();
+
+	return &bloom_filter->map;
+}
+
+static void bloom_filter_map_free(struct bpf_map *map)
+{
+	struct bpf_bloom_filter *bloom_filter =
+		container_of(map, struct bpf_bloom_filter, map);
+
+	bpf_map_area_free(bloom_filter);
+}
+
+static int bloom_filter_map_push_elem(struct bpf_map *map, void *value,
+				      u64 flags)
+{
+	struct bpf_bloom_filter *bloom_filter =
+		container_of(map, struct bpf_bloom_filter, map);
+	u32 hash;
+	u8 i;
+
+	if (flags != BPF_ANY)
+		return -EINVAL;
+
+	for (i = 0; i < bloom_filter->nr_hashes; i++) {
+		if (bloom_filter->aligned_u32_count)
+			hash = jhash2(value, bloom_filter->aligned_u32_count,
+				      bloom_filter->hash_seed + i) &
+				bloom_filter->bit_array_mask;
+		else
+			hash = jhash(value, map->value_size,
+				     bloom_filter->hash_seed + i) &
+				bloom_filter->bit_array_mask;
+
+		set_bit(hash, bloom_filter->bit_array);
+	}
+
+	return 0;
+}
+
+static void *bloom_filter_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	/* The eBPF program should use map_peek_elem instead */
+	return ERR_PTR(-EINVAL);
+}
+
+static int bloom_filter_map_update_elem(struct bpf_map *map, void *key,
+					void *value, u64 flags)
+{
+	/* The eBPF program should use map_push_elem instead */
+	return -EINVAL;
+}
+
+static int bloom_filter_map_delete_elem(struct bpf_map *map, void *key)
+{
+	return -EOPNOTSUPP;
+}
+
+static int bloom_filter_map_get_next_key(struct bpf_map *map, void *key,
+					 void *next_key)
+{
+	return -EOPNOTSUPP;
+}
+
+static int bloom_filter_map_btf_id;
+const struct bpf_map_ops bloom_filter_map_ops = {
+	.map_meta_equal = bpf_map_meta_equal,
+	.map_alloc = bloom_filter_map_alloc,
+	.map_free = bloom_filter_map_free,
+	.map_push_elem = bloom_filter_map_push_elem,
+	.map_peek_elem = bloom_filter_map_peek_elem,
+	.map_lookup_elem = bloom_filter_map_lookup_elem,
+	.map_update_elem = bloom_filter_map_update_elem,
+	.map_delete_elem = bloom_filter_map_delete_elem,
+	.map_get_next_key = bloom_filter_map_get_next_key,
+	.map_check_btf = map_check_no_btf,
+	.map_btf_name = "bpf_bloom_filter",
+	.map_btf_id = &bloom_filter_map_btf_id,
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e50c0bfdb7d..9865b5b1e667 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -199,7 +199,8 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,
 							 flags);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
+		   map->map_type == BPF_MAP_TYPE_STACK ||
+		   map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
 		err = map->ops->map_push_elem(map, value, flags);
 	} else {
 		rcu_read_lock();
@@ -238,7 +239,8 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		err = bpf_fd_reuseport_array_lookup_elem(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
+		   map->map_type == BPF_MAP_TYPE_STACK ||
+		   map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
 		err = map->ops->map_peek_elem(map, value);
 	} else if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
 		/* struct_ops map requires directly updating "value" */
@@ -1080,6 +1082,14 @@ static int map_lookup_elem(union bpf_attr *attr)
 	if (!value)
 		goto free_key;
 
+	if (map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
+		if (copy_from_user(value, uvalue, value_size))
+			err = -EFAULT;
+		else
+			err = bpf_map_copy_value(map, key, value, attr->flags);
+		goto free_value;
+	}
+
 	err = bpf_map_copy_value(map, key, value, attr->flags);
 	if (err)
 		goto free_value;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 047ac4b4703b..5cbcff4c2222 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4813,7 +4813,10 @@ static int resolve_map_arg_type(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 		break;
-
+	case BPF_MAP_TYPE_BLOOM_FILTER:
+		if (meta->func_id == BPF_FUNC_map_peek_elem)
+			*arg_type = ARG_PTR_TO_MAP_VALUE;
+		break;
 	default:
 		break;
 	}
@@ -5388,6 +5391,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    func_id != BPF_FUNC_task_storage_delete)
 			goto error;
 		break;
+	case BPF_MAP_TYPE_BLOOM_FILTER:
+		if (func_id != BPF_FUNC_map_push_elem &&
+		    func_id != BPF_FUNC_map_peek_elem)
+			goto error;
+		break;
 	default:
 		break;
 	}
@@ -5455,13 +5463,18 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    map->map_type != BPF_MAP_TYPE_SOCKHASH)
 			goto error;
 		break;
-	case BPF_FUNC_map_peek_elem:
 	case BPF_FUNC_map_pop_elem:
-	case BPF_FUNC_map_push_elem:
 		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
 		    map->map_type != BPF_MAP_TYPE_STACK)
 			goto error;
 		break;
+	case BPF_FUNC_map_push_elem:
+	case BPF_FUNC_map_peek_elem:
+		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
+		    map->map_type != BPF_MAP_TYPE_STACK &&
+		    map->map_type != BPF_MAP_TYPE_BLOOM_FILTER)
+			goto error;
+		break;
 	case BPF_FUNC_sk_storage_get:
 	case BPF_FUNC_sk_storage_delete:
 		if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 791f31dd0abe..1d82860fd98e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -906,6 +906,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_RINGBUF,
 	BPF_MAP_TYPE_INODE_STORAGE,
 	BPF_MAP_TYPE_TASK_STORAGE,
+	BPF_MAP_TYPE_BLOOM_FILTER,
 };
 
 /* Note that tracing related programs such as
@@ -1210,6 +1211,15 @@ enum {
 
 /* Create a map that is suitable to be an inner map with dynamic max entries */
 	BPF_F_INNER_MAP		= (1U << 12),
+
+/* For bloom filter maps, the next 4 bits represent how many hashes to use.
+ * The maximum number of hash functions supported is 15. If this is not set,
+ * the default number of hash functions used will be 5.
+ */
+	BPF_F_BLOOM_FILTER_HASH_BIT_1 = (1U << 13),
+	BPF_F_BLOOM_FILTER_HASH_BIT_2 = (1U << 14),
+	BPF_F_BLOOM_FILTER_HASH_BIT_3 = (1U << 15),
+	BPF_F_BLOOM_FILTER_HASH_BIT_4 = (1U << 16),
 };
 
 /* Flags for BPF_PROG_QUERY. */
-- 
2.30.2


* [PATCH v2 bpf-next 2/4] selftests/bpf: Add bloom filter map test cases
  2021-09-14  4:04 [PATCH v2 bpf-next 0/4] Implement bloom filter map Joanne Koong
  2021-09-14  4:04 ` [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation Joanne Koong
@ 2021-09-14  4:04 ` Joanne Koong
  2021-09-14  4:04 ` [PATCH v2 bpf-next 3/4] bpf/benchs: Add benchmark test for bloom filter maps Joanne Koong
  2021-09-14  4:04 ` [PATCH v2 bpf-next 4/4] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter Joanne Koong
  3 siblings, 0 replies; 11+ messages in thread
From: Joanne Koong @ 2021-09-14  4:04 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

This patch adds test cases for bpf bloom filter maps. They include tests
checking against invalid operations by userspace, tests for using the bloom
filter map as an inner map, and a bpf program that queries the bloom filter
map for values added by a userspace program.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 .../bpf/prog_tests/bloom_filter_map.c         | 177 ++++++++++++++++++
 .../selftests/bpf/progs/bloom_filter_map.c    |  82 ++++++++
 2 files changed, 259 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_map.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c b/tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
new file mode 100644
index 000000000000..eb81aab0d7be
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <sys/syscall.h>
+#include <test_progs.h>
+#include "bloom_filter_map.skel.h"
+
+static void test_bloom_filter_map_fail(void)
+{
+	struct bpf_create_map_attr xattr = {
+		.name = "bloom_filter_map",
+		.map_type = BPF_MAP_TYPE_BLOOM_FILTER,
+		.max_entries = 100,
+		.value_size = sizeof(__u32),
+		.map_flags = BPF_F_BLOOM_FILTER_HASH_BIT_2,
+	};
+	__u32 value;
+	int fd, err;
+
+	/* Invalid key size */
+	xattr.key_size = 4;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid key size"))
+		close(fd);
+	xattr.key_size = 0;
+
+	/* Invalid value size */
+	xattr.value_size = 0;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid value size"))
+		close(fd);
+	xattr.value_size = sizeof(__u32);
+
+	/* Invalid max entries size */
+	xattr.max_entries = 0;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid max entries size"))
+		close(fd);
+	xattr.max_entries = 100;
+
+	/* Bloom filter maps do not support BPF_F_NO_PREALLOC */
+	xattr.map_flags = BPF_F_NO_PREALLOC;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid flags"))
+		close(fd);
+	xattr.map_flags = 0;
+
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_GE(fd, 0, "bpf_create_map bloom filter"))
+		return;
+
+	/* Test invalid flags */
+	err = bpf_map_update_elem(fd, NULL, &value, -1);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bloom filter invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, BPF_EXIST);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bloom filter invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, BPF_F_LOCK);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bloom filter invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, BPF_NOEXIST);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bloom filter invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, 10000);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bloom filter invalid flags");
+
+	close(fd);
+}
+
+static void bloom_filter_map(struct bloom_filter_map *skel)
+{
+	const int map_size = bpf_map__max_entries(skel->maps.map_random_data);
+	int err, map_random_data_fd, map_bloom_filter_fd, i;
+	__u64 val;
+	struct bpf_link *link;
+
+	map_random_data_fd = bpf_map__fd(skel->maps.map_random_data);
+	map_bloom_filter_fd = bpf_map__fd(skel->maps.map_bloom_filter);
+
+	/* Generate random values and add them to the maps */
+	for (i = 0; i < map_size; i++) {
+		val = rand();
+		err = bpf_map_update_elem(map_random_data_fd, &i, &val, BPF_ANY);
+		if (!ASSERT_OK(err, "Add random value to map_random_data"))
+			continue;
+
+		err = bpf_map_update_elem(map_bloom_filter_fd, NULL, &val, 0);
+		if (!ASSERT_OK(err, "Add random value to map_bloom_filter"))
+			return;
+	}
+
+	link = bpf_program__attach(skel->progs.prog_bloom_filter);
+	if (!ASSERT_OK_PTR(link, "link"))
+		return;
+
+	syscall(SYS_getpgid);
+
+	ASSERT_EQ(skel->bss->error, 0, "error");
+
+	bpf_link__destroy(link);
+}
+
+static void bloom_filter_inner_map(struct bloom_filter_map *skel)
+{
+	const int map_size = bpf_map__max_entries(skel->maps.map_random_data);
+	int outer_map_fd, inner_map_fd, map_random_data_fd, err, i, key = 0;
+	struct bpf_create_map_attr xattr = {
+		.name = "bloom_filter_inner_map",
+		.map_type = BPF_MAP_TYPE_BLOOM_FILTER,
+		.max_entries = map_size,
+		.value_size = sizeof(__u64),
+	};
+	struct bpf_link *link;
+	__u64 val;
+
+	/* Create a bloom filter map that will be used as the inner map */
+	inner_map_fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_GE(inner_map_fd, 0, "bpf_create_map bloom filter map as inner map"))
+		return;
+
+	/* Generate random values and add them to the maps */
+	map_random_data_fd = bpf_map__fd(skel->maps.map_random_data);
+	for (i = 0; i < map_size; i++) {
+		val = rand();
+		err = bpf_map_update_elem(map_random_data_fd, &i, &val, BPF_ANY);
+		if (!ASSERT_OK(err, "Add random value to map_random_data"))
+			continue;
+
+		err = bpf_map_update_elem(inner_map_fd, NULL, &val, 0);
+		if (!ASSERT_OK(err, "Add random value to inner_map_fd"))
+			goto done;
+	}
+
+	outer_map_fd = bpf_map__fd(skel->maps.outer_map);
+	/* Add the bloom filter map to the outer map */
+	err = bpf_map_update_elem(outer_map_fd, &key, &inner_map_fd, 0);
+	if (!ASSERT_OK(err, "Add bloom filter map to outer map"))
+		goto done;
+
+	/* Attach the bloom_filter_inner_map prog */
+	link = bpf_program__attach(skel->progs.prog_bloom_filter_inner_map);
+	if (!ASSERT_OK_PTR(link, "link"))
+		goto delete_inner_map;
+
+	syscall(SYS_getpgid);
+
+	ASSERT_EQ(skel->bss->error, 0, "error");
+
+	bpf_link__destroy(link);
+
+delete_inner_map:
+	/* Ensure the inner bloom filter map can be deleted */
+	err = bpf_map_delete_elem(outer_map_fd, &key);
+	ASSERT_OK(err, "Delete inner bloom filter map");
+
+done:
+	close(inner_map_fd);
+}
+
+void test_bloom_filter_map(void)
+{
+	struct bloom_filter_map *skel;
+
+	test_bloom_filter_map_fail();
+
+	skel = bloom_filter_map__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "bloom_filter_map__open_and_load"))
+		goto cleanup;
+
+	bloom_filter_map(skel);
+
+	bloom_filter_inner_map(skel);
+
+cleanup:
+	bloom_filter_map__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/bloom_filter_map.c b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
new file mode 100644
index 000000000000..8b5bf8d61a40
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct bpf_map;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1000);
+	__type(key, __u32);
+	__type(value, __u64);
+} map_random_data SEC(".maps");
+
+struct map_bloom_filter_type {
+	__uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
+	__uint(key_size, 0);
+	__uint(value_size, sizeof(__u64));
+	__uint(max_entries, 1000);
+} map_bloom_filter SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+	__array(values, struct map_bloom_filter_type);
+} outer_map SEC(".maps");
+
+struct callback_ctx {
+	struct map_bloom_filter_type *map;
+};
+
+int error = 0;
+
+static __u64
+check_elem(struct bpf_map *map, __u32 *key, __u64 *val,
+	   struct callback_ctx *data)
+{
+	int err;
+
+	err = bpf_map_peek_elem(data->map, val);
+	if (err) {
+		error |= 1;
+		return 1; /* stop the iteration */
+	}
+
+	return 0;
+}
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bloom_filter(void *ctx)
+{
+	struct callback_ctx data;
+
+	data.map = &map_bloom_filter;
+	bpf_for_each_map_elem(&map_random_data, check_elem, &data, 0);
+
+	return 0;
+}
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bloom_filter_inner_map(void *ctx)
+{
+	struct map_bloom_filter_type *inner_map;
+	struct callback_ctx data;
+	int key = 0;
+
+	inner_map = bpf_map_lookup_elem(&outer_map, &key);
+	if (!inner_map) {
+		error |= 2;
+		return 0;
+	}
+
+	data.map = inner_map;
+	bpf_for_each_map_elem(&map_random_data, check_elem, &data, 0);
+
+	return 0;
+}
-- 
2.30.2


* [PATCH v2 bpf-next 3/4] bpf/benchs: Add benchmark test for bloom filter maps
  2021-09-14  4:04 [PATCH v2 bpf-next 0/4] Implement bloom filter map Joanne Koong
  2021-09-14  4:04 ` [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation Joanne Koong
  2021-09-14  4:04 ` [PATCH v2 bpf-next 2/4] selftests/bpf: Add bloom filter map test cases Joanne Koong
@ 2021-09-14  4:04 ` Joanne Koong
  2021-09-14  4:04 ` [PATCH v2 bpf-next 4/4] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter Joanne Koong
  3 siblings, 0 replies; 11+ messages in thread
From: Joanne Koong @ 2021-09-14  4:04 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

This patch adds benchmark tests for the throughput and false positive
rate of bloom filter map lookups for a given number of entries and a
given number of hash functions.

These benchmarks show that as the number of hash functions increases,
both the throughput and the false positive rate of the bloom filter
map decrease. From the benchmark data, the average false positive
rates are roughly as follows:

1 hash function = ~30%
2 hash functions = ~15%
3 hash functions = ~5%
4 hash functions = ~2.5%
5 hash functions = ~1%
6 hash functions = ~0.5%
7 hash functions = ~0.35%
8 hash functions = ~0.15%
9 hash functions = ~0.1%
10 hash functions = ~0%
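
For comparison, the theoretical false positive rate of a bloom filter
with k hash functions, m bits, and n entries is approximately
(1 - e^(-k*n/m))^k. The sketch below (not part of the patchset;
assumes ideal hash functions) computes this for the bit array sizing
used in patch 1, m = n * k * 7 / 5 rounded up to a power of two, and
lands in the same ballpark as the measured rates above (build with
-lm):

#include <math.h>
#include <stdio.h>

/* Approximate false positive rate, assuming independent, uniform
 * hash functions and the kernel's power-of-two bitmap sizing.
 */
static double bloom_fp_rate(double n, double k)
{
	double m = exp2(ceil(log2(n * k * 7.0 / 5.0)));

	return pow(1.0 - exp(-k * n / m), k);
}

int main(void)
{
	int k;

	for (k = 1; k <= 10; k++)
		printf("%2d hashes: ~%.2f%%\n", k,
		       100.0 * bloom_fp_rate(100000, k));
	return 0;
}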

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 tools/testing/selftests/bpf/Makefile          |   4 +-
 tools/testing/selftests/bpf/bench.c           |  35 ++
 tools/testing/selftests/bpf/bench.h           |   3 +
 .../bpf/benchs/bench_bloom_filter_map.c       | 354 ++++++++++++++++++
 .../bpf/benchs/run_bench_bloom_filter_map.sh  |  28 ++
 .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
 .../selftests/bpf/benchs/run_common.sh        |  48 +++
 .../selftests/bpf/progs/bloom_filter_map.c    |  74 ++++
 8 files changed, 547 insertions(+), 29 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
 create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 866531c08e4f..3576fdff117c 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -519,13 +519,15 @@ $(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h
 $(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h
 $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
 			    $(OUTPUT)/perfbuf_bench.skel.h
+$(OUTPUT)/bench_bloom_filter_map.o: $(OUTPUT)/bloom_filter_map.skel.h
 $(OUTPUT)/bench.o: bench.h testing_helpers.h
 $(OUTPUT)/bench: LDLIBS += -lm
 $(OUTPUT)/bench: $(OUTPUT)/bench.o $(OUTPUT)/testing_helpers.o \
 		 $(OUTPUT)/bench_count.o \
 		 $(OUTPUT)/bench_rename.o \
 		 $(OUTPUT)/bench_trigger.o \
-		 $(OUTPUT)/bench_ringbufs.o
+		 $(OUTPUT)/bench_ringbufs.o \
+		 $(OUTPUT)/bench_bloom_filter_map.o
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(LDFLAGS) -o $@ $(filter %.a %.o,$^) $(LDLIBS)
 
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 6ea15b93a2f8..0bcbdb4405a3 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -51,6 +51,35 @@ void setup_libbpf()
 		fprintf(stderr, "failed to increase RLIMIT_MEMLOCK: %d", err);
 }
 
+void false_hits_report_progress(int iter, struct bench_res *res, long delta_ns)
+{
+	long total = res->false_hits + res->hits + res->drops;
+
+	printf("Iter %3d (%7.3lfus): ",
+	       iter, (delta_ns - 1000000000) / 1000.0);
+
+	printf("%ld false hits of %ld total operations. Percentage = %2.2f %%\n",
+	       res->false_hits, total, ((float)res->false_hits / total) * 100);
+}
+
+void false_hits_report_final(struct bench_res res[], int res_cnt)
+{
+	long total_hits = 0, total_drops = 0, total_false_hits = 0, total_ops = 0;
+	int i;
+
+	for (i = 0; i < res_cnt; i++) {
+		total_hits += res[i].hits;
+		total_false_hits += res[i].false_hits;
+		total_drops += res[i].drops;
+	}
+	total_ops = total_hits + total_false_hits + total_drops;
+
+	printf("Summary: %ld false hits of %ld total operations. ",
+	       total_false_hits, total_ops);
+	printf("Percentage =  %2.2f %%\n",
+	       ((float)total_false_hits / total_ops) * 100);
+}
+
 void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns)
 {
 	double hits_per_sec, drops_per_sec;
@@ -132,9 +161,11 @@ static const struct argp_option opts[] = {
 };
 
 extern struct argp bench_ringbufs_argp;
+extern struct argp bench_bloom_filter_map_argp;
 
 static const struct argp_child bench_parsers[] = {
 	{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
+	{ &bench_bloom_filter_map_argp, 0, "Bloom filter map benchmark", 0 },
 	{},
 };
 
@@ -323,6 +354,8 @@ extern const struct bench bench_rb_libbpf;
 extern const struct bench bench_rb_custom;
 extern const struct bench bench_pb_libbpf;
 extern const struct bench bench_pb_custom;
+extern const struct bench bench_bloom_filter_map;
+extern const struct bench bench_bloom_filter_false_positive;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
@@ -344,6 +377,8 @@ static const struct bench *benchs[] = {
 	&bench_rb_custom,
 	&bench_pb_libbpf,
 	&bench_pb_custom,
+	&bench_bloom_filter_map,
+	&bench_bloom_filter_false_positive,
 };
 
 static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/bench.h b/tools/testing/selftests/bpf/bench.h
index c1f48a473b02..624c6b11501f 100644
--- a/tools/testing/selftests/bpf/bench.h
+++ b/tools/testing/selftests/bpf/bench.h
@@ -33,6 +33,7 @@ struct env {
 struct bench_res {
 	long hits;
 	long drops;
+	long false_hits;
 };
 
 struct bench {
@@ -56,6 +57,8 @@ extern const struct bench *bench;
 void setup_libbpf();
 void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns);
 void hits_drops_report_final(struct bench_res res[], int res_cnt);
+void false_hits_report_progress(int iter, struct bench_res *res, long delta_ns);
+void false_hits_report_final(struct bench_res res[], int res_cnt);
 
 static inline __u64 get_time_ns() {
 	struct timespec t;
diff --git a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
new file mode 100644
index 000000000000..2cce4f657646
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
@@ -0,0 +1,354 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <argp.h>
+#include <linux/log2.h>
+#include <pthread.h>
+#include "bench.h"
+#include "bloom_filter_map.skel.h"
+#include "bpf_util.h"
+
+static struct ctx {
+	struct bloom_filter_map *skel;
+	pthread_mutex_t map_done_mtx;
+	pthread_cond_t map_done;
+	bool map_prepare_err;
+	__u32 next_map_idx;
+} ctx = {
+	.map_done_mtx = PTHREAD_MUTEX_INITIALIZER,
+	.map_done = PTHREAD_COND_INITIALIZER,
+};
+
+static struct {
+	__u32 nr_entries;
+	__u8 nr_hashes;
+} args = {
+	.nr_entries = 1000,
+	.nr_hashes = 3,
+};
+
+enum {
+	ARG_NR_ENTRIES = 3000,
+	ARG_NR_HASHES = 3001,
+};
+
+static const struct argp_option opts[] = {
+	{ "nr_entries", ARG_NR_ENTRIES, "NR_ENTRIES", 0,
+		"Set number of entries in the bloom filter map"},
+	{ "nr_hashes", ARG_NR_HASHES, "NR_HASHES", 0,
+		"Set number of hashes in the bloom filter map"},
+	{},
+};
+
+static error_t parse_arg(int key, char *arg, struct argp_state *state)
+{
+	switch (key) {
+	case ARG_NR_ENTRIES:
+		args.nr_entries = strtol(arg, NULL, 10);
+		if (args.nr_entries == 0) {
+			fprintf(stderr, "Invalid nr_entries count.");
+			argp_usage(state);
+		}
+		break;
+	case ARG_NR_HASHES:
+		args.nr_hashes = strtol(arg, NULL, 10);
+		if (args.nr_hashes == 0) {
+			fprintf(stderr, "Cannot specify a bloom filter map with 0 hashes.");
+			argp_usage(state);
+		} else if (args.nr_hashes > 15) {
+			fprintf(stderr, "Bloom filter maps only support up to 15 hashes.");
+			argp_usage(state);
+		}
+		break;
+	default:
+		return ARGP_ERR_UNKNOWN;
+	}
+
+	return 0;
+}
+
+/* exported into benchmark runner */
+const struct argp bench_bloom_filter_map_argp = {
+	.options = opts,
+	.parser = parse_arg,
+};
+
+static void validate(void)
+{
+	if (env.consumer_cnt != 1) {
+		fprintf(stderr, "bloom filter map benchmark doesn't support multi-consumer!\n");
+		exit(1);
+	}
+}
+
+static inline void trigger_bpf_program(void)
+{
+	syscall(__NR_getpgid);
+}
+
+static void *producer(void *input)
+{
+	while (true)
+		trigger_bpf_program();
+
+	return NULL;
+}
+
+static void *map_prepare_thread(void *arg)
+{
+	int err, random_data_fd, bloom_filter_fd, hashmap_fd;
+	__u64 i, val;
+
+	bloom_filter_fd = bpf_map__fd(ctx.skel->maps.map_bloom_filter);
+	random_data_fd = bpf_map__fd(ctx.skel->maps.map_random_data);
+	hashmap_fd = bpf_map__fd(ctx.skel->maps.hashmap);
+
+	while (true) {
+		i = __atomic_add_fetch(&ctx.next_map_idx, 1, __ATOMIC_RELAXED);
+		if (i > args.nr_entries)
+			break;
+again:
+		err = syscall(__NR_getrandom, &val, sizeof(val), 0);
+		if (err != sizeof(val)) {
+			ctx.map_prepare_err = true;
+			fprintf(stderr, "failed to get random value\n");
+			break;
+		}
+		err = bpf_map_update_elem(hashmap_fd, &val, &val, BPF_NOEXIST);
+		if (err) {
+			if (err != -EEXIST) {
+				ctx.map_prepare_err = true;
+				fprintf(stderr, "failed to add elem to hashmap: %d\n", -errno);
+				break;
+			}
+			goto again;
+		}
+
+		i--;
+		err = bpf_map_update_elem(random_data_fd, &i, &val, 0);
+		if (err) {
+			ctx.map_prepare_err = true;
+			fprintf(stderr, "failed to add elem to array: %d\n", -errno);
+			break;
+		}
+
+		err = bpf_map_update_elem(bloom_filter_fd, NULL, &val, 0);
+		if (err) {
+			ctx.map_prepare_err = true;
+			fprintf(stderr, "failed to add elem to bloom_filter: %d\n", -errno);
+			break;
+		}
+	}
+
+	pthread_mutex_lock(&ctx.map_done_mtx);
+	pthread_cond_signal(&ctx.map_done);
+	pthread_mutex_unlock(&ctx.map_done_mtx);
+
+	return NULL;
+}
+
+static void populate_maps(void)
+{
+	unsigned int nr_cpus = bpf_num_possible_cpus();
+	pthread_t map_thread;
+	int i, err;
+
+	for (i = 0; i < nr_cpus; i++) {
+		err = pthread_create(&map_thread, NULL, map_prepare_thread,
+				     NULL);
+		if (err) {
+			fprintf(stderr, "failed to create pthread: %d\n", -errno);
+			exit(1);
+		}
+	}
+
+	pthread_mutex_lock(&ctx.map_done_mtx);
+	pthread_cond_wait(&ctx.map_done, &ctx.map_done_mtx);
+	pthread_mutex_unlock(&ctx.map_done_mtx);
+
+	if (ctx.map_prepare_err)
+		exit(1);
+}
+
+static int set_nr_hashes(struct bpf_map *bloom_filter_map, u32 map_flags, u8 nr_hashes)
+{
+	map_flags = map_flags | (nr_hashes << ilog2(BPF_F_BLOOM_FILTER_HASH_BIT_1));
+	return bpf_map__set_map_flags(bloom_filter_map, map_flags);
+}
+
+static struct bloom_filter_map *setup_skeleton(void)
+{
+	struct bloom_filter_map *skel;
+	int err;
+
+	setup_libbpf();
+
+	skel = bloom_filter_map__open();
+	if (!skel) {
+		fprintf(stderr, "failed to open skeleton\n");
+		exit(1);
+	}
+
+	err = bpf_map__resize(skel->maps.map_random_data, args.nr_entries);
+	if (err) {
+		fprintf(stderr, "failed to resize map_random_data\n");
+		exit(1);
+	}
+
+	err = bpf_map__resize(skel->maps.hashmap, args.nr_entries);
+	if (err) {
+		fprintf(stderr, "failed to resize hashmap\n");
+		exit(1);
+	}
+
+	err = bpf_map__resize(skel->maps.map_bloom_filter, args.nr_entries);
+	if (err) {
+		fprintf(stderr, "failed to resize bloom filter\n");
+		exit(1);
+	}
+
+	err = set_nr_hashes(skel->maps.map_bloom_filter, 0, args.nr_hashes);
+	if (err) {
+		fprintf(stderr, "failed to set %u hashes\n", args.nr_hashes);
+		exit(1);
+	}
+
+	if (bloom_filter_map__load(skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
+	return skel;
+}
+
+static void bloom_filter_map_setup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton();
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
+static void hashmap_lookup_setup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton();
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_hashmap_lookup);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
+static void measure(struct bench_res *res)
+{
+	long total_hits = 0, total_drops = 0, total_false_hits = 0;
+	unsigned int nr_cpus = bpf_num_possible_cpus();
+	BPF_DECLARE_PERCPU(__u64, zeroed_values);
+	BPF_DECLARE_PERCPU(__u64, false_hits);
+	BPF_DECLARE_PERCPU(__u64, drops);
+	BPF_DECLARE_PERCPU(__u64, hits);
+	int err, i, percpu_array_fd;
+	__u32 key;
+
+	if (ctx.skel->bss->error != 0) {
+		fprintf(stderr, "error (%d) when searching the bloom filter\n",
+			ctx.skel->bss->error);
+		exit(1);
+	}
+
+	key = ctx.skel->rodata->hit_key;
+	percpu_array_fd = bpf_map__fd(ctx.skel->maps.percpu_array);
+	err = bpf_map_lookup_elem(percpu_array_fd, &key, hits);
+	if (err) {
+		fprintf(stderr, "lookup in the percpu array for 'hits' failed: %d\n",
+			-errno);
+		exit(1);
+	}
+
+	key = ctx.skel->rodata->drop_key;
+	err = bpf_map_lookup_elem(percpu_array_fd, &key, drops);
+	if (err) {
+		fprintf(stderr, "lookup in the percpu array for 'drops' failed: %d\n",
+			-errno);
+		exit(1);
+	}
+
+	key = ctx.skel->rodata->false_hit_key;
+	err = bpf_map_lookup_elem(percpu_array_fd, &key, false_hits);
+	if (err) {
+		fprintf(stderr, "lookup in the percpu array for 'false hits' failed: %d\n",
+			-errno);
+		exit(1);
+	}
+
+	for (i = 0; i < nr_cpus; i++) {
+		total_hits += bpf_percpu(hits, i);
+		total_drops += bpf_percpu(drops, i);
+		total_false_hits += bpf_percpu(false_hits, i);
+	}
+
+	res->hits = total_hits;
+	res->drops = total_drops;
+	res->false_hits = total_false_hits;
+
+	memset(zeroed_values, 0, sizeof(zeroed_values));
+
+	/* zero out the percpu array */
+	key = ctx.skel->rodata->hit_key;
+	err = bpf_map_update_elem(percpu_array_fd, &key, zeroed_values, BPF_ANY);
+	if (err) {
+		fprintf(stderr, "zeroing the percpu array failed: %d\n", -errno);
+		exit(1);
+	}
+	key = ctx.skel->rodata->drop_key;
+	err = bpf_map_update_elem(percpu_array_fd, &key, zeroed_values, BPF_ANY);
+	if (err) {
+		fprintf(stderr, "zeroing the percpu array failed: %d\n", -errno);
+		exit(1);
+	}
+	key = ctx.skel->rodata->false_hit_key;
+	err = bpf_map_update_elem(percpu_array_fd, &key, zeroed_values, BPF_ANY);
+	if (err) {
+		fprintf(stderr, "zeroing the percpu array failed: %d\n", -errno);
+		exit(1);
+	}
+}
+
+static void *consumer(void *input)
+{
+	return NULL;
+}
+
+const struct bench bench_bloom_filter_map = {
+	.name = "bloom-filter-map",
+	.validate = validate,
+	.setup = bloom_filter_map_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_bloom_filter_false_positive = {
+	.name = "bloom-filter-false-positive",
+	.validate = validate,
+	.setup = hashmap_lookup_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = false_hits_report_progress,
+	.report_final = false_hits_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
new file mode 100755
index 000000000000..8f2de6e39313
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source ./benchs/run_common.sh
+
+set -eufo pipefail
+
+header "Bloom filter map"
+for t in 1 4 8; do
+for h in {1..10}; do
+subtitle "# threads: $t, # hashes: $h"
+	for e in 10000 50000 75000 100000 250000 500000 750000 1000000 2500000 5000000; do
+		printf "%'d entries -\n" $e
+		printf "\t"
+		summarize "Total operations: " \
+			"$($RUN_BENCH -p $t --nr_hashes $h --nr_entries $e bloom-filter-map)"
+		printf "\t"
+		summarize_percentage "False positive rate: " \
+			"$($RUN_BENCH -p $t --nr_hashes $h --nr_entries $e bloom-filter-false-positive)"
+	done
+	printf "\n"
+done
+done
+
+header "Bloom filter map, multi-producer contention"
+for t in 1 2 3 4 8 12 16 20 24 28 32 36 40 44 48 52; do
+	summarize "$t threads - " "$($RUN_BENCH -p $t bloom-filter-map)"
+done
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
index af4aa04caba6..ada028aa9007 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
@@ -1,34 +1,8 @@
 #!/bin/bash
 
-set -eufo pipefail
-
-RUN_BENCH="sudo ./bench -w3 -d10 -a"
-
-function hits()
-{
-	echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
-}
-
-function drops()
-{
-	echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
-}
+source ./benchs/run_common.sh
 
-function header()
-{
-	local len=${#1}
-
-	printf "\n%s\n" "$1"
-	for i in $(seq 1 $len); do printf '='; done
-	printf '\n'
-}
-
-function summarize()
-{
-	bench="$1"
-	summary=$(echo $2 | tail -n1)
-	printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
-}
+set -eufo pipefail
 
 header "Single-producer, parallel producer"
 for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
diff --git a/tools/testing/selftests/bpf/benchs/run_common.sh b/tools/testing/selftests/bpf/benchs/run_common.sh
new file mode 100644
index 000000000000..670f23b037c4
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_common.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+RUN_BENCH="sudo ./bench -w3 -d10 -a"
+
+function header()
+{
+	local len=${#1}
+
+	printf "\n%s\n" "$1"
+	for i in $(seq 1 $len); do printf '='; done
+	printf '\n'
+}
+
+function subtitle()
+{
+	local len=${#1}
+	printf "\t%s\n" "$1"
+}
+
+function hits()
+{
+	echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
+function drops()
+{
+	echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
+function percentage()
+{
+	echo "$*" | sed -E "s/.*Percentage\s=\s+([0-9]+\.[0-9]+).*/\1/"
+}
+
+function summarize()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
+}
+
+function summarize_percentage()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s%%\n" "$bench" "$(percentage $summary)"
+}
diff --git a/tools/testing/selftests/bpf/progs/bloom_filter_map.c b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
index 8b5bf8d61a40..d6808a291a42 100644
--- a/tools/testing/selftests/bpf/progs/bloom_filter_map.c
+++ b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
@@ -1,7 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2021 Facebook */
 
+#include <errno.h>
 #include <linux/bpf.h>
+#include <stdbool.h>
 #include <bpf/bpf_helpers.h>
 
 char _license[] SEC("license") = "GPL";
@@ -34,8 +36,38 @@ struct callback_ctx {
 	struct map_bloom_filter_type *map;
 };
 
+/* Tracks the number of hits, drops, and false hits */
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, 3);
+	__type(key, __u32);
+	__type(value, __u64);
+} percpu_array SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 1000);
+	__type(key, __u64);
+	__type(value, __u64);
+} hashmap SEC(".maps");
+
+const __u32 hit_key  = 0;
+const __u32 drop_key  = 1;
+const __u32 false_hit_key = 2;
+
+bool hashmap_use_bloom_filter = true;
+
 int error = 0;
 
+static __always_inline void log_result(__u32 key)
+{
+	__u64 *count;
+
+	count = bpf_map_lookup_elem(&percpu_array, &key);
+	if (count)
+		*count += 1;
+}
+
 static __u64
 check_elem(struct bpf_map *map, __u32 *key, __u64 *val,
 	   struct callback_ctx *data)
@@ -48,6 +80,8 @@ check_elem(struct bpf_map *map, __u32 *key, __u64 *val,
 		return 1; /* stop the iteration */
 	}
 
+	log_result(hit_key);
+
 	return 0;
 }
 
@@ -80,3 +114,43 @@ int prog_bloom_filter_inner_map(void *ctx)
 
 	return 0;
 }
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bloom_filter_hashmap_lookup(void *ctx)
+{
+	__u64 *result;
+	int i, err;
+
+	union {
+		__u64 data64;
+		__u32 data32[2];
+	} val;
+
+	for (i = 0; i < 512; i++) {
+		val.data32[0] = bpf_get_prandom_u32();
+		val.data32[1] = bpf_get_prandom_u32();
+
+		if (hashmap_use_bloom_filter) {
+			err = bpf_map_peek_elem(&map_bloom_filter, &val);
+			if (err) {
+				if (err != -ENOENT) {
+					error |= 3;
+					return 0;
+				}
+				log_result(drop_key);
+				continue;
+			}
+		}
+
+		result = bpf_map_lookup_elem(&hashmap, &val);
+		if (result) {
+			log_result(hit_key);
+		} else {
+			if (hashmap_use_bloom_filter)
+				log_result(false_hit_key);
+			log_result(drop_key);
+		}
+	}
+
+	return 0;
+}
-- 
2.30.2


* [PATCH v2 bpf-next 4/4] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter
  2021-09-14  4:04 [PATCH v2 bpf-next 0/4] Implement bloom filter map Joanne Koong
                   ` (2 preceding siblings ...)
  2021-09-14  4:04 ` [PATCH v2 bpf-next 3/4] bpf/benchs: Add benchmark test for bloom filter maps Joanne Koong
@ 2021-09-14  4:04 ` Joanne Koong
  3 siblings, 0 replies; 11+ messages in thread
From: Joanne Koong @ 2021-09-14  4:04 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

This patch adds benchmark tests for comparing the performance of hashmap
lookups without the bloom filter vs. hashmap lookups with the bloom filter.

Checking the bloom filter first for whether the element exists should
enable a higher overall throughput for hashmap lookups, since if the
element does not exist in the bloom filter, the costly lookup in the
hashmap can be avoided.

On average, using 5 hash functions in the bloom filter tended to perform
the best across the widest range of different entry sizes. The benchmark
results using 5 hash functions (running on 8 threads on a machine with one
numa node, and taking the average of 3 runs) were roughly as follows:

value_size = 4 bytes -
	10k entries: 30% faster
	50k entries: 50% faster
	100k entries: 55% faster
	500k entries: 80% faster
	1 million entries: 120% faster
	5 million entries: 135% faster

value_size = 8 bytes -
	10k entries: 35% faster
	50k entries: 55% faster
	100k entries: 70% faster
	500k entries: 110% faster
	1 million entries: 215% faster
	5 million entries: 215% faster

value_size = 16 bytes -
	10k entries: 5% slower
	50k entries: 25% faster
	100k entries: 35% faster
	500k entries: 105% faster
	1 million entries: 130% faster
	5 million entries: 105% faster

value_size = 40 bytes -
	10k entries: 5% slower
	50k entries: 10% faster
	100k entries: 20% faster
	500k entries: 45% faster
	1 million entries: 60% faster
	5 million entries: 75% faster

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 tools/testing/selftests/bpf/bench.c           | 22 ++++++++---
 .../bpf/benchs/bench_bloom_filter_map.c       | 39 +++++++++++++++++++
 .../bpf/benchs/run_bench_bloom_filter_map.sh  | 15 +++++++
 .../selftests/bpf/benchs/run_common.sh        | 12 ++++++
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 0bcbdb4405a3..7da1589a9fe0 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -92,20 +92,21 @@ void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns)
 	printf("Iter %3d (%7.3lfus): ",
 	       iter, (delta_ns - 1000000000) / 1000.0);
 
-	printf("hits %8.3lfM/s (%7.3lfM/prod), drops %8.3lfM/s\n",
-	       hits_per_sec, hits_per_prod, drops_per_sec);
+	printf("hits %8.3lfM/s (%7.3lfM/prod), drops %8.3lfM/s, total operations %8.3lfM/s\n",
+	       hits_per_sec, hits_per_prod, drops_per_sec, hits_per_sec + drops_per_sec);
 }
 
 void hits_drops_report_final(struct bench_res res[], int res_cnt)
 {
 	int i;
-	double hits_mean = 0.0, drops_mean = 0.0;
-	double hits_stddev = 0.0, drops_stddev = 0.0;
+	double hits_mean = 0.0, drops_mean = 0.0, total_ops_mean = 0.0;
+	double hits_stddev = 0.0, drops_stddev = 0.0, total_ops_stddev = 0.0;
 
 	for (i = 0; i < res_cnt; i++) {
 		hits_mean += res[i].hits / 1000000.0 / (0.0 + res_cnt);
 		drops_mean += res[i].drops / 1000000.0 / (0.0 + res_cnt);
 	}
+	total_ops_mean = hits_mean + drops_mean;
 
 	if (res_cnt > 1)  {
 		for (i = 0; i < res_cnt; i++) {
@@ -115,14 +116,21 @@ void hits_drops_report_final(struct bench_res res[], int res_cnt)
 			drops_stddev += (drops_mean - res[i].drops / 1000000.0) *
 					(drops_mean - res[i].drops / 1000000.0) /
 					(res_cnt - 1.0);
+			total_ops_stddev += (total_ops_mean -
+					(res[i].hits + res[i].drops) / 1000000.0) *
+					(total_ops_mean - (res[i].hits + res[i].drops) / 1000000.0)
+					/ (res_cnt - 1.0);
 		}
 		hits_stddev = sqrt(hits_stddev);
 		drops_stddev = sqrt(drops_stddev);
+		total_ops_stddev = sqrt(total_ops_stddev);
 	}
 	printf("Summary: hits %8.3lf \u00B1 %5.3lfM/s (%7.3lfM/prod), ",
 	       hits_mean, hits_stddev, hits_mean / env.producer_cnt);
-	printf("drops %8.3lf \u00B1 %5.3lfM/s\n",
+	printf("drops %8.3lf \u00B1 %5.3lfM/s, ",
 	       drops_mean, drops_stddev);
+	printf("total operations %8.3lf \u00B1 %5.3lfM/s\n",
+	       total_ops_mean, total_ops_stddev);
 }
 
 const char *argp_program_version = "benchmark";
@@ -356,6 +364,8 @@ extern const struct bench bench_pb_libbpf;
 extern const struct bench bench_pb_custom;
 extern const struct bench bench_bloom_filter_map;
 extern const struct bench bench_bloom_filter_false_positive;
+extern const struct bench bench_hashmap_without_bloom_filter;
+extern const struct bench bench_hashmap_with_bloom_filter;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
@@ -379,6 +389,8 @@ static const struct bench *benchs[] = {
 	&bench_pb_custom,
 	&bench_bloom_filter_map,
 	&bench_bloom_filter_false_positive,
+	&bench_hashmap_without_bloom_filter,
+	&bench_hashmap_with_bloom_filter,
 };
 
 static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
index 2cce4f657646..6fee88320c3d 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
@@ -251,6 +251,23 @@ static void hashmap_lookup_setup(void)
 	}
 }
 
+static void hashmap_no_bloom_filter_setup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton();
+
+	ctx.skel->data->hashmap_use_bloom_filter = false;
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_hashmap_lookup);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
 static void measure(struct bench_res *res)
 {
 	long total_hits = 0, total_drops = 0, total_false_hits = 0;
@@ -352,3 +369,25 @@ const struct bench bench_bloom_filter_false_positive = {
 	.report_progress = false_hits_report_progress,
 	.report_final = false_hits_report_final,
 };
+
+const struct bench bench_hashmap_without_bloom_filter = {
+	.name = "hashmap-without-bloom-filter",
+	.validate = validate,
+	.setup = hashmap_no_bloom_filter_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_hashmap_with_bloom_filter = {
+	.name = "hashmap-with-bloom-filter",
+	.validate = validate,
+	.setup = hashmap_lookup_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
index 8f2de6e39313..53c14da00a3b 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
@@ -26,3 +26,18 @@ header "Bloom filter map, multi-producer contention"
 for t in 1 2 3 4 8 12 16 20 24 28 32 36 40 44 48 52; do
 	summarize "$t threads - " "$($RUN_BENCH -p $t bloom-filter-map)"
 done
+
+header "Hashmap without bloom filter vs. hashmap with bloom filter (throughput, 8 threads)"
+for h in {1..10}; do
+subtitle "# hashes: $h"
+	for e in 10000 50000 75000 100000 250000 500000 750000 1000000 2500000 5000000; do
+		printf "%'d entries -\n" $e
+		printf "\t"
+		summarize_total "Hashmap without bloom filter: " \
+			"$($RUN_BENCH --nr_hashes $h --nr_entries $e -p 8 hashmap-without-bloom-filter)"
+		printf "\t"
+		summarize_total "Hashmap with bloom filter: " \
+			"$($RUN_BENCH --nr_hashes $h --nr_entries $e -p 8 hashmap-with-bloom-filter)"
+	done
+	printf "\n"
+done
diff --git a/tools/testing/selftests/bpf/benchs/run_common.sh b/tools/testing/selftests/bpf/benchs/run_common.sh
index 670f23b037c4..9a16be78b180 100644
--- a/tools/testing/selftests/bpf/benchs/run_common.sh
+++ b/tools/testing/selftests/bpf/benchs/run_common.sh
@@ -33,6 +33,11 @@ function percentage()
 	echo "$*" | sed -E "s/.*Percentage\s=\s+([0-9]+\.[0-9]+).*/\1/"
 }
 
+function total()
+{
+	echo "$*" | sed -E "s/.*total operations\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
 function summarize()
 {
 	bench="$1"
@@ -46,3 +51,10 @@ function summarize_percentage()
 	summary=$(echo $2 | tail -n1)
 	printf "%-20s %s%%\n" "$bench" "$(percentage $summary)"
 }
+
+function summarize_total()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s\n" "$bench" "$(total $summary)"
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation
  2021-09-14  4:04 ` [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation Joanne Koong
@ 2021-09-17 17:01   ` Alexei Starovoitov
  2021-09-20 20:58     ` Andrii Nakryiko
  2021-09-20 21:03     ` Joanne Koong
  2021-09-17 21:48   ` Andrii Nakryiko
  1 sibling, 2 replies; 11+ messages in thread
From: Alexei Starovoitov @ 2021-09-17 17:01 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Kernel-team

On Mon, Sep 13, 2021 at 09:04:30PM -0700, Joanne Koong wrote:
> +
> +/* For bloom filter maps, the next 4 bits represent how many hashes to use.
> + * The maximum number of hash functions supported is 15. If this is not set,
> + * the default number of hash functions used will be 5.
> + */
> +	BPF_F_BLOOM_FILTER_HASH_BIT_1 = (1U << 13),
> +	BPF_F_BLOOM_FILTER_HASH_BIT_2 = (1U << 14),
> +	BPF_F_BLOOM_FILTER_HASH_BIT_3 = (1U << 15),
> +	BPF_F_BLOOM_FILTER_HASH_BIT_4 = (1U << 16),

The bit selection is unintuitive.
Since key_size has to be zero, maybe use that instead to indicate the number of hash
functions in the rare case when 5 is not good enough?
Or use inner_map_fd since there is no possibility of having an inner map in bloomfilter.
It could be a union:
    __u32   max_entries;    /* max number of entries in a map */
    __u32   map_flags;      /* BPF_MAP_CREATE related
                             * flags defined above.
                             */
    union {
       __u32  inner_map_fd;   /* fd pointing to the inner map */
       __u32  nr_hash_funcs;  /* or number of hash functions */
    };
    __u32   numa_node;      /* numa node */
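
From userspace, map creation would then look roughly like the sketch
below (nr_hash_funcs is the field proposed above, not something in the
current uapi header):

    #include <linux/bpf.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int create_bloom_map(void)
    {
        union bpf_attr attr = {};

        attr.map_type    = BPF_MAP_TYPE_BLOOM_FILTER;
        attr.key_size    = 0;      /* bloom filter maps have no keys */
        attr.value_size  = 8;      /* size of each element to be hashed */
        attr.max_entries = 10000;  /* expected number of entries */
        attr.nr_hash_funcs = 7;    /* proposed: override the default of 5 */

        return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
    }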

> +struct bpf_bloom_filter {
> +	struct bpf_map map;
> +	u32 bit_array_mask;
> +	u32 hash_seed;
> +	/* If the size of the values in the bloom filter is u32 aligned,
> +	 * then it is more performant to use jhash2 as the underlying hash
> +	 * function, else we use jhash. This tracks the number of u32s
> +	 * in an u32-aligned value size. If the value size is not u32 aligned,
> +	 * this will be 0.
> +	 */
> +	u32 aligned_u32_count;

what is the performance difference?
Maybe we enforce a 4-byte sized value for simplicity?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation
  2021-09-14  4:04 ` [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation Joanne Koong
  2021-09-17 17:01   ` Alexei Starovoitov
@ 2021-09-17 21:48   ` Andrii Nakryiko
  1 sibling, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2021-09-17 21:48 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Kernel Team

On Mon, Sep 13, 2021 at 9:09 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> Bloom filters are a space-efficient probabilistic data structure
> used to quickly test whether an element exists in a set.
> In a bloom filter, false positives are possible whereas false
> negatives should never be.
>
> This patch adds a bloom filter map for bpf programs.
> The bloom filter map supports peek (determining whether an element
> is present in the map) and push (adding an element to the map)
> operations.These operations are exposed to userspace applications
> through the already existing syscalls in the following way:
>
> BPF_MAP_LOOKUP_ELEM -> peek
> BPF_MAP_UPDATE_ELEM -> push
>
> The bloom filter map does not have keys, only values. In light of
> this, the bloom filter map's API matches that of queue stack maps:
> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> APIs to query or add an element to the bloom filter map. When the
> bloom filter map is created, it must be created with a key_size of 0.
>
> For updates, the user will pass in the element to add to the map
> as the value, with a NULL key. For lookups, the user will pass in the
> element to query in the map as the value. In the verifier layer, this
> requires us to modify the argument type of a bloom filter's
> BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
> the syscall layer, we need to copy over the user value so that in
> bpf_map_peek_elem, we know which specific value to query.
>
> A few things to please take note of:
>  * If there are any concurrent lookups + updates, the user is
> responsible for synchronizing this to ensure no false negative lookups
> occur.
>  * The number of hashes to use for the bloom filter is configurable from
> userspace. If no number is specified, the default used will be 5 hash
> functions. The benchmarks later in this patchset can help compare the
> performance of using different numbers of hashes on different entry
> sizes. In general, using more hashes decreases the speed of a lookup,
> but lowers the false positive rate of an element being falsely detected
> in the bloom filter.
>  * Deleting an element in the bloom filter map is not supported.
>  * The bloom filter map may be used as an inner map.
>  * The "max_entries" size that is specified at map creation time is used to
> approximate a reasonable bitmap size for the bloom filter, and is not
> otherwise strictly enforced. If the user wishes to insert more entries into
> the bloom filter than "max_entries", they may do so but they should be
> aware that this may lead to a higher false positive rate.
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> ---
>  include/linux/bpf_types.h      |   1 +
>  include/uapi/linux/bpf.h       |  10 ++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/bloom_filter.c      | 205 +++++++++++++++++++++++++++++++++
>  kernel/bpf/syscall.c           |  14 ++-
>  kernel/bpf/verifier.c          |  19 ++-
>  tools/include/uapi/linux/bpf.h |  10 ++
>  7 files changed, 255 insertions(+), 6 deletions(-)
>  create mode 100644 kernel/bpf/bloom_filter.c
>
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index 9c81724e4b98..c4424ac2fa02 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -125,6 +125,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
>  BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
>  #endif
>  BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
> +BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
>
>  BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 791f31dd0abe..1d82860fd98e 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -906,6 +906,7 @@ enum bpf_map_type {
>         BPF_MAP_TYPE_RINGBUF,
>         BPF_MAP_TYPE_INODE_STORAGE,
>         BPF_MAP_TYPE_TASK_STORAGE,
> +       BPF_MAP_TYPE_BLOOM_FILTER,
>  };
>
>  /* Note that tracing related programs such as
> @@ -1210,6 +1211,15 @@ enum {
>
>  /* Create a map that is suitable to be an inner map with dynamic max entries */
>         BPF_F_INNER_MAP         = (1U << 12),
> +
> +/* For bloom filter maps, the next 4 bits represent how many hashes to use.
> + * The maximum number of hash functions supported is 15. If this is not set,
> + * the default number of hash functions used will be 5.
> + */
> +       BPF_F_BLOOM_FILTER_HASH_BIT_1 = (1U << 13),
> +       BPF_F_BLOOM_FILTER_HASH_BIT_2 = (1U << 14),
> +       BPF_F_BLOOM_FILTER_HASH_BIT_3 = (1U << 15),
> +       BPF_F_BLOOM_FILTER_HASH_BIT_4 = (1U << 16),

I didn't realize all those map_flags are sequentially numbered, but I
guess we are not yet running out of space, so this might be ok. But to
be usable from a BPF program nicely, it would be better to define this as
offset of a first hash bit and number of bits:

BPF_F_BLOOM_NR_HASH_OFF = 13,
BPF_F_BLOOM_NR_HASH_CNT = 4,

So that in a BPF map definition in a BPF program we could do

struct {
    __uint(type, BPF_MAP_TYPE_BLOOM_FILTER),
    ...
    __uint(map_flags, 5 << BPF_F_BLOOM_NR_HASH_OFF),
};

It's still quite ugly, but given it's unlikely to be used very
frequently, might be ok.
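
And on the kernel side decoding would stay trivial (sketch, assuming the
two constants above):

	u32 nr_hashes = (attr->map_flags >> BPF_F_BLOOM_NR_HASH_OFF) &
			((1U << BPF_F_BLOOM_NR_HASH_CNT) - 1);

	if (!nr_hashes)
		nr_hashes = 5; /* fall back to the documented default */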


[...]

> +
> +static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
> +{
> +       struct bpf_bloom_filter *bloom_filter =
> +               container_of(map, struct bpf_bloom_filter, map);
> +       u32 hash;
> +       u8 i;
> +
> +       for (i = 0; i < bloom_filter->nr_hashes; i++) {
> +               if (bloom_filter->aligned_u32_count)
> +                       hash = jhash2(value, bloom_filter->aligned_u32_count,
> +                                     bloom_filter->hash_seed + i) &
> +                               bloom_filter->bit_array_mask;
> +               else
> +                       hash = jhash(value, map->value_size,
> +                                    bloom_filter->hash_seed + i) &
> +                               bloom_filter->bit_array_mask;

this looks like a good candidate for helper function used in at least two places

> +
> +               if (!test_bit(hash, bloom_filter->bit_array))
> +                       return -ENOENT;
> +       }
> +
> +       return 0;
> +}
> +

[...]
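
E.g., something like this (just a sketch; the helper name is made up,
the fields are the ones from this patch):

static u32 hash_value(const struct bpf_bloom_filter *bloom_filter,
		      const void *value, u32 value_size, u32 index)
{
	u32 h;

	if (bloom_filter->aligned_u32_count)
		h = jhash2(value, bloom_filter->aligned_u32_count,
			   bloom_filter->hash_seed + index);
	else
		h = jhash(value, value_size,
			  bloom_filter->hash_seed + index);

	return h & bloom_filter->bit_array_mask;
}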

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation
  2021-09-17 17:01   ` Alexei Starovoitov
@ 2021-09-20 20:58     ` Andrii Nakryiko
  2021-09-20 22:52       ` Joanne Koong
  2021-09-20 21:03     ` Joanne Koong
  1 sibling, 1 reply; 11+ messages in thread
From: Andrii Nakryiko @ 2021-09-20 20:58 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Joanne Koong, bpf, Kernel Team

On Fri, Sep 17, 2021 at 6:08 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Sep 13, 2021 at 09:04:30PM -0700, Joanne Koong wrote:
> > +
> > +/* For bloom filter maps, the next 4 bits represent how many hashes to use.
> > + * The maximum number of hash functions supported is 15. If this is not set,
> > + * the default number of hash functions used will be 5.
> > + */
> > +     BPF_F_BLOOM_FILTER_HASH_BIT_1 = (1U << 13),
> > +     BPF_F_BLOOM_FILTER_HASH_BIT_2 = (1U << 14),
> > +     BPF_F_BLOOM_FILTER_HASH_BIT_3 = (1U << 15),
> > +     BPF_F_BLOOM_FILTER_HASH_BIT_4 = (1U << 16),
>
> The bit selection is unintuitive.
> Since key_size has to be zero, maybe use that instead to indicate the number of hash
> functions in the rare case when 5 is not good enough?

Hm... I was initially thinking about proposing something like that,
but it felt a bit ugly at the time. But now thinking about this a bit
more, I think this would be a bit more meaningful if we change the
terminology a bit. Instead of saying that Bloom filter has values and
no keys, we actually have keys and no values. So all those bytes that
are hashed are treated as keys (which is actually how sets are
implemented on top of maps, where you have keys and no values, or at
least the value is always true).

So with that we'll have key/key_size to specify number of bytes that
needs to be hashed (and its type info). And then we can squint a bit
and say that number of hashes are specified by value_size, as in
values are those nr_hash bits that we set in Bloom filter.

Still a bit of terminology stretch, but won't necessitate those
specialized fields just for Bloom filter map. But if default value is
going to be good enough for most cases and most cases won't need to
adjust number of hashes, this seems to be pretty clean to me.
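
Under that framing a map definition in a BPF program would read
something like this (a sketch of the proposal only, none of it is in the
patch as posted):

struct {
	__uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
	__uint(key_size, 8);        /* number of bytes to hash */
	__uint(value_size, 5);      /* squinting: number of hash functions */
	__uint(max_entries, 10000);
} bloom SEC(".maps");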

> Or use inner_map_fd since there is no possibility of having an inner map in bloomfilter.
> It could be a union:
>     __u32   max_entries;    /* max number of entries in a map */
>     __u32   map_flags;      /* BPF_MAP_CREATE related
>                              * flags defined above.
>                              */
>     union {
>        __u32  inner_map_fd;   /* fd pointing to the inner map */
>        __u32  nr_hash_funcs;  /* or number of hash functions */
>     };

This works as well. A bit more Bloom filter-only terminology
throughout UAPI and libbpf, but I'd be fine with that as well.


>     __u32   numa_node;      /* numa node */
>
> > +struct bpf_bloom_filter {
> > +     struct bpf_map map;
> > +     u32 bit_array_mask;
> > +     u32 hash_seed;
> > +     /* If the size of the values in the bloom filter is u32 aligned,
> > +      * then it is more performant to use jhash2 as the underlying hash
> > +      * function, else we use jhash. This tracks the number of u32s
> > +      * in an u32-aligned value size. If the value size is not u32 aligned,
> > +      * this will be 0.
> > +      */
> > +     u32 aligned_u32_count;
>
> what is the performance difference?
> Maybe we enforce a 4-byte sized value for simplicity?

This might be a bit too surprising, especially if keys are just some
strings, where people might not expect that it has to be a 4-byte multiple
in size. And debugging this without extra tooling (like retsnoop) is
going to be nightmarish.

If the performance diff is huge and that if/else logic is
unacceptable, we can also internally pad with up to 3 zero bytes and
include those into the hash.
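
Roughly along these lines (a sketch of the padding idea only;
MAX_VALUE_WORDS is a made-up cap on the value size):

	u32 buf[MAX_VALUE_WORDS] = {};
	u32 n = round_up(map->value_size, 4) / 4;

	memcpy(buf, value, map->value_size); /* trailing bytes stay zero */
	h = jhash2(buf, n, bloom_filter->hash_seed + i) &
	    bloom_filter->bit_array_mask;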

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation
  2021-09-17 17:01   ` Alexei Starovoitov
  2021-09-20 20:58     ` Andrii Nakryiko
@ 2021-09-20 21:03     ` Joanne Koong
  1 sibling, 0 replies; 11+ messages in thread
From: Joanne Koong @ 2021-09-20 21:03 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf, Kernel-team

On 9/17/21 10:01 AM, Alexei Starovoitov wrote:

> On Mon, Sep 13, 2021 at 09:04:30PM -0700, Joanne Koong wrote:
>> +
>> +/* For bloom filter maps, the next 4 bits represent how many hashes to use.
>> + * The maximum number of hash functions supported is 15. If this is not set,
>> + * the default number of hash functions used will be 5.
>> + */
>> +	BPF_F_BLOOM_FILTER_HASH_BIT_1 = (1U << 13),
>> +	BPF_F_BLOOM_FILTER_HASH_BIT_2 = (1U << 14),
>> +	BPF_F_BLOOM_FILTER_HASH_BIT_3 = (1U << 15),
>> +	BPF_F_BLOOM_FILTER_HASH_BIT_4 = (1U << 16),
> The bit selection is unintuitive.
> Since key_size has to be zero, maybe use that instead to indicate the number of hash
> functions in the rare case when 5 is not good enough?
> Or use inner_map_fd since there is no possibility of having an inner map in bloomfilter.
> It could be a union:
>      __u32   max_entries;    /* max number of entries in a map */
>      __u32   map_flags;      /* BPF_MAP_CREATE related
>                               * flags defined above.
>                               */
>      union {
>         __u32  inner_map_fd;   /* fd pointing to the inner map */
>         __u32  nr_hash_funcs;  /* or number of hash functions */
>      };
>      __u32   numa_node;      /* numa node */
I really like the idea of union-ing inner_map_fd with the number of
hash functions (my worry with using key_size is that it might be a
confusing / non-intuitive API quirk for users), but I think this would
later require us to add some bloom filter-specific APIs to libbpf
(such as bpf_map__set_nr_hashes).

To make the bit selection more intuitive, Andrii suggested defining a
helper like

BPF_F_BLOOM_NR_HASH_OFF = 13

where the user could then do something like

struct {
     __uint(type, BPF_MAP_TYPE_BLOOM_FILTER),
     ...
     __uint(map_flags, 5 << BPF_F_BLOOM_NR_HASH_OFF),
};

to set the number of hash functions.

Would this approach address your concerns about the unintuitiveness of 
the bit selection?

>> +struct bpf_bloom_filter {
>> +	struct bpf_map map;
>> +	u32 bit_array_mask;
>> +	u32 hash_seed;
>> +	/* If the size of the values in the bloom filter is u32 aligned,
>> +	 * then it is more performant to use jhash2 as the underlying hash
>> +	 * function, else we use jhash. This tracks the number of u32s
>> +	 * in an u32-aligned value size. If the value size is not u32 aligned,
>> +	 * this will be 0.
>> +	 */
>> +	u32 aligned_u32_count;
> what is the performance difference?

Using results from the hashmap benchmark tests, using jhash2 instead of
jhash for 4-byte aligned value sizes improved the performance by roughly
5% to 15%. For non-4-byte aligned value sizes, there wasn't a noticeable
difference between using jhash2 (and truncating the remainder bits) vs.
using jhash.

> Maybe we enforce a 4-byte sized value for simplicity?
Sounds great! And if in the future this becomes too restrictive, we
could always loosen this as well.
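
With that restriction in place, every hash computation could
unconditionally be (sketch):

	h = jhash2(value, map->value_size / 4,
		   bloom_filter->hash_seed + i) & bloom_filter->bit_array_mask;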

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation
  2021-09-20 20:58     ` Andrii Nakryiko
@ 2021-09-20 22:52       ` Joanne Koong
  2021-09-20 23:21         ` Andrii Nakryiko
  0 siblings, 1 reply; 11+ messages in thread
From: Joanne Koong @ 2021-09-20 22:52 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov; +Cc: bpf, Kernel Team

My previous email replied to Alexei's email before I saw Andrii's new
email, so please feel free to disregard my previous email.

On 9/20/21 1:58 PM, Andrii Nakryiko wrote:

> On Fri, Sep 17, 2021 at 6:08 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Mon, Sep 13, 2021 at 09:04:30PM -0700, Joanne Koong wrote:
>>> +
>>> +/* For bloom filter maps, the next 4 bits represent how many hashes to use.
>>> + * The maximum number of hash functions supported is 15. If this is not set,
>>> + * the default number of hash functions used will be 5.
>>> + */
>>> +     BPF_F_BLOOM_FILTER_HASH_BIT_1 = (1U << 13),
>>> +     BPF_F_BLOOM_FILTER_HASH_BIT_2 = (1U << 14),
>>> +     BPF_F_BLOOM_FILTER_HASH_BIT_3 = (1U << 15),
>>> +     BPF_F_BLOOM_FILTER_HASH_BIT_4 = (1U << 16),
>> The bit selection is unintuitive.
>> Since key_size has to be zero, maybe use that instead to indicate the number of hash
>> functions in the rare case when 5 is not good enough?
> Hm... I was initially thinking about proposing something like that,
> but it felt a bit ugly at the time. But now thinking about this a bit
> more, I think this would be a bit more meaningful if we change the
> terminology a bit. Instead of saying that Bloom filter has values and
> no keys, we actually have keys and no values. So all those bytes that
> are hashed are treated as keys (which is actually how sets are
> implemented on top of maps, where you have keys and no values, or at
> least the value is always true).
>
> So with that we'll have key/key_size to specify number of bytes that
> needs to be hashed (and its type info). And then we can squint a bit
> and say that number of hashes are specified by value_size, as in
> values are those nr_hash bits that we set in Bloom filter.
>
> Still a bit of terminology stretch, but won't necessitate those
> specialized fields just for Bloom filter map. But if default value is
> going to be good enough for most cases and most cases won't need to
> adjust number of hashes, this seems to be pretty clean to me.

With the bloom filter map having keys instead of values, I think this
would lead to messier code in the kernel for handling map_lookup_elem
and map_update_elem calls, due to the fact that the bloom filter map is
a non-associative map and the current APIs for non-associative map types
(peek_elem/push_elem/pop_elem) all have the map data as the value and
not the key.

For example, for map_update_elem, the API from the eBPF program side is

int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);

This would require us to either

a) Add some custom logic in syscall.c so that we bypass the
copy_from_bpfptr call on bloom filter map values (necessary because
memcpying 0 bytes still requires the src pointer to be valid), which
would allow us to pass in a NULL value

b) Add a new function like

int (*map_push_key)(struct bpf_map *map, void *key, u64 flags)

that eBPF programs can call instead of map_update_elem.

or

c) Try to repurpose the existing map_push_elem API by passing in the
key instead of the value, which would lead to inconsistent use of the
API

I think if we could change the non-associative map types (currently
only stack maps and queue maps, I believe) to have their data be a key
instead of a value, and have the pop/peek APIs use keys instead of
values, then this would be cleaner, since we could then just use the
existing peek/pop APIs.
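
(For reference, with the current patch the BPF program side goes through
the existing value-based helpers, roughly like this -- map and variable
names made up:

	__u64 val = 42;

	bpf_map_push_elem(&bloom_map, &val, 0);

	if (bpf_map_peek_elem(&bloom_map, &val) == 0) {
		/* possibly in the set, do the real hashmap lookup */
	}

so the data only ever has a "value" slot to travel through.)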

>> Or use inner_map_fd since there is no possibility of having an inner map in bloomfilter.
>> It could be a union:
>>      __u32   max_entries;    /* max number of entries in a map */
>>      __u32   map_flags;      /* BPF_MAP_CREATE related
>>                               * flags defined above.
>>                               */
>>      union {
>>         __u32  inner_map_fd;   /* fd pointing to the inner map */
>>         __u32  nr_hash_funcs;  /* or number of hash functions */
>>      };
> This works as well. A bit more Bloom filter-only terminology
> throughout UAPI and libbpf, but I'd be fine with that as well.
>
Great, it looks like this is the consensus - I will go with this option 
for v3!
>>      __u32   numa_node;      /* numa node */
>>
>>> +struct bpf_bloom_filter {
>>> +     struct bpf_map map;
>>> +     u32 bit_array_mask;
>>> +     u32 hash_seed;
>>> +     /* If the size of the values in the bloom filter is u32 aligned,
>>> +      * then it is more performant to use jhash2 as the underlying hash
>>> +      * function, else we use jhash. This tracks the number of u32s
>>> +      * in an u32-aligned value size. If the value size is not u32 aligned,
>>> +      * this will be 0.
>>> +      */
>>> +     u32 aligned_u32_count;
>> what is the performance difference?
>> Maybe we enforce a 4-byte sized value for simplicity?
> This might be a bit too surprising, especially if keys are just some
> strings, where people might not expect that it has to be a 4-byte multiple
> in size. And debugging this without extra tooling (like retsnoop) is
> going to be nightmarish.
>
> If the performance diff is huge and that if/else logic is
> unacceptable, we can also internally pad with up to 3 zero bytes and
> include those into the hash.
I think the if/else logic is unavoidable if we support non 4-byte
aligned value sizes, unless we are okay with truncating any remainder
bytes of non 4-byte aligned values and stipulating that a bloom filter
map value size has to be greater than 4 bytes (these conditions would
allow us to use jhash2 for every value without an if/else check). If we
internally pad, we will have to pad on every update and lookup, which
would also require an if/else.

Thanks for the comments and reviews, Alexei and Andrii. They are much
appreciated!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation
  2021-09-20 22:52       ` Joanne Koong
@ 2021-09-20 23:21         ` Andrii Nakryiko
  0 siblings, 0 replies; 11+ messages in thread
From: Andrii Nakryiko @ 2021-09-20 23:21 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Alexei Starovoitov, bpf, Kernel Team

On Mon, Sep 20, 2021 at 3:52 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> My previous email replied to Alexei's email before I saw Andrii's new
> email, so please feel free to disregard my previous email.

Never got that reply of yours. Alexei's email arrived a few hours
after I'd already replied to you. It was a time warp anomaly :)

>
> On 9/20/21 1:58 PM, Andrii Nakryiko wrote:
>
> > On Fri, Sep 17, 2021 at 6:08 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> >> On Mon, Sep 13, 2021 at 09:04:30PM -0700, Joanne Koong wrote:
> >>> +
> >>> +/* For bloom filter maps, the next 4 bits represent how many hashes to use.
> >>> + * The maximum number of hash functions supported is 15. If this is not set,
> >>> + * the default number of hash functions used will be 5.
> >>> + */
> >>> +     BPF_F_BLOOM_FILTER_HASH_BIT_1 = (1U << 13),
> >>> +     BPF_F_BLOOM_FILTER_HASH_BIT_2 = (1U << 14),
> >>> +     BPF_F_BLOOM_FILTER_HASH_BIT_3 = (1U << 15),
> >>> +     BPF_F_BLOOM_FILTER_HASH_BIT_4 = (1U << 16),
> >> The bit selection is unintuitive.
> >> Since key_size has to be zero, maybe use that instead to indicate the number of hash
> >> functions in the rare case when 5 is not good enough?
> > Hm... I was initially thinking about proposing something like that,
> > but it felt a bit ugly at the time. But now thinking about this a bit
> > more, I think this would be a bit more meaningful if we change the
> > terminology a bit. Instead of saying that Bloom filter has values and
> > no keys, we actually have keys and no values. So all those bytes that
> > are hashed are treated as keys (which is actually how sets are
> > implemented on top of maps, where you have keys and no values, or at
> > least the value is always true).
> >
> > So with that we'll have key/key_size to specify number of bytes that
> > needs to be hashed (and its type info). And then we can squint a bit
> > and say that number of hashes are specified by value_size, as in
> > values are those nr_hash bits that we set in Bloom filter.
> >
> > Still a bit of terminology stretch, but won't necessitate those
> > specialized fields just for Bloom filter map. But if default value is
> > going to be good enough for most cases and most cases won't need to
> > adjust number of hashes, this seems to be pretty clean to me.
>
> With the bloom filter map having keys instead of values, I think this
> would lead to messier code in the kernel for handling map_lookup_elem
> and map_update_elem calls, due to the fact that the bloom filter map is
> a non-associative map and the current APIs for non-associative map types
> (peek_elem/push_elem/pop_elem) all have the map data as the value and
> not the key.
>
> For example, for map_update_elem, the API from the eBPF program side is
>
> int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
>
> This would require us to either
>
> a) Add some custom logic in syscall.c so that we bypass the
> copy_from_bpfptr call on bloom filter map values (necessary because
> memcpying 0 bytes still requires the src pointer to be valid), which
> would allow us to pass in a NULL value
>
> b) Add a new function like
>
> int (*map_push_key)(struct bpf_map *map, void *key, u64 flags)
>
> that eBPF programs can call instead of map_update_elem.
>
> or
>
> c) Try to repurpose the existing map_push_elem API by passing in the
> key instead of the value, which would lead to inconsistent use of the
> API
>
> I think if we could change the non-associative map types (currently
> only stack maps and queue maps, I believe) to have their data be a key
> instead of a value, and have the pop/peek APIs use keys instead of
> values, then this would be cleaner, since we could then just use the
> existing peek/pop APIs.

I don't think we can change existing map APIs anymore, unfortunately.

>
> >> Or use inner_map_fd since there is no possibility of having an inner map in bloomfilter.
> >> It could be a union:
> >>      __u32   max_entries;    /* max number of entries in a map */
> >>      __u32   map_flags;      /* BPF_MAP_CREATE related
> >>                               * flags defined above.
> >>                               */
> >>      union {
> >>         __u32  inner_map_fd;   /* fd pointing to the inner map */
> >>         __u32  nr_hash_funcs;  /* or number of hash functions */
> >>      };
> > This works as well. A bit more Bloom filter-only terminology
> > throughout UAPI and libbpf, but I'd be fine with that as well.
> >
> Great, it looks like this is the consensus - I will go with this option
> for v3!
> >>      __u32   numa_node;      /* numa node */
> >>
> >>> +struct bpf_bloom_filter {
> >>> +     struct bpf_map map;
> >>> +     u32 bit_array_mask;
> >>> +     u32 hash_seed;
> >>> +     /* If the size of the values in the bloom filter is u32 aligned,
> >>> +      * then it is more performant to use jhash2 as the underlying hash
> >>> +      * function, else we use jhash. This tracks the number of u32s
> >>> +      * in an u32-aligned value size. If the value size is not u32 aligned,
> >>> +      * this will be 0.
> >>> +      */
> >>> +     u32 aligned_u32_count;
> >> what is the performance difference?
> >> Maybe we enforce a 4-byte sized value for simplicity?
> > This might be a bit too surprising, especially if keys are just some
> > strings, where people might not expect that it has to be a 4-byte multiple
> > in size. And debugging this without extra tooling (like retsnoop) is
> > going to be nightmarish.
> >
> > If the performance diff is huge and that if/else logic is
> > unacceptable, we can also internally pad with up to 3 zero bytes and
> > include those into the hash.
> I think the if/else logic is unavoidable if we support non 4-byte
> aligned value sizes, unless we are okay with truncating any remainder
> bytes of non 4-byte aligned values and stipulating that a bloom filter
> map value size has to be greater than 4 bytes (these conditions would
> allow us to use jhash2 for every value without an if/else check). If we
> internally pad, we will have to pad on every update and lookup, which
> would also require an if/else.
>
> Thanks for the comments and reviews, Alexei and Andrii. They are much
> appreciated!

I don't think truncation is an option. And I also forgot that we don't
really store values, so there is nothing to pad, really. So yeah, I'd
keep it as is, especially if that is not expensive (which I assume
it's not). As I mentioned before, that logic can be encapsulated in a
dedicated helper function and reused in a few places.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-09-21  1:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-14  4:04 [PATCH v2 bpf-next 0/4] Implement bloom filter map Joanne Koong
2021-09-14  4:04 ` [PATCH v2 bpf-next 1/4] bpf: Add bloom filter map implementation Joanne Koong
2021-09-17 17:01   ` Alexei Starovoitov
2021-09-20 20:58     ` Andrii Nakryiko
2021-09-20 22:52       ` Joanne Koong
2021-09-20 23:21         ` Andrii Nakryiko
2021-09-20 21:03     ` Joanne Koong
2021-09-17 21:48   ` Andrii Nakryiko
2021-09-14  4:04 ` [PATCH v2 bpf-next 2/4] selftests/bpf: Add bloom filter map test cases Joanne Koong
2021-09-14  4:04 ` [PATCH v2 bpf-next 3/4] bpf/benchs: Add benchmark test for bloom filter maps Joanne Koong
2021-09-14  4:04 ` [PATCH v2 bpf-next 4/4] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter Joanne Koong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).