* [PATCH bpf-next v4 0/5] Implement bitset maps, with bloom filter
@ 2021-10-06 22:20 Joanne Koong
  2021-10-06 22:20 ` [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities Joanne Koong
                   ` (4 more replies)
  0 siblings, 5 replies; 34+ messages in thread
From: Joanne Koong @ 2021-10-06 22:20 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

This patchset adds a new kind of bpf map: the bitset map.

A bitset is an array data structure that compactly stores bits. It is a
non-associative data type and is often utilized to denote whether an element
exists in a set. A bitset is effective at exploiting bit-level parallelism in
hardware to perform operations quickly. For more information, please see
https://en.wikipedia.org/wiki/Bit_array

When the number of hash functions passed in map_extra is non-zero, the
bitset can be used as a bloom filter. A bloom filter is a space-efficient
probabilistic data structure used to quickly test whether an element
exists in a set. In a bloom filter, false positives are possible whereas
false negatives are not. For a
more thorough overview about how bloom filters work,
https://en.wikipedia.org/wiki/Bloom_filter may be helpful.

One example use-case is an application leveraging a bloom filter map to
determine whether a computationally expensive hashmap lookup can be avoided. If
the element was not found in the bloom filter map, the hashmap lookup can be
skipped.
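
As a rough illustration of this pattern, here is a minimal BPF-side sketch
(map names, sizes, and the value layout are made up for the example and are
not part of this patchset; it also relies on the map_extra support added in
patches 1-2):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

/* Bloom filter mode: map_extra > 0 selects the number of hash functions */
struct {
	__uint(type, BPF_MAP_TYPE_BITSET);
	__uint(value_size, sizeof(__u64));
	__uint(max_entries, 1 << 16);
	__uint(map_extra, 3);
} bloom_map SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(key_size, sizeof(__u64));
	__uint(value_size, sizeof(__u64));
	__uint(max_entries, 1 << 16);
} hash_map SEC(".maps");

static __u64 *lookup_with_bloom_filter(__u64 *key)
{
	/* A miss in the bloom filter guarantees the key is not in the
	 * hashmap (as long as every key added to the hashmap was also
	 * pushed into the bloom filter), so the more expensive hashmap
	 * lookup can be skipped.
	 */
	if (bpf_map_peek_elem(&bloom_map, key))
		return NULL;

	return bpf_map_lookup_elem(&hash_map, key);
}

SEC("fentry/__x64_sys_getpgid")
int prog_example(void *ctx)
{
	__u64 key = 0;

	lookup_with_bloom_filter(&key);
	return 0;
}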

A high level overview of this patchset is as follows:
1/5 - kernel changes for the bitset map, with bloom filter capabilities
2/5 - libbpf changes for adding map_extra flags
3/5 - tests for the bitset map and for bloom filter capabilities
4/5 - benchmarks for bloom filter lookup/update throughput and false positive
rate
5/5 - benchmarks for how hashmap lookups perform with vs. without the bloom
filter

v3 -> v4:
* Generalize the bloom filter map to be a bitset map with bloom filter
capabilities
* Add map_extra flags; pass in nr_hash_funcs through lower 4 bits of map_extra
for the bitset map
* Add tests for the bitset map (non-bloom filter) functionality
* In the benchmarks, compute stats only as monotonic increases, and place
the stats in a struct instead of in a percpu_array bpf map

v2 -> v3:
* Add libbpf changes for supporting nr_hash_funcs, instead of passing the
number of hash functions through map_flags.
* Separate the hashing logic in kernel/bpf/bloom_filter.c into a helper
function

v1 -> v2:
* Remove libbpf changes, and pass the number of hash functions through
map_flags instead.
* Default to using 5 hash functions if no number of hash functions
is specified.
* Use set_bit instead of spinlocks in the bloom filter bitmap. This
improved the speed significantly. For example, using 5 hash functions
with 100k entries, there was roughly a 35% speed increase.
* Use jhash2 (instead of jhash) for u32-aligned value sizes. This
increased the speed by roughly 5 to 15%. When using jhash2 on value
sizes non-u32 aligned (truncating any remainder bits), there was not
a noticeable difference.
* Add test for using the bloom filter as an inner map.
* Reran the benchmarks, updated the commit messages to correspond to
the new results.


Joanne Koong (5):
  bpf: Add bitset map with bloom filter capabilities
  libbpf: Add "map_extra" as a per-map-type extra flag
  selftests/bpf: Add bitset map test cases
  bpf/benchs: Add benchmark tests for bloom filter throughput + false
    positive
  bpf/benchs: Add benchmarks for comparing hashmap lookups w/ vs. w/out
    bloom filter

 include/linux/bpf.h                           |   2 +
 include/linux/bpf_types.h                     |   1 +
 include/uapi/linux/bpf.h                      |  10 +
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/bitset.c                           | 256 ++++++++++
 kernel/bpf/syscall.c                          |  25 +-
 kernel/bpf/verifier.c                         |  10 +-
 tools/include/uapi/linux/bpf.h                |  10 +
 tools/lib/bpf/bpf.c                           |   1 +
 tools/lib/bpf/bpf.h                           |   1 +
 tools/lib/bpf/bpf_helpers.h                   |   1 +
 tools/lib/bpf/libbpf.c                        |  25 +-
 tools/lib/bpf/libbpf.h                        |   4 +
 tools/lib/bpf/libbpf.map                      |   2 +
 tools/lib/bpf/libbpf_internal.h               |   4 +-
 tools/testing/selftests/bpf/Makefile          |   6 +-
 tools/testing/selftests/bpf/bench.c           |  59 ++-
 tools/testing/selftests/bpf/bench.h           |   3 +
 .../bpf/benchs/bench_bloom_filter_map.c       | 448 ++++++++++++++++++
 .../bpf/benchs/run_bench_bloom_filter_map.sh  |  45 ++
 .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
 .../selftests/bpf/benchs/run_common.sh        |  60 +++
 tools/testing/selftests/bpf/bpf_util.h        |  11 +
 .../selftests/bpf/prog_tests/bitset_map.c     | 279 +++++++++++
 .../testing/selftests/bpf/progs/bitset_map.c  | 115 +++++
 .../selftests/bpf/progs/bloom_filter_bench.c  | 146 ++++++
 26 files changed, 1513 insertions(+), 43 deletions(-)
 create mode 100644 kernel/bpf/bitset.c
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
 create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bitset_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bitset_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_bench.c

-- 
2.30.2



* [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-06 22:20 [PATCH bpf-next v4 0/5] Implement bitset maps, with bloom filter Joanne Koong
@ 2021-10-06 22:20 ` Joanne Koong
  2021-10-07 14:20   ` Toke Høiland-Jørgensen
  2021-10-08 23:05   ` Andrii Nakryiko
  2021-10-06 22:21 ` [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag Joanne Koong
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 34+ messages in thread
From: Joanne Koong @ 2021-10-06 22:20 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

This patch adds the kernel-side changes for the implementation of
a bitset map with bloom filter capabilities.

The bitset map does not have keys, only values, since it is a
non-associative data type. When the bitset map is created, it must
be created with a key_size of 0, and the max_entries value should be the
desired size of the bitset, in number of bits.

The bitset map supports peek (determining whether a bit is set in the
map), push (setting a bit in the map), and pop (clearing a bit in the
map) operations. These operations are exposed to userspace applications
through the already existing syscalls in the following way:

BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem

For updates, the user will pass in a NULL key and the index of the
bit to set in the bitmap as the value. For lookups, the user will pass
in the index of the bit to check as the value. If the bit is set, 0
will be returned, else -ENOENT. For clearing the bit, the user will pass
in the index of the bit to clear as the value.
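
For illustration, a minimal userspace sketch of these three operations on a
plain (non-bloom-filter) bitset, loosely mirroring the selftests added later
in this series (error handling trimmed):

#include <unistd.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>

static int bitset_usage_example(void)
{
	__u32 bit = 3;
	int fd, err;

	/* key_size must be 0; max_entries is the size of the bitset in bits */
	fd = bpf_create_map(BPF_MAP_TYPE_BITSET, 0, sizeof(__u32), 100, 0);
	if (fd < 0)
		return fd;

	err = bpf_map_update_elem(fd, NULL, &bit, BPF_ANY);	/* set bit 3 */
	if (!err)
		err = bpf_map_lookup_elem(fd, NULL, &bit);	/* 0 if set, -ENOENT if not */
	if (!err)
		err = bpf_map_lookup_and_delete_elem(fd, NULL, &bit);	/* clear bit 3 */

	close(fd);
	return err;
}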

Since we need to pass in the index of the bit to set/clear in the bitmap
whenever we do a lookup/clear, in the verifier layer this requires us to
modify the argument type of a bitset's BPF_FUNC_map_peek_elem and
BPF_FUNC_map_pop_elem calls to ARG_PTR_TO_MAP_VALUE; correspondingly, in
the syscall layer, we need to copy over the user value so that in
bpf_map_peek_elem and bpf_map_pop_elem, we have the specific
value to check.

The bitset map may be used as an inner map.

The bitset map may also have additional bloom filter capabilities. The
lower 4 bits of the map_extra flags for the bitset map denote how many
hash functions to use. If more than 0 hash functions are specified, the
bitset map will operate as a bloom filter: on an update, each hash
function's result selects a bit to set, and on a lookup the same bits are
checked. Right now, jhash is the hash function used; in the future, this
can be expanded to use map_extra to specify which hash function to use.
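
For illustration, creating such a bloom filter from userspace might look
like the following sketch (it relies on the map_extra plumbing added in the
libbpf patch later in this series; names and sizes are arbitrary):

#include <linux/bpf.h>
#include <bpf/bpf.h>

static int create_bloom_filter(void)
{
	struct bpf_create_map_attr attr = {
		.name = "bloom_filter",
		.map_type = BPF_MAP_TYPE_BITSET,
		.key_size = 0,			/* bitset maps are keyless */
		.value_size = sizeof(__u64),	/* size of the values being hashed */
		.max_entries = 10000,		/* number of bits in the bitset */
		.map_extra = 5,			/* lower 4 bits = number of hash functions */
	};

	return bpf_create_map_xattr(&attr);
}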

A few additional things to note:
 * If there are any concurrent lookups + updates in the bloom filter, the
user is responsible for synchronizing this to ensure no false negative
lookups occur.
 * Deleting an element in the bloom filter map is not supported.
 * The benchmarks later in this patchset can help compare the performance
of using different numbers of hashes on different entry sizes. In general,
using more hashes decreases the false positive rate of the bloom filter,
but also decreases the speed of a lookup.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 include/linux/bpf.h            |   2 +
 include/linux/bpf_types.h      |   1 +
 include/uapi/linux/bpf.h       |   9 ++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/bitset.c            | 256 +++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |  25 +++-
 kernel/bpf/verifier.c          |  10 +-
 tools/include/uapi/linux/bpf.h |   9 ++
 8 files changed, 308 insertions(+), 6 deletions(-)
 create mode 100644 kernel/bpf/bitset.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d604c8251d88..eac5bcecc6a7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -193,6 +193,8 @@ struct bpf_map {
 	struct work_struct work;
 	struct mutex freeze_mutex;
 	u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */
+
+	u32 map_extra; /* any per-map-type extra fields */
 };
 
 static inline bool map_value_has_spin_lock(const struct bpf_map *map)
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 9c81724e4b98..85339faeca71 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -125,6 +125,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_BITSET, bitset_map_ops)
 
 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6fc59d61937a..b40fa1a72a75 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -906,6 +906,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_RINGBUF,
 	BPF_MAP_TYPE_INODE_STORAGE,
 	BPF_MAP_TYPE_TASK_STORAGE,
+	BPF_MAP_TYPE_BITSET,
 };
 
 /* Note that tracing related programs such as
@@ -1252,6 +1253,13 @@ struct bpf_stack_build_id {
 
 #define BPF_OBJ_NAME_LEN 16U
 
+/* map_extra flags for bitset maps
+ *
+ * The lowest 4 bits are reserved for indicating the number of hash functions.
+ * If the number of hash functions is greater than 0, the bitset map will
+ * be used as a bloom filter.
+ */
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -1274,6 +1282,7 @@ union bpf_attr {
 						   * struct stored as the
 						   * map value
 						   */
+		__u32	map_extra;	/* any per-map-type extra fields */
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 7f33098ca63f..72e381adc708 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -7,7 +7,7 @@ endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
-obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
+obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bitset.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
diff --git a/kernel/bpf/bitset.c b/kernel/bpf/bitset.c
new file mode 100644
index 000000000000..a5fca0ebb520
--- /dev/null
+++ b/kernel/bpf/bitset.c
@@ -0,0 +1,256 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <linux/bitmap.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/err.h>
+#include <linux/jhash.h>
+#include <linux/random.h>
+
+#define BITSET_MAP_CREATE_FLAG_MASK \
+	(BPF_F_NUMA_NODE | BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK)
+
+struct bpf_bitset_map {
+	struct bpf_map map;
+
+	/* If the number of hash functions at map creation time is greater
+	 * than 0, the bitset map will function as a bloom filter and the fields
+	 * in the struct below will be initialized accordingly.
+	 */
+	struct {
+		u32 nr_hash_funcs;
+		u32 bitset_mask;
+		u32 hash_seed;
+		/* If the size of the values in the bloom filter is u32 aligned,
+		 * then it is more performant to use jhash2 as the underlying hash
+		 * function, else we use jhash. This tracks the number of u32s
+		 * in an u32-aligned value size. If the value size is not u32 aligned,
+		 * this will be 0.
+		 */
+		u32 aligned_u32_count;
+	} bloom_filter;
+
+	unsigned long bitset[];
+};
+
+static inline bool use_bloom_filter(struct bpf_bitset_map *map)
+{
+	return map->bloom_filter.nr_hash_funcs > 0;
+}
+
+static u32 hash(struct bpf_bitset_map *map, void *value,
+		u64 value_size, u32 index)
+{
+	u32 h;
+
+	if (map->bloom_filter.aligned_u32_count)
+		h = jhash2(value, map->bloom_filter.aligned_u32_count,
+			   map->bloom_filter.hash_seed + index);
+	else
+		h = jhash(value, value_size, map->bloom_filter.hash_seed + index);
+
+	return h & map->bloom_filter.bitset_mask;
+}
+
+static int bitset_map_push_elem(struct bpf_map *map, void *value,
+				u64 flags)
+{
+	struct bpf_bitset_map *bitset_map =
+		container_of(map, struct bpf_bitset_map, map);
+	u32 i, h, bitset_index;
+
+	if (flags != BPF_ANY)
+		return -EINVAL;
+
+	if (use_bloom_filter(bitset_map)) {
+		for (i = 0; i < bitset_map->bloom_filter.nr_hash_funcs; i++) {
+			h = hash(bitset_map, value, map->value_size, i);
+			set_bit(h, bitset_map->bitset);
+		}
+	} else {
+		bitset_index = *(u32 *)value;
+
+		if (bitset_index >= map->max_entries)
+			return -EINVAL;
+
+		set_bit(bitset_index, bitset_map->bitset);
+	}
+
+	return 0;
+}
+
+static int bitset_map_peek_elem(struct bpf_map *map, void *value)
+{
+	struct bpf_bitset_map *bitset_map =
+		container_of(map, struct bpf_bitset_map, map);
+	u32 i, h, bitset_index;
+
+	if (use_bloom_filter(bitset_map)) {
+		for (i = 0; i < bitset_map->bloom_filter.nr_hash_funcs; i++) {
+			h = hash(bitset_map, value, map->value_size, i);
+			if (!test_bit(h, bitset_map->bitset))
+				return -ENOENT;
+		}
+	} else {
+		bitset_index = *(u32 *)value;
+		if (bitset_index  >= map->max_entries)
+			return -EINVAL;
+
+		if (!test_bit(bitset_index, bitset_map->bitset))
+			return -ENOENT;
+	}
+
+	return 0;
+}
+
+static int bitset_map_pop_elem(struct bpf_map *map, void *value)
+{
+	struct bpf_bitset_map *bitset_map =
+		container_of(map, struct bpf_bitset_map, map);
+	u32 bitset_index;
+
+	if (use_bloom_filter(bitset_map))
+		return -EOPNOTSUPP;
+
+	bitset_index = *(u32 *)value;
+
+	if (!test_and_clear_bit(bitset_index, bitset_map->bitset))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void init_bloom_filter(struct bpf_bitset_map *bitset_map, union bpf_attr *attr,
+			      u32 nr_hash_funcs, u32 bitset_mask)
+{
+	bitset_map->bloom_filter.nr_hash_funcs = nr_hash_funcs;
+	bitset_map->bloom_filter.bitset_mask = bitset_mask;
+
+	/* Check whether the value size is u32-aligned */
+	if ((attr->value_size & (sizeof(u32) - 1)) == 0)
+		bitset_map->bloom_filter.aligned_u32_count =
+			attr->value_size / sizeof(u32);
+
+	if (!(attr->map_flags & BPF_F_ZERO_SEED))
+		bitset_map->bloom_filter.hash_seed = get_random_int();
+}
+
+static struct bpf_map *bitset_map_alloc(union bpf_attr *attr)
+{
+	int numa_node = bpf_map_attr_numa_node(attr);
+	u32 bitset_bytes, bitset_mask, nr_hash_funcs;
+	struct bpf_bitset_map *bitset_map;
+	u64 nr_bits_roundup_pow2;
+
+	if (!bpf_capable())
+		return ERR_PTR(-EPERM);
+
+	if (attr->key_size != 0 || attr->max_entries == 0 ||
+	    attr->map_flags & ~BITSET_MAP_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags))
+		return ERR_PTR(-EINVAL);
+
+	if (attr->map_extra & ~0xF)
+		return ERR_PTR(-EINVAL);
+
+	/* The lower 4 bits of map_extra specify the number of hash functions */
+	nr_hash_funcs = attr->map_extra & 0xF;
+
+	if (!nr_hash_funcs) {
+		if (attr->value_size != sizeof(u32))
+			return ERR_PTR(-EINVAL);
+
+		/* Round up to the size of an unsigned long since a bit gets set
+		 * at the granularity of an unsigned long.
+		 */
+		bitset_bytes = roundup(BITS_TO_BYTES(attr->max_entries),
+				       sizeof(unsigned long));
+	} else {
+		/* If the number of hash functions > 0, then the map will
+		 * function as a bloom filter
+		 */
+
+		if (attr->value_size == 0)
+			return ERR_PTR(-EINVAL);
+
+		/* We round up the size of the bitset to the nearest power of two to
+		 * enable more efficient hashing using a bitmask. The bitmask will be
+		 * the bitset size - 1.
+		 */
+		nr_bits_roundup_pow2 = roundup_pow_of_two(attr->max_entries);
+		bitset_mask = nr_bits_roundup_pow2 - 1;
+
+		bitset_bytes = roundup(BITS_TO_BYTES(nr_bits_roundup_pow2),
+				       sizeof(unsigned long));
+	}
+
+	bitset_map = bpf_map_area_alloc(sizeof(*bitset_map) + bitset_bytes,
+					numa_node);
+	if (!bitset_map)
+		return ERR_PTR(-ENOMEM);
+
+	bpf_map_init_from_attr(&bitset_map->map, attr);
+
+	if (nr_hash_funcs)
+		init_bloom_filter(bitset_map, attr, nr_hash_funcs, bitset_mask);
+
+	return &bitset_map->map;
+}
+
+static void bitset_map_free(struct bpf_map *map)
+{
+	struct bpf_bitset_map *bitset_map =
+		container_of(map, struct bpf_bitset_map, map);
+
+	bpf_map_area_free(bitset_map);
+}
+
+static void *bitset_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	/* The eBPF program should use map_peek_elem instead */
+	return ERR_PTR(-EINVAL);
+}
+
+static int bitset_map_update_elem(struct bpf_map *map, void *key,
+				  void *value, u64 flags)
+{
+	/* The eBPF program should use map_push_elem instead */
+	return -EINVAL;
+}
+
+static int bitset_map_delete_elem(struct bpf_map *map, void *key)
+{
+	return -EOPNOTSUPP;
+}
+
+static int bitset_map_get_next_key(struct bpf_map *map, void *key,
+				   void *next_key)
+{
+	return -EOPNOTSUPP;
+}
+
+static int bitset_map_check_btf(const struct bpf_map *map, const struct btf *btf,
+				const struct btf_type *key_type,
+				const struct btf_type *value_type)
+{
+	/* Bitset maps are keyless */
+	return btf_type_is_void(key_type) ? 0 : -EINVAL;
+}
+
+static int bpf_bitset_map_btf_id;
+const struct bpf_map_ops bitset_map_ops = {
+	.map_meta_equal = bpf_map_meta_equal,
+	.map_alloc = bitset_map_alloc,
+	.map_free = bitset_map_free,
+	.map_push_elem = bitset_map_push_elem,
+	.map_peek_elem = bitset_map_peek_elem,
+	.map_pop_elem = bitset_map_pop_elem,
+	.map_lookup_elem = bitset_map_lookup_elem,
+	.map_update_elem = bitset_map_update_elem,
+	.map_delete_elem = bitset_map_delete_elem,
+	.map_get_next_key = bitset_map_get_next_key,
+	.map_check_btf = bitset_map_check_btf,
+	.map_btf_name = "bpf_bitset_map",
+	.map_btf_id = &bpf_bitset_map_btf_id,
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e50c0bfdb7d..7726774d972a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -199,7 +199,8 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,
 							 flags);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
+		   map->map_type == BPF_MAP_TYPE_STACK ||
+		   map->map_type == BPF_MAP_TYPE_BITSET) {
 		err = map->ops->map_push_elem(map, value, flags);
 	} else {
 		rcu_read_lock();
@@ -238,7 +239,8 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		err = bpf_fd_reuseport_array_lookup_elem(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
+		   map->map_type == BPF_MAP_TYPE_STACK ||
+		   map->map_type == BPF_MAP_TYPE_BITSET) {
 		err = map->ops->map_peek_elem(map, value);
 	} else if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
 		/* struct_ops map requires directly updating "value" */
@@ -348,6 +350,7 @@ void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
 	map->max_entries = attr->max_entries;
 	map->map_flags = bpf_map_flags_retain_permanent(attr->map_flags);
 	map->numa_node = bpf_map_attr_numa_node(attr);
+	map->map_extra = attr->map_extra;
 }
 
 static int bpf_map_alloc_id(struct bpf_map *map)
@@ -553,6 +556,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   "value_size:\t%u\n"
 		   "max_entries:\t%u\n"
 		   "map_flags:\t%#x\n"
+		   "map_extra:\t%#x\n"
 		   "memlock:\t%lu\n"
 		   "map_id:\t%u\n"
 		   "frozen:\t%u\n",
@@ -561,6 +565,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   map->value_size,
 		   map->max_entries,
 		   map->map_flags,
+		   map->map_extra,
 		   bpf_map_memory_footprint(map),
 		   map->id,
 		   READ_ONCE(map->frozen));
@@ -810,7 +815,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }
 
-#define BPF_MAP_CREATE_LAST_FIELD btf_vmlinux_value_type_id
+#define BPF_MAP_CREATE_LAST_FIELD map_extra
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
@@ -1080,6 +1085,14 @@ static int map_lookup_elem(union bpf_attr *attr)
 	if (!value)
 		goto free_key;
 
+	if (map->map_type == BPF_MAP_TYPE_BITSET) {
+		if (copy_from_user(value, uvalue, value_size))
+			err = -EFAULT;
+		else
+			err = bpf_map_copy_value(map, key, value, attr->flags);
+		goto free_value;
+	}
+
 	err = bpf_map_copy_value(map, key, value, attr->flags);
 	if (err)
 		goto free_value;
@@ -1549,6 +1562,12 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 	if (map->map_type == BPF_MAP_TYPE_QUEUE ||
 	    map->map_type == BPF_MAP_TYPE_STACK) {
 		err = map->ops->map_pop_elem(map, value);
+	} else if (map->map_type == BPF_MAP_TYPE_BITSET) {
+		if (copy_from_user(value, uvalue, value_size))
+			err = -EFAULT;
+		else
+			err = map->ops->map_pop_elem(map, value);
+		goto free_value;
 	} else if (map->map_type == BPF_MAP_TYPE_HASH ||
 		   map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
 		   map->map_type == BPF_MAP_TYPE_LRU_HASH ||
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 20900a1bac12..731cc90b6e98 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5007,7 +5007,11 @@ static int resolve_map_arg_type(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 		break;
-
+	case BPF_MAP_TYPE_BITSET:
+		if (meta->func_id == BPF_FUNC_map_peek_elem ||
+		    meta->func_id == BPF_FUNC_map_pop_elem)
+			*arg_type = ARG_PTR_TO_MAP_VALUE;
+		break;
 	default:
 		break;
 	}
@@ -5562,6 +5566,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		break;
 	case BPF_MAP_TYPE_QUEUE:
 	case BPF_MAP_TYPE_STACK:
+	case BPF_MAP_TYPE_BITSET:
 		if (func_id != BPF_FUNC_map_peek_elem &&
 		    func_id != BPF_FUNC_map_pop_elem &&
 		    func_id != BPF_FUNC_map_push_elem)
@@ -5653,7 +5658,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_FUNC_map_pop_elem:
 	case BPF_FUNC_map_push_elem:
 		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
-		    map->map_type != BPF_MAP_TYPE_STACK)
+		    map->map_type != BPF_MAP_TYPE_STACK &&
+		    map->map_type != BPF_MAP_TYPE_BITSET)
 			goto error;
 		break;
 	case BPF_FUNC_sk_storage_get:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6fc59d61937a..b40fa1a72a75 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -906,6 +906,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_RINGBUF,
 	BPF_MAP_TYPE_INODE_STORAGE,
 	BPF_MAP_TYPE_TASK_STORAGE,
+	BPF_MAP_TYPE_BITSET,
 };
 
 /* Note that tracing related programs such as
@@ -1252,6 +1253,13 @@ struct bpf_stack_build_id {
 
 #define BPF_OBJ_NAME_LEN 16U
 
+/* map_extra flags for bitset maps
+ *
+ * The lowest 4 bits are reserved for indicating the number of hash functions.
+ * If the number of hash functions is greater than 0, the bitset map will
+ * be used as a bloom filter.
+ */
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32	map_type;	/* one of enum bpf_map_type */
@@ -1274,6 +1282,7 @@ union bpf_attr {
 						   * struct stored as the
 						   * map value
 						   */
+		__u32	map_extra;	/* any per-map-type extra fields */
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
-- 
2.30.2



* [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag
  2021-10-06 22:20 [PATCH bpf-next v4 0/5] Implement bitset maps, with bloom filter Joanne Koong
  2021-10-06 22:20 ` [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities Joanne Koong
@ 2021-10-06 22:21 ` Joanne Koong
  2021-10-08 23:19   ` Andrii Nakryiko
  2021-10-09  2:12   ` Andrii Nakryiko
  2021-10-06 22:21 ` [PATCH bpf-next v4 3/5] selftests/bpf: Add bitset map test cases Joanne Koong
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 34+ messages in thread
From: Joanne Koong @ 2021-10-06 22:21 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

This patch adds the libbpf infrastructure for supporting a
per-map-type "map_extra" field, whose interpretation is specific to
the map type.

For example, for the bitset map, the lower 4 bits of map_extra
are used to denote the number of hash functions.
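
In a BTF-defined map, the new field can be set declaratively; a sketch
mirroring the selftests added later in this series:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_BITSET);
	__uint(value_size, sizeof(__u32));
	__uint(max_entries, 10000);
	__uint(map_extra, 5);	/* bloom filter with 5 hash functions */
} map_bloom_filter SEC(".maps");

For maps whose map_extra value is only known at runtime, the
bpf_map__map_extra()/bpf_map__set_map_extra() accessors added here can be
used on an open, not-yet-loaded bpf_object instead.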

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 include/uapi/linux/bpf.h        |  1 +
 tools/include/uapi/linux/bpf.h  |  1 +
 tools/lib/bpf/bpf.c             |  1 +
 tools/lib/bpf/bpf.h             |  1 +
 tools/lib/bpf/bpf_helpers.h     |  1 +
 tools/lib/bpf/libbpf.c          | 25 ++++++++++++++++++++++++-
 tools/lib/bpf/libbpf.h          |  4 ++++
 tools/lib/bpf/libbpf.map        |  2 ++
 tools/lib/bpf/libbpf_internal.h |  4 +++-
 9 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b40fa1a72a75..a6f225e9c95a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5639,6 +5639,7 @@ struct bpf_map_info {
 	__u32 btf_id;
 	__u32 btf_key_type_id;
 	__u32 btf_value_type_id;
+	__u32 map_extra;
 } __attribute__((aligned(8)));
 
 struct bpf_btf_info {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b40fa1a72a75..a6f225e9c95a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5639,6 +5639,7 @@ struct bpf_map_info {
 	__u32 btf_id;
 	__u32 btf_key_type_id;
 	__u32 btf_value_type_id;
+	__u32 map_extra;
 } __attribute__((aligned(8)));
 
 struct bpf_btf_info {
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 7d1741ceaa32..41e3e85e7789 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -97,6 +97,7 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
 	attr.btf_key_type_id = create_attr->btf_key_type_id;
 	attr.btf_value_type_id = create_attr->btf_value_type_id;
 	attr.map_ifindex = create_attr->map_ifindex;
+	attr.map_extra = create_attr->map_extra;
 	if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
 		attr.btf_vmlinux_value_type_id =
 			create_attr->btf_vmlinux_value_type_id;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 6fffb3cdf39b..c4049f2d63cc 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -50,6 +50,7 @@ struct bpf_create_map_attr {
 		__u32 inner_map_fd;
 		__u32 btf_vmlinux_value_type_id;
 	};
+	__u32 map_extra;
 };
 
 LIBBPF_API int
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 963b1060d944..bce5a0090f3f 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -133,6 +133,7 @@ struct bpf_map_def {
 	unsigned int value_size;
 	unsigned int max_entries;
 	unsigned int map_flags;
+	unsigned int map_extra;
 };
 
 enum libbpf_pin_type {
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ed313fd491bd..12a9ecd45a78 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2274,6 +2274,10 @@ int parse_btf_map_def(const char *map_name, struct btf *btf,
 			}
 			map_def->pinning = val;
 			map_def->parts |= MAP_DEF_PINNING;
+		} else if (strcmp(name, "map_extra") == 0) {
+			if (!get_map_field_int(map_name, btf, m, &map_def->map_extra))
+				return -EINVAL;
+			map_def->parts |= MAP_DEF_MAP_EXTRA;
 		} else {
 			if (strict) {
 				pr_warn("map '%s': unknown field '%s'.\n", map_name, name);
@@ -2298,6 +2302,7 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
 	map->def.value_size = def->value_size;
 	map->def.max_entries = def->max_entries;
 	map->def.map_flags = def->map_flags;
+	map->def.map_extra = def->map_extra;
 
 	map->numa_node = def->numa_node;
 	map->btf_key_type_id = def->key_type_id;
@@ -2322,6 +2327,8 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
 		pr_debug("map '%s': found max_entries = %u.\n", map->name, def->max_entries);
 	if (def->parts & MAP_DEF_MAP_FLAGS)
 		pr_debug("map '%s': found map_flags = %u.\n", map->name, def->map_flags);
+	if (def->parts & MAP_DEF_MAP_EXTRA)
+		pr_debug("map '%s': found map_extra = %u.\n", map->name, def->map_extra);
 	if (def->parts & MAP_DEF_PINNING)
 		pr_debug("map '%s': found pinning = %u.\n", map->name, def->pinning);
 	if (def->parts & MAP_DEF_NUMA_NODE)
@@ -4017,6 +4024,7 @@ int bpf_map__reuse_fd(struct bpf_map *map, int fd)
 	map->def.value_size = info.value_size;
 	map->def.max_entries = info.max_entries;
 	map->def.map_flags = info.map_flags;
+	map->def.map_extra = info.map_extra;
 	map->btf_key_type_id = info.btf_key_type_id;
 	map->btf_value_type_id = info.btf_value_type_id;
 	map->reused = true;
@@ -4534,7 +4542,8 @@ static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)
 		map_info.key_size == map->def.key_size &&
 		map_info.value_size == map->def.value_size &&
 		map_info.max_entries == map->def.max_entries &&
-		map_info.map_flags == map->def.map_flags);
+		map_info.map_flags == map->def.map_flags &&
+		map_info.map_extra == map->def.map_extra);
 }
 
 static int
@@ -4631,6 +4640,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 	create_attr.key_size = def->key_size;
 	create_attr.value_size = def->value_size;
 	create_attr.numa_node = map->numa_node;
+	create_attr.map_extra = def->map_extra;
 
 	if (def->type == BPF_MAP_TYPE_PERF_EVENT_ARRAY && !def->max_entries) {
 		int nr_cpus;
@@ -8637,6 +8647,19 @@ int bpf_map__set_map_flags(struct bpf_map *map, __u32 flags)
 	return 0;
 }
 
+__u32 bpf_map__map_extra(const struct bpf_map *map)
+{
+	return map->def.map_extra;
+}
+
+int bpf_map__set_map_extra(struct bpf_map *map, __u32 map_extra)
+{
+	if (map->fd >= 0)
+		return libbpf_err(-EBUSY);
+	map->def.map_extra = map_extra;
+	return 0;
+}
+
 __u32 bpf_map__numa_node(const struct bpf_map *map)
 {
 	return map->numa_node;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 89ca9c83ed4e..55e8dfe6f3e1 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -486,6 +486,7 @@ struct bpf_map_def {
 	unsigned int value_size;
 	unsigned int max_entries;
 	unsigned int map_flags;
+	unsigned int map_extra;
 };
 
 /**
@@ -562,6 +563,9 @@ LIBBPF_API __u32 bpf_map__btf_value_type_id(const struct bpf_map *map);
 /* get/set map if_index */
 LIBBPF_API __u32 bpf_map__ifindex(const struct bpf_map *map);
 LIBBPF_API int bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
+/* get/set map map_extra flags */
+LIBBPF_API __u32 bpf_map__map_extra(const struct bpf_map *map);
+LIBBPF_API int bpf_map__set_map_extra(struct bpf_map *map, __u32 map_extra);
 
 typedef void (*bpf_map_clear_priv_t)(struct bpf_map *, void *);
 LIBBPF_API int bpf_map__set_priv(struct bpf_map *map, void *priv,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index f270d25e4af3..308378b3f20b 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -395,4 +395,6 @@ LIBBPF_0.6.0 {
 		bpf_object__prev_program;
 		btf__add_btf;
 		btf__add_tag;
+		bpf_map__map_extra;
+		bpf_map__set_map_extra;
 } LIBBPF_0.5.0;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index f7fd3944d46d..188db854d9c2 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -193,8 +193,9 @@ enum map_def_parts {
 	MAP_DEF_NUMA_NODE	= 0x080,
 	MAP_DEF_PINNING		= 0x100,
 	MAP_DEF_INNER_MAP	= 0x200,
+	MAP_DEF_MAP_EXTRA	= 0x400,
 
-	MAP_DEF_ALL		= 0x3ff, /* combination of all above */
+	MAP_DEF_ALL		= 0x7ff, /* combination of all above */
 };
 
 struct btf_map_def {
@@ -208,6 +209,7 @@ struct btf_map_def {
 	__u32 map_flags;
 	__u32 numa_node;
 	__u32 pinning;
+	__u32 map_extra;
 };
 
 int parse_btf_map_def(const char *map_name, struct btf *btf,
-- 
2.30.2



* [PATCH bpf-next v4 3/5] selftests/bpf: Add bitset map test cases
  2021-10-06 22:20 [PATCH bpf-next v4 0/5] Implement bitset maps, with bloom filter Joanne Koong
  2021-10-06 22:20 ` [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities Joanne Koong
  2021-10-06 22:21 ` [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag Joanne Koong
@ 2021-10-06 22:21 ` Joanne Koong
  2021-10-06 22:21 ` [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive Joanne Koong
  2021-10-06 22:21 ` [PATCH bpf-next v4 5/5] bpf/benchs: Add benchmarks for comparing hashmap lookups w/ vs. w/out bloom filter Joanne Koong
  4 siblings, 0 replies; 34+ messages in thread
From: Joanne Koong @ 2021-10-06 22:21 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

This patch adds test cases for bpf bitset maps. They include tests
checking against invalid operations by userspace, tests for using the
bitset map as an inner map, a bpf program that queries the bitset
map for values added by userspace, as well as tests for the bloom
filter capabilities of the bitset map.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 .../selftests/bpf/prog_tests/bitset_map.c     | 279 ++++++++++++++++++
 .../testing/selftests/bpf/progs/bitset_map.c  | 115 ++++++++
 2 files changed, 394 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bitset_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bitset_map.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bitset_map.c b/tools/testing/selftests/bpf/prog_tests/bitset_map.c
new file mode 100644
index 000000000000..8929cc598b9d
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bitset_map.c
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <sys/syscall.h>
+#include <test_progs.h>
+#include "bitset_map.skel.h"
+
+static void test_bitset_map_fail(bool bloom_filter)
+{
+	struct bpf_create_map_attr xattr = {
+		.name = "bitset_map",
+		.map_type = BPF_MAP_TYPE_BITSET,
+		.max_entries = 100,
+		.value_size = bloom_filter ? 11 : sizeof(__u32),
+		.map_extra = bloom_filter ? 5 : 0,
+	};
+	__u32 value;
+	int fd, err;
+
+	/* Invalid key size */
+	xattr.key_size = 4;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bitset invalid key size"))
+		close(fd);
+	xattr.key_size = 0;
+
+	/* Invalid value size */
+	xattr.value_size = 0;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bitset invalid value size 0"))
+		close(fd);
+	if (!bloom_filter) {
+		/* For bitset maps that are not bloom filters, the value size must
+		 * be a __u32.
+		 */
+		xattr.value_size = sizeof(__u64);
+		fd = bpf_create_map_xattr(&xattr);
+		if (!ASSERT_LT(fd, 0, "bpf_create_map bitset invalid value size u64"))
+			close(fd);
+	}
+	xattr.value_size = bloom_filter ? 11 : sizeof(__u32);
+
+	/* Invalid max entries size */
+	xattr.max_entries = 0;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bitset invalid max entries size"))
+		close(fd);
+	xattr.max_entries = 100;
+
+	/* bitset maps do not support BPF_F_NO_PREALLOC */
+	xattr.map_flags = BPF_F_NO_PREALLOC;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bitset invalid flags"))
+		close(fd);
+	xattr.map_flags = 0;
+
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_GE(fd, 0, "bpf_create_map bitset"))
+		return;
+
+	/* Test invalid flags */
+	err = bpf_map_update_elem(fd, NULL, &value, -1);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bitset invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, BPF_EXIST);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bitset invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, BPF_F_LOCK);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bitset invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, BPF_NOEXIST);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bitset invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, 10000);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bitset invalid flags");
+
+	if (bloom_filter) {
+		err = bpf_map_update_elem(fd, NULL, &value, 0);
+		ASSERT_OK(err, "bpf_map_update_elem bitset invalid flags");
+
+		/* Clearing a bit is not allowed */
+		err = bpf_map_lookup_and_delete_elem(fd, NULL, &value);
+		ASSERT_EQ(err, -EOPNOTSUPP, "bpf_map_lookup_and_delete invalid");
+	} else {
+		/* Try clearing a bit that wasn't set */
+		err = bpf_map_lookup_and_delete_elem(fd, NULL, &value);
+		ASSERT_EQ(err, -EINVAL, "bpf_map_lookup_and_delete invalid bit");
+
+		/* Try setting a bit that is outside the bitset range */
+		value = xattr.max_entries;
+		err = bpf_map_update_elem(fd, NULL, &value, 0);
+		ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bitset out of range");
+	}
+
+	/* bpf_map_delete is not supported. Only use bpf_map_lookup_and_delete */
+	err = bpf_map_delete_elem(fd, &value);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_delete_elem");
+
+	close(fd);
+}
+
+static void test_bitset_map_clear(void)
+{
+	int fd, err;
+	__u32 val;
+
+	fd = bpf_create_map(BPF_MAP_TYPE_BITSET, 0, sizeof(__u32), 10, 0);
+	if (!ASSERT_GE(fd, 0, "bpf_create_map"))
+		return;
+
+	val = 3;
+	err = bpf_map_update_elem(fd, NULL, &val, 0);
+	if (!ASSERT_OK(err, "Set bit in bitmap"))
+		goto done;
+
+	err = bpf_map_lookup_elem(fd, NULL, &val);
+	if (!ASSERT_OK(err, "Check bit in bitmap"))
+		goto done;
+
+	err = bpf_map_lookup_and_delete_elem(fd, NULL, &val);
+	if (!ASSERT_OK(err, "Clear bit in bitmap"))
+		goto done;
+
+	err = bpf_map_lookup_elem(fd, NULL, &val);
+	if (!ASSERT_EQ(err, -ENOENT, "Check cleared bit in bitmap"))
+		goto done;
+
+done:
+	close(fd);
+}
+
+static void bitset_map(struct bitset_map *skel, struct bpf_program *prog)
+{
+	struct bpf_link *link;
+
+	link = bpf_program__attach(prog);
+	if (!ASSERT_OK_PTR(link, "link"))
+		return;
+
+	syscall(SYS_getpgid);
+
+	ASSERT_EQ(skel->bss->error, 0, "error");
+
+	bpf_link__destroy(link);
+}
+
+static void bitset_inner_map(struct bitset_map *skel, const __u32 *rand_vals,
+			     __u32 nr_rand_vals)
+{
+	int outer_map_fd, inner_map_fd, err, i, key = 0;
+	struct bpf_create_map_attr xattr = {
+		.name = "bitset_inner_map",
+		.map_type = BPF_MAP_TYPE_BITSET,
+		.value_size = sizeof(__u32),
+		.max_entries = 1 << 16,
+	};
+	struct bpf_link *link;
+
+	/* Create a bitset map that will be used as the inner map */
+	inner_map_fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_GE(inner_map_fd, 0, "bpf_create_map bitset map as inner map"))
+		return;
+
+	for (i = 0; i < nr_rand_vals; i++) {
+		err = bpf_map_update_elem(inner_map_fd, NULL, rand_vals + i, BPF_ANY);
+		if (!ASSERT_OK(err, "Add random value to inner_map_fd"))
+			goto done;
+	}
+
+	/* Add the bitset map to the outer map */
+	outer_map_fd = bpf_map__fd(skel->maps.outer_map);
+	err = bpf_map_update_elem(outer_map_fd, &key, &inner_map_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "Add bitset map to outer map"))
+		goto done;
+
+	/* Attach the bitset_inner_map prog */
+	link = bpf_program__attach(skel->progs.prog_bitset_inner_map);
+	if (!ASSERT_OK_PTR(link, "link"))
+		goto delete_inner_map;
+
+	syscall(SYS_getpgid);
+
+	ASSERT_EQ(skel->bss->error, 0, "error");
+
+	bpf_link__destroy(link);
+
+delete_inner_map:
+	/* Ensure the inner bitset map can be deleted */
+	err = bpf_map_delete_elem(outer_map_fd, &key);
+	ASSERT_OK(err, "Delete inner bitset map");
+
+done:
+	close(inner_map_fd);
+}
+
+static int setup_bitset_progs(struct bitset_map **out_skel, __u32 **out_rand_vals,
+			      __u32 *out_nr_rand_vals)
+{
+	int random_data_fd, bitset_fd, bloom_filter_fd;
+	struct bitset_map *skel;
+	__u32 *rand_vals = NULL;
+	__u32 map_size;
+	__u32 val;
+	int err, i;
+
+	/* Set up a bitset map skeleton */
+	skel = bitset_map__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "bitset_map__open_and_load"))
+		return -EINVAL;
+
+	/* Set up rand_vals */
+	map_size = bpf_map__max_entries(skel->maps.map_random_data);
+	rand_vals = malloc(sizeof(*rand_vals) * map_size);
+	if (!rand_vals) {
+		err = -ENOMEM;
+		goto error;
+	}
+
+	/* Generate random values and populate both skeletons */
+	random_data_fd = bpf_map__fd(skel->maps.map_random_data);
+	bitset_fd = bpf_map__fd(skel->maps.map_bitset);
+	bloom_filter_fd = bpf_map__fd(skel->maps.map_bloom_filter);
+	for (i = 0; i < map_size; i++) {
+		val = rand();
+
+		err = bpf_map_update_elem(random_data_fd, &i, &val, BPF_ANY);
+		if (!ASSERT_OK(err, "Add random value to map_random_data"))
+			goto error;
+
+		err = bpf_map_update_elem(bloom_filter_fd, NULL, &val, BPF_ANY);
+		if (!ASSERT_OK(err, "Add random value to map_bloom_filter"))
+			goto error;
+
+		/* Take the lower 16 bits */
+		val &= 0xFFFF;
+
+		rand_vals[i] = val;
+
+		err = bpf_map_update_elem(bitset_fd, NULL, &val, BPF_ANY);
+		if (!ASSERT_OK(err, "Add random value to map_bitset"))
+			goto error;
+	}
+
+	*out_skel = skel;
+	*out_rand_vals = rand_vals;
+	*out_nr_rand_vals = map_size;
+
+	return 0;
+
+error:
+	bitset_map__destroy(skel);
+	if (rand_vals)
+		free(rand_vals);
+	return err;
+}
+
+void test_bitset_map(void)
+{
+	__u32 *rand_vals, nr_rand_vals;
+	struct bitset_map *skel;
+	int err;
+
+	test_bitset_map_fail(false);
+	test_bitset_map_fail(true);
+
+	test_bitset_map_clear();
+
+	err = setup_bitset_progs(&skel, &rand_vals, &nr_rand_vals);
+	if (err)
+		return;
+
+	bitset_inner_map(skel, rand_vals, nr_rand_vals);
+	free(rand_vals);
+
+	bitset_map(skel, skel->progs.prog_bitset);
+	bitset_map(skel, skel->progs.prog_bloom_filter);
+
+	bitset_map__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/bitset_map.c b/tools/testing/selftests/bpf/progs/bitset_map.c
new file mode 100644
index 000000000000..c284ade37db1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bitset_map.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct bpf_map;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1000);
+	__type(key, __u32);
+	__type(value, __u32);
+} map_random_data SEC(".maps");
+
+struct map_bitset_type {
+	__uint(type, BPF_MAP_TYPE_BITSET);
+	__uint(value_size, sizeof(__u32));
+	__uint(max_entries, 1 << 16);
+} map_bitset SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_BITSET);
+	__uint(value_size, sizeof(__u32));
+	__uint(max_entries, 10000);
+	__uint(map_extra, 5);
+} map_bloom_filter SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+	__array(values, struct map_bitset_type);
+} outer_map SEC(".maps");
+
+struct callback_ctx {
+	struct bpf_map *map;
+};
+
+int error = 0;
+
+static __u64
+check_bit(struct bpf_map *map, __u32 *key, __u32 *val,
+	  struct callback_ctx *data)
+{
+	__u32 index = *val & 0xFFFF;
+	int err;
+
+	err = bpf_map_peek_elem(data->map, &index);
+	if (err) {
+		error |= 1;
+		return 1; /* stop the iteration */
+	}
+
+	return 0;
+}
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bitset(void *ctx)
+{
+	struct callback_ctx data;
+
+	data.map = (struct bpf_map *)&map_bitset;
+	bpf_for_each_map_elem(&map_random_data, check_bit, &data, 0);
+
+	return 0;
+}
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bitset_inner_map(void *ctx)
+{
+	struct bpf_map *inner_map;
+	struct callback_ctx data;
+	int key = 0;
+
+	inner_map = bpf_map_lookup_elem(&outer_map, &key);
+	if (!inner_map) {
+		error |= 2;
+		return 0;
+	}
+
+	data.map = inner_map;
+	bpf_for_each_map_elem(&map_random_data, check_bit, &data, 0);
+
+	return 0;
+}
+
+static __u64
+check_elem(struct bpf_map *map, __u32 *key, __u32 *val,
+	   struct callback_ctx *data)
+{
+	int err;
+
+	err = bpf_map_peek_elem(data->map, val);
+	if (err) {
+		error |= 3;
+		return 1; /* stop the iteration */
+	}
+
+	return 0;
+}
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bloom_filter(void *ctx)
+{
+	struct callback_ctx data;
+
+	data.map = (struct bpf_map *)&map_bloom_filter;
+	bpf_for_each_map_elem(&map_random_data, check_elem, &data, 0);
+
+	return 0;
+}
-- 
2.30.2



* [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive
  2021-10-06 22:20 [PATCH bpf-next v4 0/5] Implement bitset maps, with bloom filter Joanne Koong
                   ` (2 preceding siblings ...)
  2021-10-06 22:21 ` [PATCH bpf-next v4 3/5] selftests/bpf: Add bitset map test cases Joanne Koong
@ 2021-10-06 22:21 ` Joanne Koong
  2021-10-06 22:35   ` Joanne Koong
  2021-10-09  2:39   ` Andrii Nakryiko
  2021-10-06 22:21 ` [PATCH bpf-next v4 5/5] bpf/benchs: Add benchmarks for comparing hashmap lookups w/ vs. w/out bloom filter Joanne Koong
  4 siblings, 2 replies; 34+ messages in thread
From: Joanne Koong @ 2021-10-06 22:21 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

This patch adds benchmark tests for the throughput (for lookups + updates)
and the false positive rate of bloom filter lookups, as well as some
minor refactoring of the bash script for running the benchmarks.

These benchmarks show that as the number of hash functions increases,
the throughput and the false positive rate of the bloom filter decrease.
From the benchmark data, the approximate average false positive rates
for 8-byte values are as follows:

1 hash function = ~30%
2 hash functions = ~15%
3 hash functions = ~5%
4 hash functions = ~2.5%
5 hash functions = ~1%
6 hash functions = ~0.5%
7 hash functions = ~0.35%
8 hash functions = ~0.15%
9 hash functions = ~0.1%
10 hash functions = ~0%
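
For reference, these measured rates can be sanity-checked against the
textbook estimate for a bloom filter with m bits, n inserted entries, and
k hash functions; a rough sketch of that estimate (only an approximation,
and the measured rates above also depend on the bitset being rounded up to
a power of two):

#include <math.h>

/* Textbook approximation of the bloom filter false positive rate:
 *   (1 - e^(-k*n/m))^k
 * where m is the number of bits, n the number of inserted entries,
 * and k the number of hash functions.
 */
static double bloom_fp_estimate(double m_bits, double n_entries, double k_hashes)
{
	return pow(1.0 - exp(-k_hashes * n_entries / m_bits), k_hashes);
}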

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 tools/testing/selftests/bpf/Makefile          |   6 +-
 tools/testing/selftests/bpf/bench.c           |  37 ++
 tools/testing/selftests/bpf/bench.h           |   3 +
 .../bpf/benchs/bench_bloom_filter_map.c       | 411 ++++++++++++++++++
 .../bpf/benchs/run_bench_bloom_filter_map.sh  |  28 ++
 .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
 .../selftests/bpf/benchs/run_common.sh        |  48 ++
 tools/testing/selftests/bpf/bpf_util.h        |  11 +
 .../selftests/bpf/progs/bloom_filter_bench.c  | 146 +++++++
 9 files changed, 690 insertions(+), 30 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
 create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
 create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_bench.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index c5c9a9f50d8d..66e1ad17acef 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -517,18 +517,20 @@ $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ)
 # Benchmark runner
 $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h $(BPFOBJ)
 	$(call msg,CC,,$@)
-	$(Q)$(CC) $(CFLAGS) -c $(filter %.c,$^) $(LDLIBS) -o $@
+	$(Q)$(CC) $(CFLAGS) -O2 -c $(filter %.c,$^) $(LDLIBS) -o $@
 $(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h
 $(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h
 $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
 			    $(OUTPUT)/perfbuf_bench.skel.h
+$(OUTPUT)/bench_bloom_filter_map.o: $(OUTPUT)/bloom_filter_bench.skel.h
 $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
 $(OUTPUT)/bench: LDLIBS += -lm
 $(OUTPUT)/bench: $(OUTPUT)/bench.o $(OUTPUT)/testing_helpers.o \
 		 $(OUTPUT)/bench_count.o \
 		 $(OUTPUT)/bench_rename.o \
 		 $(OUTPUT)/bench_trigger.o \
-		 $(OUTPUT)/bench_ringbufs.o
+		 $(OUTPUT)/bench_ringbufs.o \
+		 $(OUTPUT)/bench_bloom_filter_map.o
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(LDFLAGS) -o $@ $(filter %.a %.o,$^) $(LDLIBS)
 
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 6ea15b93a2f8..e836bacd6f9d 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -51,6 +51,35 @@ void setup_libbpf()
 		fprintf(stderr, "failed to increase RLIMIT_MEMLOCK: %d", err);
 }
 
+void false_hits_report_progress(int iter, struct bench_res *res, long delta_ns)
+{
+	long total = res->false_hits  + res->hits + res->drops;
+
+	printf("Iter %3d (%7.3lfus): ",
+	       iter, (delta_ns - 1000000000) / 1000.0);
+
+	printf("%ld false hits of %ld total operations. Percentage = %2.2f %%\n",
+	       res->false_hits, total, ((float)res->false_hits / total) * 100);
+}
+
+void false_hits_report_final(struct bench_res res[], int res_cnt)
+{
+	long total_hits = 0, total_drops = 0, total_false_hits = 0, total_ops = 0;
+	int i;
+
+	for (i = 0; i < res_cnt; i++) {
+		total_hits += res[i].hits;
+		total_false_hits += res[i].false_hits;
+		total_drops += res[i].drops;
+	}
+	total_ops = total_hits + total_false_hits + total_drops;
+
+	printf("Summary: %ld false hits of %ld total operations. ",
+	       total_false_hits, total_ops);
+	printf("Percentage =  %2.2f %%\n",
+	       ((float)total_false_hits / total_ops) * 100);
+}
+
 void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns)
 {
 	double hits_per_sec, drops_per_sec;
@@ -132,9 +161,11 @@ static const struct argp_option opts[] = {
 };
 
 extern struct argp bench_ringbufs_argp;
+extern struct argp bench_bloom_filter_map_argp;
 
 static const struct argp_child bench_parsers[] = {
 	{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
+	{ &bench_bloom_filter_map_argp, 0, "Bloom filter map benchmark", 0 },
 	{},
 };
 
@@ -323,6 +354,9 @@ extern const struct bench bench_rb_libbpf;
 extern const struct bench bench_rb_custom;
 extern const struct bench bench_pb_libbpf;
 extern const struct bench bench_pb_custom;
+extern const struct bench bench_bloom_filter_lookup;
+extern const struct bench bench_bloom_filter_update;
+extern const struct bench bench_bloom_filter_false_positive;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
@@ -344,6 +378,9 @@ static const struct bench *benchs[] = {
 	&bench_rb_custom,
 	&bench_pb_libbpf,
 	&bench_pb_custom,
+	&bench_bloom_filter_lookup,
+	&bench_bloom_filter_update,
+	&bench_bloom_filter_false_positive,
 };
 
 static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/bench.h b/tools/testing/selftests/bpf/bench.h
index c1f48a473b02..624c6b11501f 100644
--- a/tools/testing/selftests/bpf/bench.h
+++ b/tools/testing/selftests/bpf/bench.h
@@ -33,6 +33,7 @@ struct env {
 struct bench_res {
 	long hits;
 	long drops;
+	long false_hits;
 };
 
 struct bench {
@@ -56,6 +57,8 @@ extern const struct bench *bench;
 void setup_libbpf();
 void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns);
 void hits_drops_report_final(struct bench_res res[], int res_cnt);
+void false_hits_report_progress(int iter, struct bench_res *res, long delta_ns);
+void false_hits_report_final(struct bench_res res[], int res_cnt);
 
 static inline __u64 get_time_ns() {
 	struct timespec t;
diff --git a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
new file mode 100644
index 000000000000..25d6b8179c8e
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
@@ -0,0 +1,411 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <argp.h>
+#include <linux/log2.h>
+#include <pthread.h>
+#include "bench.h"
+#include "bloom_filter_bench.skel.h"
+#include "bpf_util.h"
+
+static struct ctx {
+	struct bloom_filter_bench *skel;
+
+	int bloom_filter_fd;
+	int hashmap_fd;
+	int array_map_fd;
+
+	pthread_mutex_t map_done_mtx;
+	pthread_cond_t map_done;
+	bool map_prepare_err;
+
+	__u32 next_map_idx;
+
+} ctx = {
+	.map_done_mtx = PTHREAD_MUTEX_INITIALIZER,
+	.map_done = PTHREAD_COND_INITIALIZER,
+};
+
+struct stat {
+	__u32 stats[3];
+};
+
+static struct {
+	__u32 nr_entries;
+	__u8 nr_hash_funcs;
+	__u32 value_size;
+} args = {
+	.nr_entries = 1000,
+	.nr_hash_funcs = 3,
+	.value_size = 8,
+};
+
+enum {
+	ARG_NR_ENTRIES = 3000,
+	ARG_NR_HASH_FUNCS = 3001,
+	ARG_VALUE_SIZE = 3002,
+};
+
+static const struct argp_option opts[] = {
+	{ "nr_entries", ARG_NR_ENTRIES, "NR_ENTRIES", 0,
+		"Set number of expected unique entries in the bloom filter"},
+	{ "nr_hash_funcs", ARG_NR_HASH_FUNCS, "NR_HASH_FUNCS", 0,
+		"Set number of hash functions in the bloom filter"},
+	{ "value_size", ARG_VALUE_SIZE, "VALUE_SIZE", 0,
+		"Set value size (in bytes) of bloom filter entries"},
+	{},
+};
+
+static error_t parse_arg(int key, char *arg, struct argp_state *state)
+{
+	switch (key) {
+	case ARG_NR_ENTRIES:
+		args.nr_entries = strtol(arg, NULL, 10);
+		if (args.nr_entries == 0) {
+			fprintf(stderr, "Invalid nr_entries count.");
+			argp_usage(state);
+		}
+		break;
+	case ARG_NR_HASH_FUNCS:
+		args.nr_hash_funcs = strtol(arg, NULL, 10);
+		if (args.nr_hash_funcs == 0 || args.nr_hash_funcs > 15) {
+			fprintf(stderr,
+				"The bloom filter must use 1 to 15 hash functions.");
+			argp_usage(state);
+		}
+		break;
+	case ARG_VALUE_SIZE:
+		args.value_size = strtol(arg, NULL, 10);
+		if (args.value_size < 2 || args.value_size > 256) {
+			fprintf(stderr,
+				"Invalid value size. Must be between 2 and 256 bytes");
+			argp_usage(state);
+		}
+		break;
+	default:
+		return ARGP_ERR_UNKNOWN;
+	}
+
+	return 0;
+}
+
+/* exported into benchmark runner */
+const struct argp bench_bloom_filter_map_argp = {
+	.options = opts,
+	.parser = parse_arg,
+};
+
+static void validate(void)
+{
+	if (env.consumer_cnt != 1) {
+		fprintf(stderr,
+			"The bloom filter benchmarks do not support multi-consumer use\n");
+		exit(1);
+	}
+}
+
+static inline void trigger_bpf_program(void)
+{
+	syscall(__NR_getpgid);
+}
+
+static void *producer(void *input)
+{
+	while (true)
+		trigger_bpf_program();
+
+	return NULL;
+}
+
+static void *map_prepare_thread(void *arg)
+{
+	__u32 val_size, i;
+	void *val = NULL;
+	int err;
+
+	val_size = args.value_size;
+	val = malloc(val_size);
+	if (!val) {
+		ctx.map_prepare_err = true;
+		goto done;
+	}
+
+	while (true) {
+		i = __atomic_add_fetch(&ctx.next_map_idx, 1, __ATOMIC_RELAXED);
+		if (i > args.nr_entries)
+			break;
+
+again:
+		/* Populate hashmap, bloom filter map, and array map with the same
+		 * random values
+		 */
+		err = syscall(__NR_getrandom, val, val_size, 0);
+		if (err != val_size) {
+			ctx.map_prepare_err = true;
+			fprintf(stderr, "failed to get random value: %d\n", -errno);
+			break;
+		}
+
+		err = bpf_map_update_elem(ctx.hashmap_fd, val, val, BPF_NOEXIST);
+		if (err) {
+			if (err != -EEXIST) {
+				ctx.map_prepare_err = true;
+				fprintf(stderr, "failed to add elem to hashmap: %d\n",
+					-errno);
+				break;
+			}
+			goto again;
+		}
+
+		i--;
+
+		err = bpf_map_update_elem(ctx.array_map_fd, &i, val, 0);
+		if (err) {
+			ctx.map_prepare_err = true;
+			fprintf(stderr, "failed to add elem to array map: %d\n", -errno);
+			break;
+		}
+
+		err = bpf_map_update_elem(ctx.bloom_filter_fd, NULL, val, 0);
+		if (err) {
+			ctx.map_prepare_err = true;
+			fprintf(stderr,
+				"failed to add elem to bloom filter map: %d\n", -errno);
+			break;
+		}
+	}
+done:
+	pthread_mutex_lock(&ctx.map_done_mtx);
+	pthread_cond_signal(&ctx.map_done);
+	pthread_mutex_unlock(&ctx.map_done_mtx);
+
+	if (val)
+		free(val);
+
+	return NULL;
+}
+
+static void populate_maps(void)
+{
+	unsigned int nr_cpus = bpf_num_possible_cpus();
+	pthread_t map_thread;
+	int i, err;
+
+	ctx.bloom_filter_fd = bpf_map__fd(ctx.skel->maps.bloom_filter_map);
+	ctx.hashmap_fd = bpf_map__fd(ctx.skel->maps.hashmap);
+	ctx.array_map_fd = bpf_map__fd(ctx.skel->maps.array_map);
+
+	for (i = 0; i < nr_cpus; i++) {
+		err = pthread_create(&map_thread, NULL, map_prepare_thread,
+				     NULL);
+		if (err) {
+			fprintf(stderr, "failed to create pthread: %d\n", -errno);
+			exit(1);
+		}
+	}
+
+	pthread_mutex_lock(&ctx.map_done_mtx);
+	pthread_cond_wait(&ctx.map_done, &ctx.map_done_mtx);
+	pthread_mutex_unlock(&ctx.map_done_mtx);
+
+	if (ctx.map_prepare_err)
+		exit(1);
+}
+
+static struct bloom_filter_bench *setup_skeleton(bool hashmap_use_bloom_filter)
+{
+	struct bloom_filter_bench *skel;
+	int err;
+
+	setup_libbpf();
+
+	skel = bloom_filter_bench__open();
+	if (!skel) {
+		fprintf(stderr, "failed to open skeleton\n");
+		exit(1);
+	}
+
+	skel->rodata->hashmap_use_bloom_filter = hashmap_use_bloom_filter;
+
+	/* Resize number of entries */
+	err = bpf_map__resize(skel->maps.hashmap, args.nr_entries);
+	if (err) {
+		fprintf(stderr, "failed to resize hashmap\n");
+		exit(1);
+	}
+
+	err = bpf_map__resize(skel->maps.array_map, args.nr_entries);
+	if (err) {
+		fprintf(stderr, "failed to resize array map\n");
+		exit(1);
+	}
+
+	err = bpf_map__resize(skel->maps.bloom_filter_map,
+			      BPF_BLOOM_FILTER_BITSET_SZ(args.nr_entries,
+							 args.nr_hash_funcs));
+	if (err) {
+		fprintf(stderr, "failed to resize bloom filter\n");
+		exit(1);
+	}
+
+	/* Set value size */
+	err = bpf_map__set_value_size(skel->maps.array_map, args.value_size);
+	if (err) {
+		fprintf(stderr, "failed to set array map value size\n");
+		exit(1);
+	}
+
+	err = bpf_map__set_value_size(skel->maps.bloom_filter_map, args.value_size);
+	if (err) {
+		fprintf(stderr, "failed to set bloom filter map value size\n");
+		exit(1);
+	}
+
+	err = bpf_map__set_value_size(skel->maps.hashmap, args.value_size);
+	if (err) {
+		fprintf(stderr, "failed to set hashmap value size\n");
+		exit(1);
+	}
+	/* For the hashmap, we use the value as the key as well */
+	err = bpf_map__set_key_size(skel->maps.hashmap, args.value_size);
+	if (err) {
+		fprintf(stderr, "failed to set hashmap value size\n");
+		exit(1);
+	}
+
+	skel->bss->value_sz_nr_u32s = (args.value_size + sizeof(__u32) - 1)
+		/ sizeof(__u32);
+
+	/* Set number of hash functions */
+	err = bpf_map__set_map_extra(skel->maps.bloom_filter_map,
+				     args.nr_hash_funcs);
+	if (err) {
+		fprintf(stderr, "failed to set bloom filter nr_hash_funcs\n");
+		exit(1);
+	}
+
+	if (bloom_filter_bench__load(skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
+	return skel;
+}
+
+static void bench_setup_lookup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton(true);
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_lookup);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
+static void bench_setup_update(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton(true);
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_update);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
+static void hashmap_lookup_setup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton(true);
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_hashmap_lookup);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
+static void measure(struct bench_res *res)
+{
+	long total_hits = 0, total_drops = 0, total_false_hits = 0;
+	static long last_hits, last_drops, last_false_hits;
+	unsigned int nr_cpus = bpf_num_possible_cpus();
+	int hit_key, drop_key, false_hit_key;
+	int i;
+
+	hit_key = ctx.skel->rodata->hit_key;
+	drop_key = ctx.skel->rodata->drop_key;
+	false_hit_key = ctx.skel->rodata->false_hit_key;
+
+	if (ctx.skel->bss->error != 0) {
+		fprintf(stderr, "error (%d) when searching the bitset\n",
+			ctx.skel->bss->error);
+		exit(1);
+	}
+
+	for (i = 0; i < nr_cpus; i++) {
+		struct stat *s = (void *)&ctx.skel->bss->percpu_stats[i];
+
+		total_hits += s->stats[hit_key];
+		total_drops += s->stats[drop_key];
+		total_false_hits += s->stats[false_hit_key];
+	}
+
+	res->hits = total_hits - last_hits;
+	res->drops = total_drops - last_drops;
+	res->false_hits = total_false_hits - last_false_hits;
+
+	last_hits = total_hits;
+	last_drops = total_drops;
+	last_false_hits = total_false_hits;
+}
+
+static void *consumer(void *input)
+{
+	return NULL;
+}
+
+const struct bench bench_bloom_filter_lookup = {
+	.name = "bloom-filter-lookup",
+	.validate = validate,
+	.setup = bench_setup_lookup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_bloom_filter_update = {
+	.name = "bloom-filter-update",
+	.validate = validate,
+	.setup = bench_setup_update,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_bloom_filter_false_positive = {
+	.name = "bloom-filter-false-positive",
+	.validate = validate,
+	.setup = hashmap_lookup_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = false_hits_report_progress,
+	.report_final = false_hits_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
new file mode 100755
index 000000000000..5d4f84ad9973
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source ./benchs/run_common.sh
+
+set -eufo pipefail
+
+header "Bitset map"
+for v in 2 4 8 16 40; do
+for t in 1 4 8 12 16; do
+for h in {1..10}; do
+subtitle "value_size: $v bytes, # threads: $t, # hashes: $h"
+	for e in 10000 50000 75000 100000 250000 500000 750000 1000000 2500000 5000000; do
+		printf "%'d entries -\n" $e
+		printf "\t"
+		summarize "Lookups, total operations: " \
+			"$($RUN_BENCH -p $t --nr_hash_funcs $h --nr_entries $e --value_size $v bloom-filter-lookup)"
+		printf "\t"
+		summarize "Updates, total operations: " \
+			"$($RUN_BENCH -p $t --nr_hash_funcs $h --nr_entries $e --value_size $v bloom-filter-update)"
+		printf "\t"
+		summarize_percentage "False positive rate: " \
+			"$($RUN_BENCH -p $t --nr_hash_funcs $h --nr_entries $e --value_size $v bloom-filter-false-positive)"
+	done
+	printf "\n"
+done
+done
+done
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
index af4aa04caba6..ada028aa9007 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
@@ -1,34 +1,8 @@
 #!/bin/bash
 
-set -eufo pipefail
-
-RUN_BENCH="sudo ./bench -w3 -d10 -a"
-
-function hits()
-{
-	echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
-}
-
-function drops()
-{
-	echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
-}
+source ./benchs/run_common.sh
 
-function header()
-{
-	local len=${#1}
-
-	printf "\n%s\n" "$1"
-	for i in $(seq 1 $len); do printf '='; done
-	printf '\n'
-}
-
-function summarize()
-{
-	bench="$1"
-	summary=$(echo $2 | tail -n1)
-	printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
-}
+set -eufo pipefail
 
 header "Single-producer, parallel producer"
 for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
diff --git a/tools/testing/selftests/bpf/benchs/run_common.sh b/tools/testing/selftests/bpf/benchs/run_common.sh
new file mode 100644
index 000000000000..670f23b037c4
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_common.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+RUN_BENCH="sudo ./bench -w3 -d10 -a"
+
+function header()
+{
+	local len=${#1}
+
+	printf "\n%s\n" "$1"
+	for i in $(seq 1 $len); do printf '='; done
+	printf '\n'
+}
+
+function subtitle()
+{
+	local len=${#1}
+	printf "\t%s\n" "$1"
+}
+
+function hits()
+{
+	echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
+function drops()
+{
+	echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
+function percentage()
+{
+	echo "$*" | sed -E "s/.*Percentage\s=\s+([0-9]+\.[0-9]+).*/\1/"
+}
+
+function summarize()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
+}
+
+function summarize_percentage()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s%%\n" "$bench" "$(percentage $summary)"
+}
diff --git a/tools/testing/selftests/bpf/bpf_util.h b/tools/testing/selftests/bpf/bpf_util.h
index a3352a64c067..a260a963efda 100644
--- a/tools/testing/selftests/bpf/bpf_util.h
+++ b/tools/testing/selftests/bpf/bpf_util.h
@@ -40,4 +40,15 @@ static inline unsigned int bpf_num_possible_cpus(void)
 	(offsetof(TYPE, MEMBER)	+ sizeof_field(TYPE, MEMBER))
 #endif
 
+/* Helper macro for computing the optimal number of bits for a
+ * bloom filter map.
+ *
+ * Mathematically, the optimal bitset size that minimizes the
+ * false positive probability is n * k / ln(2) where n is the expected
+ * number of unique entries in the bloom filter and k is the number of
+ * hash functions. We use 7 / 5 to approximate 1 / ln(2).
+ */
+#define BPF_BLOOM_FILTER_BITSET_SZ(nr_uniq_entries, nr_hash_funcs) \
+	((nr_uniq_entries) * (nr_hash_funcs) / 5 * 7)
+
 #endif /* __BPF_UTIL__ */
diff --git a/tools/testing/selftests/bpf/progs/bloom_filter_bench.c b/tools/testing/selftests/bpf/progs/bloom_filter_bench.c
new file mode 100644
index 000000000000..a44a47ddc4d7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bloom_filter_bench.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <errno.h>
+#include <linux/bpf.h>
+#include <stdbool.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct bpf_map;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(key_size, sizeof(__u32));
+	/* max entries and value_size will be set programmatically.
+	 * They are configurable from the userspace bench program.
+	 */
+} array_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_BITSET);
+	/* max entries, value_size, and # of hash functions will be set
+	 * programmatically. They are configurable from the userspace
+	 * bench program.
+	 */
+} bloom_filter_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	/* max entries, key_size, and value_size will be set
+	 * programmatically. They are configurable from the userspace
+	 * bench program.
+	 */
+} hashmap SEC(".maps");
+
+struct callback_ctx {
+	struct bpf_map *map;
+	bool update;
+};
+
+/* Tracks the number of hits, drops, and false hits */
+struct {
+	__u32 stats[3];
+} __attribute__((__aligned__(256))) percpu_stats[256];
+
+__u8 value_sz_nr_u32s;
+
+const __u32 hit_key  = 0;
+const __u32 drop_key  = 1;
+const __u32 false_hit_key = 2;
+
+const volatile bool hashmap_use_bloom_filter = true;
+
+int error = 0;
+
+static __always_inline void log_result(__u32 key)
+{
+	__u32 cpu = bpf_get_smp_processor_id();
+
+	percpu_stats[cpu & 255].stats[key]++;
+}
+
+static __u64
+bloom_filter_callback(struct bpf_map *map, __u32 *key, void *val,
+		      struct callback_ctx *data)
+{
+	int err;
+
+	if (data->update)
+		err = bpf_map_push_elem(data->map, val, 0);
+	else
+		err = bpf_map_peek_elem(data->map, val);
+
+	if (err) {
+		error |= 1;
+		return 1; /* stop the iteration */
+	}
+
+	log_result(hit_key);
+
+	return 0;
+}
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bloom_filter_lookup(void *ctx)
+{
+	struct callback_ctx data;
+
+	data.map = (struct bpf_map *)&bloom_filter_map;
+	data.update = false;
+
+	bpf_for_each_map_elem(&array_map, bloom_filter_callback, &data, 0);
+
+	return 0;
+}
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bloom_filter_update(void *ctx)
+{
+	struct callback_ctx data;
+
+	data.map = (struct bpf_map *)&bloom_filter_map;
+	data.update = true;
+
+	bpf_for_each_map_elem(&array_map, bloom_filter_callback, &data, 0);
+
+	return 0;
+}
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bloom_filter_hashmap_lookup(void *ctx)
+{
+	__u64 *result;
+	int i, j, err;
+
+	__u32 val[64] = {0};
+
+	for (i = 0; i < 1024; i++) {
+		for (j = 0; j < value_sz_nr_u32s && j < 64; j++)
+			val[j] = bpf_get_prandom_u32();
+
+		if (hashmap_use_bloom_filter) {
+			err = bpf_map_peek_elem(&bloom_filter_map, val);
+			if (err) {
+				if (err != -ENOENT) {
+					error |= 3;
+					return 0;
+				}
+				log_result(hit_key);
+				continue;
+			}
+		}
+
+		result = bpf_map_lookup_elem(&hashmap, val);
+		if (result) {
+			log_result(hit_key);
+		} else {
+			if (hashmap_use_bloom_filter)
+				log_result(false_hit_key);
+			log_result(drop_key);
+		}
+	}
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH bpf-next v4 5/5] bpf/benchs: Add benchmarks for comparing hashmap lookups w/ vs. w/out bloom filter
  2021-10-06 22:20 [PATCH bpf-next v4 0/5] Implement bitset maps, with bloom filter Joanne Koong
                   ` (3 preceding siblings ...)
  2021-10-06 22:21 ` [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive Joanne Koong
@ 2021-10-06 22:21 ` Joanne Koong
  4 siblings, 0 replies; 34+ messages in thread
From: Joanne Koong @ 2021-10-06 22:21 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team, Joanne Koong

This patch adds benchmark tests for comparing the performance of hashmap
lookups without the bloom filter vs. hashmap lookups with the bloom filter.

Checking the bloom filter first for whether the element exists should
overall enable a higher throughput for hashmap lookups, since if the
element does not exist in the bloom filter, we can avoid a costly lookup in
the hashmap.
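
The pattern being measured boils down to the following (a simplified
sketch of the inner loop of prog_bloom_filter_hashmap_lookup in
bloom_filter_bench.c; nr_lookups stands in for the fixed 1024
iterations and the drop_key accounting is omitted):

	for (i = 0; i < nr_lookups; i++) {
		/* val holds a randomly generated key, as in the benchmark */
		err = bpf_map_peek_elem(&bloom_filter_map, val);
		if (err == -ENOENT) {
			/* definitely not in the set: skip the hashmap lookup */
			log_result(hit_key);
			continue;
		}

		/* possibly in the set (or a false positive): do the lookup */
		result = bpf_map_lookup_elem(&hashmap, val);
		if (result)
			log_result(hit_key);
		else
			log_result(false_hit_key);
	}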

On average, using 5 hash functions in the bloom filter tended to perform
the best across the widest range of different entry sizes. The benchmark
results using 5 hash functions (running on 8 threads on a machine with one
numa node, and taking the average of 3 runs) were roughly as follows:

value_size = 4 bytes -
	10k entries: 30% faster
	50k entries: 50% faster
	100k entries: 55% faster
	500k entries: 80% faster
	1 million entries: 120% faster
	5 million entries: 135% faster

value_size = 8 bytes -
	10k entries: 35% faster
	50k entries: 55% faster
	100k entries: 70% faster
	500k entries: 110% faster
	1 million entries: 215% faster
	5 million entries: 215% faster

value_size = 16 bytes -
	10k entries: 5% slower
	50k entries: 25% faster
	100k entries: 35% faster
	500k entries: 105% faster
	1 million entries: 130% faster
	5 million entries: 105% faster

value_size = 40 bytes -
	10k entries: 5% slower
	50k entries: 10% faster
	100k entries: 20% faster
	500k entries: 45% faster
	1 million entries: 60% faster
	5 million entries: 75% faster
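
As a usage sketch, an individual data point like the ones above can be
reproduced by invoking the benchmark runner directly, using only flags
that run_common.sh and run_bench_bloom_filter_map.sh already pass
(exact numbers will of course vary by machine):

	sudo ./bench -w3 -d10 -a -p 8 --nr_hash_funcs 5 \
		--nr_entries 100000 --value_size 8 hashmap-without-bloom-filter
	sudo ./bench -w3 -d10 -a -p 8 --nr_hash_funcs 5 \
		--nr_entries 100000 --value_size 8 hashmap-with-bloom-filter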

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 tools/testing/selftests/bpf/bench.c           | 22 ++++++++---
 .../bpf/benchs/bench_bloom_filter_map.c       | 37 +++++++++++++++++++
 .../bpf/benchs/run_bench_bloom_filter_map.sh  | 17 +++++++++
 .../selftests/bpf/benchs/run_common.sh        | 12 ++++++
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index e836bacd6f9d..d9661f343556 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -92,20 +92,21 @@ void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns)
 	printf("Iter %3d (%7.3lfus): ",
 	       iter, (delta_ns - 1000000000) / 1000.0);
 
-	printf("hits %8.3lfM/s (%7.3lfM/prod), drops %8.3lfM/s\n",
-	       hits_per_sec, hits_per_prod, drops_per_sec);
+	printf("hits %8.3lfM/s (%7.3lfM/prod), drops %8.3lfM/s, total operations %8.3lfM/s\n",
+	       hits_per_sec, hits_per_prod, drops_per_sec, hits_per_sec + drops_per_sec);
 }
 
 void hits_drops_report_final(struct bench_res res[], int res_cnt)
 {
 	int i;
-	double hits_mean = 0.0, drops_mean = 0.0;
-	double hits_stddev = 0.0, drops_stddev = 0.0;
+	double hits_mean = 0.0, drops_mean = 0.0, total_ops_mean = 0.0;
+	double hits_stddev = 0.0, drops_stddev = 0.0, total_ops_stddev = 0.0;
 
 	for (i = 0; i < res_cnt; i++) {
 		hits_mean += res[i].hits / 1000000.0 / (0.0 + res_cnt);
 		drops_mean += res[i].drops / 1000000.0 / (0.0 + res_cnt);
 	}
+	total_ops_mean = hits_mean + drops_mean;
 
 	if (res_cnt > 1)  {
 		for (i = 0; i < res_cnt; i++) {
@@ -115,14 +116,21 @@ void hits_drops_report_final(struct bench_res res[], int res_cnt)
 			drops_stddev += (drops_mean - res[i].drops / 1000000.0) *
 					(drops_mean - res[i].drops / 1000000.0) /
 					(res_cnt - 1.0);
+			total_ops_stddev += (total_ops_mean -
+					(res[i].hits + res[i].drops) / 1000000.0) *
+					(total_ops_mean - (res[i].hits + res[i].drops) / 1000000.0)
+					/ (res_cnt - 1.0);
 		}
 		hits_stddev = sqrt(hits_stddev);
 		drops_stddev = sqrt(drops_stddev);
+		total_ops_stddev = sqrt(total_ops_stddev);
 	}
 	printf("Summary: hits %8.3lf \u00B1 %5.3lfM/s (%7.3lfM/prod), ",
 	       hits_mean, hits_stddev, hits_mean / env.producer_cnt);
-	printf("drops %8.3lf \u00B1 %5.3lfM/s\n",
+	printf("drops %8.3lf \u00B1 %5.3lfM/s, ",
 	       drops_mean, drops_stddev);
+	printf("total operations %8.3lf \u00B1 %5.3lfM/s\n",
+	       total_ops_mean, total_ops_stddev);
 }
 
 const char *argp_program_version = "benchmark";
@@ -357,6 +365,8 @@ extern const struct bench bench_pb_custom;
 extern const struct bench bench_bloom_filter_lookup;
 extern const struct bench bench_bloom_filter_update;
 extern const struct bench bench_bloom_filter_false_positive;
+extern const struct bench bench_hashmap_without_bloom_filter;
+extern const struct bench bench_hashmap_with_bloom_filter;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
@@ -381,6 +391,8 @@ static const struct bench *benchs[] = {
 	&bench_bloom_filter_lookup,
 	&bench_bloom_filter_update,
 	&bench_bloom_filter_false_positive,
+	&bench_hashmap_without_bloom_filter,
+	&bench_hashmap_with_bloom_filter,
 };
 
 static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
index 25d6b8179c8e..7af8fd455dc0 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
@@ -337,6 +337,21 @@ static void hashmap_lookup_setup(void)
 	}
 }
 
+static void hashmap_no_bloom_filter_setup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton(false);
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_hashmap_lookup);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
 static void measure(struct bench_res *res)
 {
 	long total_hits = 0, total_drops = 0, total_false_hits = 0;
@@ -409,3 +424,25 @@ const struct bench bench_bloom_filter_false_positive = {
 	.report_progress = false_hits_report_progress,
 	.report_final = false_hits_report_final,
 };
+
+const struct bench bench_hashmap_without_bloom_filter = {
+	.name = "hashmap-without-bloom-filter",
+	.validate = validate,
+	.setup = hashmap_no_bloom_filter_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_hashmap_with_bloom_filter = {
+	.name = "hashmap-with-bloom-filter",
+	.validate = validate,
+	.setup = hashmap_lookup_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
index 5d4f84ad9973..a8cff64d0492 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
@@ -26,3 +26,20 @@ subtitle "value_size: $v bytes, # threads: $t, # hashes: $h"
 done
 done
 done
+
+header "Hashmap without bitset vs. hashmap with bitset (throughput, 8 threads)"
+for v in 2 4 8 16 40; do
+for h in {1..10}; do
+subtitle "# hashes: $h"
+	for e in 10000 50000 75000 100000 250000 500000 750000 1000000 2500000 5000000; do
+		printf "%'d entries -\n" $e
+		printf "\t"
+		summarize_total "Hashmap without bitset: " \
+			"$($RUN_BENCH --nr_hash_funcs $h --nr_entries $e --value_size $v -p 8 hashmap-without-bloom-filter)"
+		printf "\t"
+		summarize_total "Hashmap with bitset: " \
+			"$($RUN_BENCH --nr_hash_funcs $h --nr_entries $e --value_size $v -p 8 hashmap-with-bloom-filter)"
+	done
+	printf "\n"
+done
+done
diff --git a/tools/testing/selftests/bpf/benchs/run_common.sh b/tools/testing/selftests/bpf/benchs/run_common.sh
index 670f23b037c4..9a16be78b180 100644
--- a/tools/testing/selftests/bpf/benchs/run_common.sh
+++ b/tools/testing/selftests/bpf/benchs/run_common.sh
@@ -33,6 +33,11 @@ function percentage()
 	echo "$*" | sed -E "s/.*Percentage\s=\s+([0-9]+\.[0-9]+).*/\1/"
 }
 
+function total()
+{
+	echo "$*" | sed -E "s/.*total operations\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
 function summarize()
 {
 	bench="$1"
@@ -46,3 +51,10 @@ function summarize_percentage()
 	summary=$(echo $2 | tail -n1)
 	printf "%-20s %s%%\n" "$bench" "$(percentage $summary)"
 }
+
+function summarize_total()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s\n" "$bench" "$(total $summary)"
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive
  2021-10-06 22:21 ` [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive Joanne Koong
@ 2021-10-06 22:35   ` Joanne Koong
  2021-10-09  2:54     ` Andrii Nakryiko
  2021-10-09  2:39   ` Andrii Nakryiko
  1 sibling, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2021-10-06 22:35 UTC (permalink / raw)
  To: bpf; +Cc: Kernel-team

On 10/6/21 3:21 PM, Joanne Koong wrote:

> This patch adds benchmark tests for the throughput (for lookups + updates)
> and the false positive rate of bloom filter lookups, as well as some
> minor refactoring of the bash script for running the benchmarks.
>
> These benchmarks show that as the number of hash functions increases,
> the throughput and the false positive rate of the bloom filter decreases.
>  From the benchmark data, the approximate average false-positive rates for
> 8-byte values are roughly as follows:
>
> 1 hash function = ~30%
> 2 hash functions = ~15%
> 3 hash functions = ~5%
> 4 hash functions = ~2.5%
> 5 hash functions = ~1%
> 6 hash functions = ~0.5%
> 7 hash functions  = ~0.35%
> 8 hash functions = ~0.15%
> 9 hash functions = ~0.1%
> 10 hash functions = ~0%
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> ---
>   tools/testing/selftests/bpf/Makefile          |   6 +-
>   tools/testing/selftests/bpf/bench.c           |  37 ++
>   tools/testing/selftests/bpf/bench.h           |   3 +
>   .../bpf/benchs/bench_bloom_filter_map.c       | 411 ++++++++++++++++++
>   .../bpf/benchs/run_bench_bloom_filter_map.sh  |  28 ++
>   .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
>   .../selftests/bpf/benchs/run_common.sh        |  48 ++
>   tools/testing/selftests/bpf/bpf_util.h        |  11 +
>   .../selftests/bpf/progs/bloom_filter_bench.c  | 146 +++++++
>   9 files changed, 690 insertions(+), 30 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
>   create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
>   create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
>   create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_bench.c
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index c5c9a9f50d8d..66e1ad17acef 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -517,18 +517,20 @@ $(OUTPUT)/test_cpp: test_cpp.cpp $(OUTPUT)/test_core_extern.skel.h $(BPFOBJ)
>   # Benchmark runner
>   $(OUTPUT)/bench_%.o: benchs/bench_%.c bench.h $(BPFOBJ)
>   	$(call msg,CC,,$@)
> -	$(Q)$(CC) $(CFLAGS) -c $(filter %.c,$^) $(LDLIBS) -o $@
> +	$(Q)$(CC) $(CFLAGS) -O2 -c $(filter %.c,$^) $(LDLIBS) -o $@
>   $(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h
>   $(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h
>   $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
>   			    $(OUTPUT)/perfbuf_bench.skel.h
> +$(OUTPUT)/bench_bloom_filter_map.o: $(OUTPUT)/bloom_filter_bench.skel.h
>   $(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
>   $(OUTPUT)/bench: LDLIBS += -lm
>   $(OUTPUT)/bench: $(OUTPUT)/bench.o $(OUTPUT)/testing_helpers.o \
>   		 $(OUTPUT)/bench_count.o \
>   		 $(OUTPUT)/bench_rename.o \
>   		 $(OUTPUT)/bench_trigger.o \
> -		 $(OUTPUT)/bench_ringbufs.o
> +		 $(OUTPUT)/bench_ringbufs.o \
> +		 $(OUTPUT)/bench_bloom_filter_map.o
>   	$(call msg,BINARY,,$@)
>   	$(Q)$(CC) $(LDFLAGS) -o $@ $(filter %.a %.o,$^) $(LDLIBS)
>   
> diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
> index 6ea15b93a2f8..e836bacd6f9d 100644
> --- a/tools/testing/selftests/bpf/bench.c
> +++ b/tools/testing/selftests/bpf/bench.c
> @@ -51,6 +51,35 @@ void setup_libbpf()
>   		fprintf(stderr, "failed to increase RLIMIT_MEMLOCK: %d", err);
>   }
>   
> +void false_hits_report_progress(int iter, struct bench_res *res, long delta_ns)
> +{
> +	long total = res->false_hits  + res->hits + res->drops;
> +
> +	printf("Iter %3d (%7.3lfus): ",
> +	       iter, (delta_ns - 1000000000) / 1000.0);
> +
> +	printf("%ld false hits of %ld total operations. Percentage = %2.2f %%\n",
> +	       res->false_hits, total, ((float)res->false_hits / total) * 100);
> +}
> +
> +void false_hits_report_final(struct bench_res res[], int res_cnt)
> +{
> +	long total_hits = 0, total_drops = 0, total_false_hits = 0, total_ops = 0;
> +	int i;
> +
> +	for (i = 0; i < res_cnt; i++) {
> +		total_hits += res[i].hits;
> +		total_false_hits += res[i].false_hits;
> +		total_drops += res[i].drops;
> +	}
> +	total_ops = total_hits + total_false_hits + total_drops;
> +
> +	printf("Summary: %ld false hits of %ld total operations. ",
> +	       total_false_hits, total_ops);
> +	printf("Percentage =  %2.2f %%\n",
> +	       ((float)total_false_hits / total_ops) * 100);
> +}
> +
>   void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns)
>   {
>   	double hits_per_sec, drops_per_sec;
> @@ -132,9 +161,11 @@ static const struct argp_option opts[] = {
>   };
>   
>   extern struct argp bench_ringbufs_argp;
> +extern struct argp bench_bloom_filter_map_argp;
>   
>   static const struct argp_child bench_parsers[] = {
>   	{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
> +	{ &bench_bloom_filter_map_argp, 0, "Bloom filter map benchmark", 0 },
>   	{},
>   };
>   
> @@ -323,6 +354,9 @@ extern const struct bench bench_rb_libbpf;
>   extern const struct bench bench_rb_custom;
>   extern const struct bench bench_pb_libbpf;
>   extern const struct bench bench_pb_custom;
> +extern const struct bench bench_bloom_filter_lookup;
> +extern const struct bench bench_bloom_filter_update;
> +extern const struct bench bench_bloom_filter_false_positive;
>   
>   static const struct bench *benchs[] = {
>   	&bench_count_global,
> @@ -344,6 +378,9 @@ static const struct bench *benchs[] = {
>   	&bench_rb_custom,
>   	&bench_pb_libbpf,
>   	&bench_pb_custom,
> +	&bench_bloom_filter_lookup,
> +	&bench_bloom_filter_update,
> +	&bench_bloom_filter_false_positive,
>   };
>   
>   static void setup_benchmark()
> diff --git a/tools/testing/selftests/bpf/bench.h b/tools/testing/selftests/bpf/bench.h
> index c1f48a473b02..624c6b11501f 100644
> --- a/tools/testing/selftests/bpf/bench.h
> +++ b/tools/testing/selftests/bpf/bench.h
> @@ -33,6 +33,7 @@ struct env {
>   struct bench_res {
>   	long hits;
>   	long drops;
> +	long false_hits;
>   };
>   
>   struct bench {
> @@ -56,6 +57,8 @@ extern const struct bench *bench;
>   void setup_libbpf();
>   void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns);
>   void hits_drops_report_final(struct bench_res res[], int res_cnt);
> +void false_hits_report_progress(int iter, struct bench_res *res, long delta_ns);
> +void false_hits_report_final(struct bench_res res[], int res_cnt);
>   
>   static inline __u64 get_time_ns() {
>   	struct timespec t;
> diff --git a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
> new file mode 100644
> index 000000000000..25d6b8179c8e
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
> @@ -0,0 +1,411 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 Facebook */
> +
> +#include <argp.h>
> +#include <linux/log2.h>
> +#include <pthread.h>
> +#include "bench.h"
> +#include "bloom_filter_bench.skel.h"
> +#include "bpf_util.h"
> +
> +static struct ctx {
> +	struct bloom_filter_bench *skel;
> +
> +	int bloom_filter_fd;
> +	int hashmap_fd;
> +	int array_map_fd;
> +
> +	pthread_mutex_t map_done_mtx;
> +	pthread_cond_t map_done;
> +	bool map_prepare_err;
> +
> +	__u32 next_map_idx;
> +
> +} ctx = {
> +	.map_done_mtx = PTHREAD_MUTEX_INITIALIZER,
> +	.map_done = PTHREAD_COND_INITIALIZER,
> +};
> +
> +struct stat {
> +	__u32 stats[3];
> +};
> +
> +static struct {
> +	__u32 nr_entries;
> +	__u8 nr_hash_funcs;
> +	__u32 value_size;
> +} args = {
> +	.nr_entries = 1000,
> +	.nr_hash_funcs = 3,
> +	.value_size = 8,
> +};
> +
> +enum {
> +	ARG_NR_ENTRIES = 3000,
> +	ARG_NR_HASH_FUNCS = 3001,
> +	ARG_VALUE_SIZE = 3002,
> +};
> +
> +static const struct argp_option opts[] = {
> +	{ "nr_entries", ARG_NR_ENTRIES, "NR_ENTRIES", 0,
> +		"Set number of expected unique entries in the bloom filter"},
> +	{ "nr_hash_funcs", ARG_NR_HASH_FUNCS, "NR_HASH_FUNCS", 0,
> +		"Set number of hash functions in the bloom filter"},
> +	{ "value_size", ARG_VALUE_SIZE, "VALUE_SIZE", 0,
> +		"Set value size (in bytes) of bloom filter entries"},
> +	{},
> +};
> +
> +static error_t parse_arg(int key, char *arg, struct argp_state *state)
> +{
> +	switch (key) {
> +	case ARG_NR_ENTRIES:
> +		args.nr_entries = strtol(arg, NULL, 10);
> +		if (args.nr_entries == 0) {
> +			fprintf(stderr, "Invalid nr_entries count.");
> +			argp_usage(state);
> +		}
> +		break;
> +	case ARG_NR_HASH_FUNCS:
> +		args.nr_hash_funcs = strtol(arg, NULL, 10);
> +		if (args.nr_hash_funcs == 0 || args.nr_hash_funcs > 15) {
> +			fprintf(stderr,
> +				"The bloom filter must use 1 to 15 hash functions.");
> +			argp_usage(state);
> +		}
> +		break;
> +	case ARG_VALUE_SIZE:
> +		args.value_size = strtol(arg, NULL, 10);
> +		if (args.value_size < 2 || args.value_size > 256) {
> +			fprintf(stderr,
> +				"Invalid value size. Must be between 2 and 256 bytes");
> +			argp_usage(state);
> +		}
> +		break;
> +	default:
> +		return ARGP_ERR_UNKNOWN;
> +	}
> +
> +	return 0;
> +}
> +
> +/* exported into benchmark runner */
> +const struct argp bench_bloom_filter_map_argp = {
> +	.options = opts,
> +	.parser = parse_arg,
> +};
> +
> +static void validate(void)
> +{
> +	if (env.consumer_cnt != 1) {
> +		fprintf(stderr,
> +			"The bloom filter benchmarks do not support multi-consumer use\n");
> +		exit(1);
> +	}
> +}
> +
> +static inline void trigger_bpf_program(void)
> +{
> +	syscall(__NR_getpgid);
> +}
> +
> +static void *producer(void *input)
> +{
> +	while (true)
> +		trigger_bpf_program();
> +
> +	return NULL;
> +}
> +
> +static void *map_prepare_thread(void *arg)
> +{
> +	__u32 val_size, i;
> +	void *val = NULL;
> +	int err;
> +
> +	val_size = args.value_size;
> +	val = malloc(val_size);
> +	if (!val) {
> +		ctx.map_prepare_err = true;
> +		goto done;
> +	}
> +
> +	while (true) {
> +		i = __atomic_add_fetch(&ctx.next_map_idx, 1, __ATOMIC_RELAXED);
> +		if (i > args.nr_entries)
> +			break;
> +
> +again:
> +		/* Populate hashmap, bloom filter map, and array map with the same
> +		 * random values
> +		 */
> +		err = syscall(__NR_getrandom, val, val_size, 0);
> +		if (err != val_size) {
> +			ctx.map_prepare_err = true;
> +			fprintf(stderr, "failed to get random value: %d\n", -errno);
> +			break;
> +		}
> +
> +		err = bpf_map_update_elem(ctx.hashmap_fd, val, val, BPF_NOEXIST);
> +		if (err) {
> +			if (err != -EEXIST) {
> +				ctx.map_prepare_err = true;
> +				fprintf(stderr, "failed to add elem to hashmap: %d\n",
> +					-errno);
> +				break;
> +			}
> +			goto again;
> +		}
> +
> +		i--;
> +
> +		err = bpf_map_update_elem(ctx.array_map_fd, &i, val, 0);
> +		if (err) {
> +			ctx.map_prepare_err = true;
> +			fprintf(stderr, "failed to add elem to array map: %d\n", -errno);
> +			break;
> +		}
> +
> +		err = bpf_map_update_elem(ctx.bloom_filter_fd, NULL, val, 0);
> +		if (err) {
> +			ctx.map_prepare_err = true;
> +			fprintf(stderr,
> +				"failed to add elem to bloom filter map: %d\n", -errno);
> +			break;
> +		}
> +	}
> +done:
> +	pthread_mutex_lock(&ctx.map_done_mtx);
> +	pthread_cond_signal(&ctx.map_done);
> +	pthread_mutex_unlock(&ctx.map_done_mtx);
> +
> +	if (val)
> +		free(val);
> +
> +	return NULL;
> +}
> +
> +static void populate_maps(void)
> +{
> +	unsigned int nr_cpus = bpf_num_possible_cpus();
> +	pthread_t map_thread;
> +	int i, err;
> +
> +	ctx.bloom_filter_fd = bpf_map__fd(ctx.skel->maps.bloom_filter_map);
> +	ctx.hashmap_fd = bpf_map__fd(ctx.skel->maps.hashmap);
> +	ctx.array_map_fd = bpf_map__fd(ctx.skel->maps.array_map);
> +
> +	for (i = 0; i < nr_cpus; i++) {
> +		err = pthread_create(&map_thread, NULL, map_prepare_thread,
> +				     NULL);
> +		if (err) {
> +			fprintf(stderr, "failed to create pthread: %d\n", -errno);
> +			exit(1);
> +		}
> +	}
> +
> +	pthread_mutex_lock(&ctx.map_done_mtx);
> +	pthread_cond_wait(&ctx.map_done, &ctx.map_done_mtx);
> +	pthread_mutex_unlock(&ctx.map_done_mtx);
> +
> +	if (ctx.map_prepare_err)
> +		exit(1);
> +}
> +
> +static struct bloom_filter_bench *setup_skeleton(bool hashmap_use_bloom_filter)
> +{
> +	struct bloom_filter_bench *skel;
> +	int err;
> +
> +	setup_libbpf();
> +
> +	skel = bloom_filter_bench__open();
> +	if (!skel) {
> +		fprintf(stderr, "failed to open skeleton\n");
> +		exit(1);
> +	}
> +
> +	skel->rodata->hashmap_use_bloom_filter = hashmap_use_bloom_filter;
> +
> +	/* Resize number of entries */
> +	err = bpf_map__resize(skel->maps.hashmap, args.nr_entries);
> +	if (err) {
> +		fprintf(stderr, "failed to resize hashmap\n");
> +		exit(1);
> +	}
> +
> +	err = bpf_map__resize(skel->maps.array_map, args.nr_entries);
> +	if (err) {
> +		fprintf(stderr, "failed to resize array map\n");
> +		exit(1);
> +	}
> +
> +	err = bpf_map__resize(skel->maps.bloom_filter_map,
> +			      BPF_BLOOM_FILTER_BITSET_SZ(args.nr_entries,
> +							 args.nr_hash_funcs));
> +	if (err) {
> +		fprintf(stderr, "failed to resize bloom filter\n");
> +		exit(1);
> +	}
> +
> +	/* Set value size */
> +	err = bpf_map__set_value_size(skel->maps.array_map, args.value_size);
> +	if (err) {
> +		fprintf(stderr, "failed to set array map value size\n");
> +		exit(1);
> +	}
> +
> +	err = bpf_map__set_value_size(skel->maps.bloom_filter_map, args.value_size);
> +	if (err) {
> +		fprintf(stderr, "failed to set bloom filter map value size\n");
> +		exit(1);
> +	}
> +
> +	err = bpf_map__set_value_size(skel->maps.hashmap, args.value_size);
> +	if (err) {
> +		fprintf(stderr, "failed to set hashmap value size\n");
> +		exit(1);
> +	}
> +	/* For the hashmap, we use the value as the key as well */
> +	err = bpf_map__set_key_size(skel->maps.hashmap, args.value_size);
> +	if (err) {
> +		fprintf(stderr, "failed to set hashmap value size\n");
> +		exit(1);
> +	}
> +
> +	skel->bss->value_sz_nr_u32s = (args.value_size + sizeof(__u32) - 1)
> +		/ sizeof(__u32);
> +
> +	/* Set number of hash functions */
> +	err = bpf_map__set_map_extra(skel->maps.bloom_filter_map,
> +				     args.nr_hash_funcs);
> +	if (err) {
> +		fprintf(stderr, "failed to set bloom filter nr_hash_funcs\n");
> +		exit(1);
> +	}
> +
> +	if (bloom_filter_bench__load(skel)) {
> +		fprintf(stderr, "failed to load skeleton\n");
> +		exit(1);
> +	}
> +
> +	return skel;
> +}
> +
> +static void bench_setup_lookup(void)
> +{
> +	struct bpf_link *link;
> +
> +	ctx.skel = setup_skeleton(true);
> +
> +	populate_maps();
> +
> +	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_lookup);
> +	if (!link) {
> +		fprintf(stderr, "failed to attach program!\n");
> +		exit(1);
> +	}
> +}
> +
> +static void bench_setup_update(void)
> +{
> +	struct bpf_link *link;
> +
> +	ctx.skel = setup_skeleton(true);
> +
> +	populate_maps();
> +
> +	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_update);
> +	if (!link) {
> +		fprintf(stderr, "failed to attach program!\n");
> +		exit(1);
> +	}
> +}
> +
> +static void hashmap_lookup_setup(void)
> +{
> +	struct bpf_link *link;
> +
> +	ctx.skel = setup_skeleton(true);
> +
> +	populate_maps();
> +
> +	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_hashmap_lookup);
> +	if (!link) {
> +		fprintf(stderr, "failed to attach program!\n");
> +		exit(1);
> +	}
> +}
> +
> +static void measure(struct bench_res *res)
> +{
> +	long total_hits = 0, total_drops = 0, total_false_hits = 0;
> +	static long last_hits, last_drops, last_false_hits;
> +	unsigned int nr_cpus = bpf_num_possible_cpus();
> +	int hit_key, drop_key, false_hit_key;
> +	int i;
> +
> +	hit_key = ctx.skel->rodata->hit_key;
> +	drop_key = ctx.skel->rodata->drop_key;
> +	false_hit_key = ctx.skel->rodata->false_hit_key;
> +
> +	if (ctx.skel->bss->error != 0) {
> +		fprintf(stderr, "error (%d) when searching the bitset\n",
> +			ctx.skel->bss->error);
> +		exit(1);
> +	}
> +
> +	for (i = 0; i < nr_cpus; i++) {
> +		struct stat *s = (void *)&ctx.skel->bss->percpu_stats[i];
> +
> +		total_hits += s->stats[hit_key];
> +		total_drops += s->stats[drop_key];
> +		total_false_hits += s->stats[false_hit_key];
> +	}
> +
> +	res->hits = total_hits - last_hits;
> +	res->drops = total_drops - last_drops;
> +	res->false_hits = total_false_hits - last_false_hits;
> +
> +	last_hits = total_hits;
> +	last_drops = total_drops;
> +	last_false_hits = total_false_hits;
> +}
> +
> +static void *consumer(void *input)
> +{
> +	return NULL;
> +}
> +
> +const struct bench bench_bloom_filter_lookup = {
> +	.name = "bloom-filter-lookup",
> +	.validate = validate,
> +	.setup = bench_setup_lookup,
> +	.producer_thread = producer,
> +	.consumer_thread = consumer,
> +	.measure = measure,
> +	.report_progress = hits_drops_report_progress,
> +	.report_final = hits_drops_report_final,
> +};
> +
> +const struct bench bench_bloom_filter_update = {
> +	.name = "bloom-filter-update",
> +	.validate = validate,
> +	.setup = bench_setup_update,
> +	.producer_thread = producer,
> +	.consumer_thread = consumer,
> +	.measure = measure,
> +	.report_progress = hits_drops_report_progress,
> +	.report_final = hits_drops_report_final,
> +};
> +
> +const struct bench bench_bloom_filter_false_positive = {
> +	.name = "bloom-filter-false-positive",
> +	.validate = validate,
> +	.setup = hashmap_lookup_setup,
> +	.producer_thread = producer,
> +	.consumer_thread = consumer,
> +	.measure = measure,
> +	.report_progress = false_hits_report_progress,
> +	.report_final = false_hits_report_final,
> +};
> diff --git a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
> new file mode 100755
> index 000000000000..5d4f84ad9973
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
> @@ -0,0 +1,28 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +source ./benchs/run_common.sh
> +
> +set -eufo pipefail
> +
> +header "Bitset map"
> +for v in 2, 4, 8, 16, 40; do
> +for t in 1 4 8 12 16; do
> +for h in {1..10}; do
> +subtitle "value_size: $v bytes, # threads: $t, # hashes: $h"
> +	for e in 10000 50000 75000 100000 250000 500000 750000 1000000 2500000 5000000; do
> +		printf "%'d entries -\n" $e
> +		printf "\t"
> +		summarize "Lookups, total operations: " \
> +			"$($RUN_BENCH -p $t --nr_hash_funcs $h --nr_entries $e --value_size $v bloom-filter-lookup)"
> +		printf "\t"
> +		summarize "Updates, total operations: " \
> +			"$($RUN_BENCH -p $t --nr_hash_funcs $h --nr_entries $e --value_size $v bloom-filter-update)"
> +		printf "\t"
> +		summarize_percentage "False positive rate: " \
> +			"$($RUN_BENCH -p $t --nr_hash_funcs $h --nr_entries $e --value_size $v bloom-filter-false-positive)"
> +	done
> +	printf "\n"
> +done
> +done
> +done
> diff --git a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
> index af4aa04caba6..ada028aa9007 100755
> --- a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
> +++ b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
> @@ -1,34 +1,8 @@
>   #!/bin/bash
>   
> -set -eufo pipefail
> -
> -RUN_BENCH="sudo ./bench -w3 -d10 -a"
> -
> -function hits()
> -{
> -	echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
> -}
> -
> -function drops()
> -{
> -	echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
> -}
> +source ./benchs/run_common.sh
>   
> -function header()
> -{
> -	local len=${#1}
> -
> -	printf "\n%s\n" "$1"
> -	for i in $(seq 1 $len); do printf '='; done
> -	printf '\n'
> -}
> -
> -function summarize()
> -{
> -	bench="$1"
> -	summary=$(echo $2 | tail -n1)
> -	printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
> -}
> +set -eufo pipefail
>   
>   header "Single-producer, parallel producer"
>   for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
> diff --git a/tools/testing/selftests/bpf/benchs/run_common.sh b/tools/testing/selftests/bpf/benchs/run_common.sh
> new file mode 100644
> index 000000000000..670f23b037c4
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/benchs/run_common.sh
> @@ -0,0 +1,48 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +RUN_BENCH="sudo ./bench -w3 -d10 -a"
> +
> +function header()
> +{
> +	local len=${#1}
> +
> +	printf "\n%s\n" "$1"
> +	for i in $(seq 1 $len); do printf '='; done
> +	printf '\n'
> +}
> +
> +function subtitle()
> +{
> +	local len=${#1}
> +	printf "\t%s\n" "$1"
> +}
> +
> +function hits()
> +{
> +	echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
> +}
> +
> +function drops()
> +{
> +	echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
> +}
> +
> +function percentage()
> +{
> +	echo "$*" | sed -E "s/.*Percentage\s=\s+([0-9]+\.[0-9]+).*/\1/"
> +}
> +
> +function summarize()
> +{
> +	bench="$1"
> +	summary=$(echo $2 | tail -n1)
> +	printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
> +}
> +
> +function summarize_percentage()
> +{
> +	bench="$1"
> +	summary=$(echo $2 | tail -n1)
> +	printf "%-20s %s%%\n" "$bench" "$(percentage $summary)"
> +}
> diff --git a/tools/testing/selftests/bpf/bpf_util.h b/tools/testing/selftests/bpf/bpf_util.h
> index a3352a64c067..a260a963efda 100644
> --- a/tools/testing/selftests/bpf/bpf_util.h
> +++ b/tools/testing/selftests/bpf/bpf_util.h
> @@ -40,4 +40,15 @@ static inline unsigned int bpf_num_possible_cpus(void)
>   	(offsetof(TYPE, MEMBER)	+ sizeof_field(TYPE, MEMBER))
>   #endif
>   
> +/* Helper macro for computing the optimal number of bits for a
> + * bloom filter map.
> + *
> + * Mathematically, the optimal bitset size that minimizes the
> + * false positive probability is n * k / ln(2) where n is the expected
> + * number of unique entries in the bloom filter and k is the number of
> + * hash functions. We use 7 / 5 to approximate 1 / ln(2).
> + */
> +#define BPF_BLOOM_FILTER_BITSET_SZ(nr_uniq_entries, nr_hash_funcs) \
> +	((nr_uniq_entries) * (nr_hash_funcs) / 5 * 7)
> +
>   #endif /* __BPF_UTIL__ */
> diff --git a/tools/testing/selftests/bpf/progs/bloom_filter_bench.c b/tools/testing/selftests/bpf/progs/bloom_filter_bench.c
> new file mode 100644
> index 000000000000..a44a47ddc4d7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/bloom_filter_bench.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 Facebook */
> +
> +#include <errno.h>
> +#include <linux/bpf.h>
> +#include <stdbool.h>
> +#include <bpf/bpf_helpers.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +struct bpf_map;
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(key_size, sizeof(__u32));
> +	/* max entries and value_size will be set programmatically.
> +	 * They are configurable from the userspace bench program.
> +	 */
> +} array_map SEC(".maps");
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_BITSET);
> +	/* max entries,  value_size, and # of hash functions will be set
> +	 * programmatically. They are configurable from the userspace
> +	 * bench program.
> +	 */
> +} bloom_filter_map SEC(".maps");
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_HASH);
> +	/* max entries, key_size, and value_size, will be set
> +	 * programmatically. They are configurable from the userspace
> +	 * bench program.
> +	 */
> +} hashmap SEC(".maps");
> +
> +struct callback_ctx {
> +	struct bpf_map *map;
> +	bool update;
> +};
> +
> +/* Tracks the number of hits, drops, and false hits */
> +struct {
> +	__u32 stats[3];
> +} __attribute__((__aligned__(256))) percpu_stats[256];
> +
> +__u8 value_sz_nr_u32s;
> +
> +const __u32 hit_key  = 0;
> +const __u32 drop_key  = 1;
> +const __u32 false_hit_key = 2;
> +
> +const volatile bool hashmap_use_bloom_filter = true;
> +
> +int error = 0;
> +
> +static __always_inline void log_result(__u32 key)
> +{
> +	__u32 cpu = bpf_get_smp_processor_id();
> +
> +	percpu_stats[cpu & 255].stats[key]++;
> +}
> +
> +static __u64
> +bloom_filter_callback(struct bpf_map *map, __u32 *key, void *val,
> +		      struct callback_ctx *data)
> +{
> +	int err;
> +
> +	if (data->update)
> +		err = bpf_map_push_elem(data->map, val, 0);
> +	else
> +		err = bpf_map_peek_elem(data->map, val);
> +
> +	if (err) {
> +		error |= 1;
> +		return 1; /* stop the iteration */
> +	}
> +
> +	log_result(hit_key);
> +
> +	return 0;
> +}
> +
> +SEC("fentry/__x64_sys_getpgid")
> +int prog_bloom_filter_lookup(void *ctx)
> +{
> +	struct callback_ctx data;
> +
> +	data.map = (struct bpf_map *)&bloom_filter_map;
> +	data.update = false;
> +
> +	bpf_for_each_map_elem(&array_map, bloom_filter_callback, &data, 0);
> +
> +	return 0;
> +}
> +
> +SEC("fentry/__x64_sys_getpgid")
> +int prog_bloom_filter_update(void *ctx)
> +{
> +	struct callback_ctx data;
> +
> +	data.map = (struct bpf_map *)&bloom_filter_map;
> +	data.update = true;
> +
> +	bpf_for_each_map_elem(&array_map, bloom_filter_callback, &data, 0);
> +
> +	return 0;
> +}
> +
> +SEC("fentry/__x64_sys_getpgid")
> +int prog_bloom_filter_hashmap_lookup(void *ctx)
> +{
> +	__u64 *result;
> +	int i, j, err;
> +
> +	__u32 val[64] = {0};
> +
> +	for (i = 0; i < 1024; i++) {
> +		for (j = 0; j < value_sz_nr_u32s && j < 64; j++)
> +			val[j] = bpf_get_prandom_u32();
> +
I tried out prepopulating these random values from the userspace side:
we generate a random sequence of 10000 bytes, put it in a bpf array
map, and then iterate through the array map in the bpf program. (When
I tried implementing this with a global array instead, I saw verifier
errors when indexing into the array.)

However, this runs into the issue that we do not know ahead of time
how many random bytes will be needed (e.g. on a local qemu server,
1 million may be enough, whereas on other servers 1 million is too
little and results in benchmark stats that show no discernible
difference with vs. without the bloom filter).

Additionally, prepopulating slows down the bench program, since we
need to generate all of these random values in the setup() portion of
the program.
I'm not convinced that prepopulating the values ahead of time nets us
anything - if the concern is that this slows down the bpf program,
I think this slows down the program in both the hashmap with and without
bloom filter cases; since we are mainly only interested in the delta between
these two scenarios, I don't think this ends up mattering that much.

> +		if (hashmap_use_bloom_filter) {
> +			err = bpf_map_peek_elem(&bloom_filter_map, val);
> +			if (err) {
> +				if (err != -ENOENT) {
> +					error |= 3;
> +					return 0;
> +				}
> +				log_result(hit_key);
> +				continue;
> +			}
> +		}
> +
> +		result = bpf_map_lookup_elem(&hashmap, val);
> +		if (result) {
> +			log_result(hit_key);
> +		} else {
> +			if (hashmap_use_bloom_filter)
> +				log_result(false_hit_key);
> +			log_result(drop_key);
> +		}
> +	}
> +
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-06 22:20 ` [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities Joanne Koong
@ 2021-10-07 14:20   ` Toke Høiland-Jørgensen
  2021-10-07 21:59     ` Joanne Koong
  2021-10-08 23:05   ` Andrii Nakryiko
  1 sibling, 1 reply; 34+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-10-07 14:20 UTC (permalink / raw)
  To: Joanne Koong, bpf; +Cc: Kernel-team

Joanne Koong <joannekoong@fb.com> writes:

> This patch adds the kernel-side changes for the implementation of
> a bitset map with bloom filter capabilities.
>
> The bitset map does not have keys, only values since it is a
> non-associative data type. When the bitset map is created, it must
> be created with a key_size of 0, and the max_entries value should be the
> desired size of the bitset, in number of bits.
>
> The bitset map supports peek (determining whether a bit is set in the
> map), push (setting a bit in the map), and pop (clearing a bit in the
> map) operations. These operations are exposed to userspace applications
> through the already existing syscalls in the following way:
>
> BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
> BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
> BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem
>
> For updates, the user will pass in a NULL key and the index of the
> bit to set in the bitmap as the value. For lookups, the user will pass
> in the index of the bit to check as the value. If the bit is set, 0
> will be returned, else -ENOENT. For clearing the bit, the user will pass
> in the index of the bit to clear as the value.

This is interesting, and I can see other uses of such a data structure.
However, a couple of questions (talking mostly about the 'raw' bitmap
without the bloom filter enabled):

- How are you envisioning synchronisation to work? The code is using the
  atomic set_bit() operation, but there's no test_and_{set,clear}_bit().
  Any thoughts on how users would do this?

- It would be useful to expose the "find first set (ffs)" operation of
  the bitmap as well. This can be added later, but thinking about the
  API from the start would be good to avoid having to add a whole
  separate helper for this. My immediate thought is to reserve peek(-1)
  for this use - WDYT?

- Any thoughts on inlining the lookups? This should at least be feasible
  for the non-bloom-filter type, but I'm not quite sure if the use of
  map_extra allows the verifier to distinguish between the map types
  (I'm a little fuzzy on how the inlining actually works)? And can
  peek()/push()/pop() be inlined at all?

-Toke


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-07 14:20   ` Toke Høiland-Jørgensen
@ 2021-10-07 21:59     ` Joanne Koong
  2021-10-08 21:57       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2021-10-07 21:59 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, bpf; +Cc: Kernel-team

On 10/7/21 7:20 AM, Toke Høiland-Jørgensen wrote:

> Joanne Koong <joannekoong@fb.com> writes:
>
>> This patch adds the kernel-side changes for the implementation of
>> a bitset map with bloom filter capabilities.
>>
>> The bitset map does not have keys, only values since it is a
>> non-associative data type. When the bitset map is created, it must
>> be created with a key_size of 0, and the max_entries value should be the
>> desired size of the bitset, in number of bits.
>>
>> The bitset map supports peek (determining whether a bit is set in the
>> map), push (setting a bit in the map), and pop (clearing a bit in the
>> map) operations. These operations are exposed to userspace applications
>> through the already existing syscalls in the following way:
>>
>> BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
>> BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
>> BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem
>>
>> For updates, the user will pass in a NULL key and the index of the
>> bit to set in the bitmap as the value. For lookups, the user will pass
>> in the index of the bit to check as the value. If the bit is set, 0
>> will be returned, else -ENOENT. For clearing the bit, the user will pass
>> in the index of the bit to clear as the value.
> This is interesting, and I can see other uses of such a data structure.
> However, a couple of questions (talking mostly about the 'raw' bitmap
> without the bloom filter enabled):
>
> - How are you envisioning synchronisation to work? The code is using the
>    atomic set_bit() operation, but there's no test_and_{set,clear}_bit().
>    Any thoughts on how users would do this?
I was thinking that users who are doing concurrent lookups + updates
would be responsible for synchronizing the operations through mutexes.
Do you think this makes sense / is reasonable?
>
> - It would be useful to expose the "find first set (ffs)" operation of
>    the bitmap as well. This can be added later, but thinking about the
>    API from the start would be good to avoid having to add a whole
>    separate helper for this. My immediate thought is to reserve peek(-1)
>    for this use - WDYT?
I think using peek(-1) for "find first set" sounds like a great idea!
> - Any thoughts on inlining the lookups? This should at least be feasible
>    for the non-bloom-filter type, but I'm not quite sure if the use of
>    map_extra allows the verifier to distinguish between the map types
>    (I'm a little fuzzy on how the inlining actually works)? And can
>    peek()/push()/pop() be inlined at all?

I am not too familiar with how bpf instructions and inlining work, but
at first glance, this looks doable for both the non-bloom filter and
bloom filter cases. From my cursory understanding, it seems like we
could have something like "bitset_map_gen_lookup" where we parse the
bpf_map->map_extra to see if the bloom filter is enabled; if it is, we
could call the hash function directly to compute which bit to look up,
and then use the same insn logic for looking up the bit in both cases
(the bitmap w/ and w/out the bloom filter).

I don't think there is support yet in the verifier for inlining
peek()/push()/pop(), but it seems like this should be doable as well.

I think these changes would maybe warrant a separate patchset
on top of this one. What are your thoughts?

> -Toke
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-07 21:59     ` Joanne Koong
@ 2021-10-08 21:57       ` Toke Høiland-Jørgensen
  2021-10-08 23:11         ` Andrii Nakryiko
  0 siblings, 1 reply; 34+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-10-08 21:57 UTC (permalink / raw)
  To: Joanne Koong, bpf; +Cc: Kernel-team

Joanne Koong <joannekoong@fb.com> writes:

> On 10/7/21 7:20 AM, Toke Høiland-Jørgensen wrote:
>
>> Joanne Koong <joannekoong@fb.com> writes:
>>
>>> This patch adds the kernel-side changes for the implementation of
>>> a bitset map with bloom filter capabilities.
>>>
>>> The bitset map does not have keys, only values since it is a
>>> non-associative data type. When the bitset map is created, it must
>>> be created with a key_size of 0, and the max_entries value should be the
>>> desired size of the bitset, in number of bits.
>>>
>>> The bitset map supports peek (determining whether a bit is set in the
>>> map), push (setting a bit in the map), and pop (clearing a bit in the
>>> map) operations. These operations are exposed to userspace applications
>>> through the already existing syscalls in the following way:
>>>
>>> BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
>>> BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
>>> BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem
>>>
>>> For updates, the user will pass in a NULL key and the index of the
>>> bit to set in the bitmap as the value. For lookups, the user will pass
>>> in the index of the bit to check as the value. If the bit is set, 0
>>> will be returned, else -ENOENT. For clearing the bit, the user will pass
>>> in the index of the bit to clear as the value.
>> This is interesting, and I can see other uses of such a data structure.
>> However, a couple of questions (talking mostly about the 'raw' bitmap
>> without the bloom filter enabled):
>>
>> - How are you envisioning synchronisation to work? The code is using the
>>    atomic set_bit() operation, but there's no test_and_{set,clear}_bit().
>>    Any thoughts on how users would do this?
> I was thinking for users who are doing concurrent lookups + updates,
> they are responsible for synchronizing the operations through mutexes.
> Do you think this makes sense / is reasonable?

Right, that is an option, of course, but it's a bit heavyweight. Atomic
bitops are a nice light-weight synchronisation primitive.

Hmm, looking at your code again, you're already using
test_and_clear_bit() in pop_elem(). So why not just mirror that to
test_and_set_bit() in push_elem(), and change the returns to EEXIST and
ENOENT if trying to set or clear a bit that is already set or cleared
(respectively)?
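
Something like this, roughly (the struct and field names below are
placeholders, not the ones from your patch):

static int bitset_map_push_elem(struct bpf_map *map, void *value, u64 flags)
{
	struct bpf_bitset_map *bmap = container_of(map, struct bpf_bitset_map, map);
	u32 bit = *(u32 *)value;

	if (bit >= map->max_entries)
		return -EINVAL;

	/* atomic test-and-set; tell the caller if the bit was already set */
	if (test_and_set_bit(bit, bmap->bitset))
		return -EEXIST;

	return 0;
}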

>> - It would be useful to expose the "find first set (ffs)" operation of
>>    the bitmap as well. This can be added later, but thinking about the
>>    API from the start would be good to avoid having to add a whole
>>    separate helper for this. My immediate thought is to reserve peek(-1)
>>    for this use - WDYT?
> I think using peek(-1) for "find first set" sounds like a great idea!

Awesome!

>> - Any thoughts on inlining the lookups? This should at least be feasible
>>    for the non-bloom-filter type, but I'm not quite sure if the use of
>>    map_extra allows the verifier to distinguish between the map types
>>    (I'm a little fuzzy on how the inlining actually works)? And can
>>    peek()/push()/pop() be inlined at all?
>
> I am not too familiar with how bpf instructions and inlining works, but
> from a first glance, this looks doable for both the non-bloom filter
> and bloom filter cases. From my cursory understanding of how it works,
> it seems like we could have something like "bitset_map_gen_lookup" where
> we parse the bpf_map->map_extra to see if the bloom filter is enabled;
> if it is, we could call the hash function directly to compute which bit 
> to look up,
> and then use the same insn logic for looking up the bit in both cases
> (the bitmap w/ and w/out the bloom filter).
>
> I don't think there is support yet in the verifier for inlining
> peek()/push()/pop(), but it seems like this should be doable as well.
>
> I think these changes would maybe warrant a separate patchset
> on top of this one. What are your thoughts?

Ah yes, I think you're right, this should be possible to add later. I'm
fine with deferring that to a separate series, then :)

-Toke


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-06 22:20 ` [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities Joanne Koong
  2021-10-07 14:20   ` Toke Høiland-Jørgensen
@ 2021-10-08 23:05   ` Andrii Nakryiko
  2021-10-08 23:24     ` Zvi Effron
  1 sibling, 1 reply; 34+ messages in thread
From: Andrii Nakryiko @ 2021-10-08 23:05 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Kernel Team

On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> This patch adds the kernel-side changes for the implementation of
> a bitset map with bloom filter capabilities.
>
> The bitset map does not have keys, only values since it is a
> non-associative data type. When the bitset map is created, it must
> be created with a key_size of 0, and the max_entries value should be the
> desired size of the bitset, in number of bits.
>
> The bitset map supports peek (determining whether a bit is set in the
> map), push (setting a bit in the map), and pop (clearing a bit in the
> map) operations. These operations are exposed to userspace applications
> through the already existing syscalls in the following way:
>
> BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
> BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
> BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem
>
> For updates, the user will pass in a NULL key and the index of the
> bit to set in the bitmap as the value. For lookups, the user will pass
> in the index of the bit to check as the value. If the bit is set, 0
> will be returned, else -ENOENT. For clearing the bit, the user will pass
> in the index of the bit to clear as the value.
>
> Since we need to pass in the index of the bit to set/clear in the bitmap

While I can buy that a Bloom filter doesn't have a key, a bitset clearly
has one, and that key is the bit index. Are we all sure that this marriage of
bitset and bloom filter is the best API design?

> whenever we do a lookup/clear, in the verifier layer this requires us to
> modify the argument type of a bitset's BPF_FUNC_map_peek_elem and
> BPF_FUNC_map_pop_elem calls to ARG_PTR_TO_MAP_VALUE; correspondingly, in
> the syscall layer, we need to copy over the user value so that in
> bpf_map_peek_elem and bpf_map_pop_elem, we have the specific
> value to check.
>
> The bitset map may be used as an inner map.
>
> The bitset map may also have additional bloom filter capabilities. The
> lower 4 bits of the map_extra flags for the bitset map denote how many
> hash functions to use. If more than 0 hash functions are specified, the
> bitset map will operate as a bloom filter where a set of bits are
> set/checked where the bits are the results from the bloom filter
> functions. Right now, jhash is the function used; in the future, this can be
> expanded to use map_extra to specify which hash function to use.

Please make sure that map_extra is forced to zero for all the existing
maps, for future extensibility.
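
I.e., something along these lines early in map_create() (sketch only, the
exact shape and placement are up to you):

	/* reject map_extra for map types that don't define a meaning for it */
	if (attr->map_type != BPF_MAP_TYPE_BITSET && attr->map_extra != 0)
		return -EINVAL;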

>
> A few things to additionally please take note of:
>  * If there are any concurrent lookups + updates in the bloom filter, the
> user is responsible for synchronizing this to ensure no false negative
> lookups occur.
>  * Deleting an element in the bloom filter map is not supported.
>  * The benchmarks later in this patchset can help compare the performance
> of using different number of hashes on different entry sizes. In general,
> using more hashes increases the false positive rate of an element being

probably meant "decreases" here?

> detected in the bloom filter but decreases the speed of a lookup.
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> ---
>  include/linux/bpf.h            |   2 +
>  include/linux/bpf_types.h      |   1 +
>  include/uapi/linux/bpf.h       |   9 ++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/bitset.c            | 256 +++++++++++++++++++++++++++++++++
>  kernel/bpf/syscall.c           |  25 +++-
>  kernel/bpf/verifier.c          |  10 +-
>  tools/include/uapi/linux/bpf.h |   9 ++
>  8 files changed, 308 insertions(+), 6 deletions(-)
>  create mode 100644 kernel/bpf/bitset.c
>

[...]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-08 21:57       ` Toke Høiland-Jørgensen
@ 2021-10-08 23:11         ` Andrii Nakryiko
  2021-10-09 13:10           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 34+ messages in thread
From: Andrii Nakryiko @ 2021-10-08 23:11 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Joanne Koong, bpf, Kernel Team

On Fri, Oct 8, 2021 at 2:58 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Joanne Koong <joannekoong@fb.com> writes:
>
> > On 10/7/21 7:20 AM, Toke Høiland-Jørgensen wrote:
> >
> >> Joanne Koong <joannekoong@fb.com> writes:
> >>
> >>> This patch adds the kernel-side changes for the implementation of
> >>> a bitset map with bloom filter capabilities.
> >>>
> >>> The bitset map does not have keys, only values since it is a
> >>> non-associative data type. When the bitset map is created, it must
> >>> be created with a key_size of 0, and the max_entries value should be the
> >>> desired size of the bitset, in number of bits.
> >>>
> >>> The bitset map supports peek (determining whether a bit is set in the
> >>> map), push (setting a bit in the map), and pop (clearing a bit in the
> >>> map) operations. These operations are exposed to userspace applications
> >>> through the already existing syscalls in the following way:
> >>>
> >>> BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
> >>> BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
> >>> BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem
> >>>
> >>> For updates, the user will pass in a NULL key and the index of the
> >>> bit to set in the bitmap as the value. For lookups, the user will pass
> >>> in the index of the bit to check as the value. If the bit is set, 0
> >>> will be returned, else -ENOENT. For clearing the bit, the user will pass
> >>> in the index of the bit to clear as the value.
> >> This is interesting, and I can see other uses of such a data structure.
> >> However, a couple of questions (talking mostly about the 'raw' bitmap
> >> without the bloom filter enabled):
> >>
> >> - How are you envisioning synchronisation to work? The code is using the
> >>    atomic set_bit() operation, but there's no test_and_{set,clear}_bit().
> >>    Any thoughts on how users would do this?
> > I was thinking for users who are doing concurrent lookups + updates,
> > they are responsible for synchronizing the operations through mutexes.
> > Do you think this makes sense / is reasonable?
>
> Right, that is an option, of course, but it's a bit heavyweight. Atomic
> bitops are a nice light-weight synchronisation primitive.
>
> Hmm, looking at your code again, you're already using
> test_and_clear_bit() in pop_elem(). So why not just mirror that to
> test_and_set_bit() in push_elem(), and change the returns to EEXIST and
> ENOENT if trying to set or clear a bit that is already set or cleared
> (respectively)?
>
> >> - It would be useful to expose the "find first set (ffs)" operation of
> >>    the bitmap as well. This can be added later, but thinking about the
> >>    API from the start would be good to avoid having to add a whole
> >>    separate helper for this. My immediate thought is to reserve peek(-1)
> >>    for this use - WDYT?
> > I think using peek(-1) for "find first set" sounds like a great idea!
>
> Awesome!

What's the intended use case for such an operation? But also, searching
just for the first set bit is not completely generic; I'd imagine that in
practice (at least judging from bit operations done on u64s I've seen
in the wild) you'd most likely need "first set bit after bit N", so
peek(-N)?..

I think it's overkill to provide something like this, but even if
we do, wouldn't BPF_MAP_GET_NEXT_KEY (except we now say it's a
"value", not a "key", but that's nitpicking) be a sort of natural
extension to provide this? Though it might only be available to
user-space, right? Oh, and it won't work in Bloom filter "mode", right?
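
To make that concrete, a purely hypothetical user-space loop over the set
bits could look like this (nothing like this is wired up in this series,
and the NULL-key convention below is an assumption):

__u32 cur, next;
int err;

/* hypothetical: a NULL "key" meaning "start from bit 0" */
for (err = bpf_map_get_next_key(map_fd, NULL, &next);
     !err;
     err = bpf_map_get_next_key(map_fd, &cur, &next)) {
	/* next would be the index of the next set bit */
	cur = next;
}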

>
> >> - Any thoughts on inlining the lookups? This should at least be feasible
> >>    for the non-bloom-filter type, but I'm not quite sure if the use of
> >>    map_extra allows the verifier to distinguish between the map types
> >>    (I'm a little fuzzy on how the inlining actually works)? And can
> >>    peek()/push()/pop() be inlined at all?
> >
> > I am not too familiar with how bpf instructions and inlining works, but
> > from a first glance, this looks doable for both the non-bloom filter
> > and bloom filter cases. From my cursory understanding of how it works,
> > it seems like we could have something like "bitset_map_gen_lookup" where
> > we parse the bpf_map->map_extra to see if the bloom filter is enabled;
> > if it is, we could call the hash function directly to compute which bit
> > to look up,
> > and then use the same insn logic for looking up the bit in both cases
> > (the bitmap w/ and w/out the bloom filter).
> >
> > I don't think there is support yet in the verifier for inlining
> > peek()/push()/pop(), but it seems like this should be doable as well.
> >
> > I think these changes would maybe warrant a separate patchset
> > on top of this one. What are your thoughts?
>
> Ah yes, I think you're right, this should be possible to add later. I'm
> fine with deferring that to a separate series, then :)
>
> -Toke
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag
  2021-10-06 22:21 ` [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag Joanne Koong
@ 2021-10-08 23:19   ` Andrii Nakryiko
  2021-10-20 21:08     ` Joanne Koong
  2021-10-09  2:12   ` Andrii Nakryiko
  1 sibling, 1 reply; 34+ messages in thread
From: Andrii Nakryiko @ 2021-10-08 23:19 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Kernel Team

On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> This patch adds the libbpf infrastructure for supporting a
> per-map-type "map_extra" field, whose definition will be
> idiosyncratic depending on map type.
>
> For example, for the bitset map, the lower 4 bits of map_extra
> is used to denote the number of hash functions.
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> ---
>  include/uapi/linux/bpf.h        |  1 +
>  tools/include/uapi/linux/bpf.h  |  1 +
>  tools/lib/bpf/bpf.c             |  1 +
>  tools/lib/bpf/bpf.h             |  1 +
>  tools/lib/bpf/bpf_helpers.h     |  1 +
>  tools/lib/bpf/libbpf.c          | 25 ++++++++++++++++++++++++-
>  tools/lib/bpf/libbpf.h          |  4 ++++
>  tools/lib/bpf/libbpf.map        |  2 ++
>  tools/lib/bpf/libbpf_internal.h |  4 +++-
>  9 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b40fa1a72a75..a6f225e9c95a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5639,6 +5639,7 @@ struct bpf_map_info {
>         __u32 btf_id;
>         __u32 btf_key_type_id;
>         __u32 btf_value_type_id;
> +       __u32 map_extra;
>  } __attribute__((aligned(8)));
>
>  struct bpf_btf_info {
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index b40fa1a72a75..a6f225e9c95a 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5639,6 +5639,7 @@ struct bpf_map_info {
>         __u32 btf_id;
>         __u32 btf_key_type_id;
>         __u32 btf_value_type_id;
> +       __u32 map_extra;
>  } __attribute__((aligned(8)));
>
>  struct bpf_btf_info {
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 7d1741ceaa32..41e3e85e7789 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -97,6 +97,7 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
>         attr.btf_key_type_id = create_attr->btf_key_type_id;
>         attr.btf_value_type_id = create_attr->btf_value_type_id;
>         attr.map_ifindex = create_attr->map_ifindex;
> +       attr.map_extra = create_attr->map_extra;
>         if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
>                 attr.btf_vmlinux_value_type_id =
>                         create_attr->btf_vmlinux_value_type_id;
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 6fffb3cdf39b..c4049f2d63cc 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -50,6 +50,7 @@ struct bpf_create_map_attr {
>                 __u32 inner_map_fd;
>                 __u32 btf_vmlinux_value_type_id;
>         };
> +       __u32 map_extra;

this struct is frozen, we can't change it. It's fine to not allow
passing map_extra in libbpf APIs. We have a libbpf 1.0 task to revamp
low-level APIs like map creation in a way that will allow good
extensibility. You don't have to worry about that in this patch set.

>  };
>
>  LIBBPF_API int
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index 963b1060d944..bce5a0090f3f 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -133,6 +133,7 @@ struct bpf_map_def {
>         unsigned int value_size;
>         unsigned int max_entries;
>         unsigned int map_flags;
> +       unsigned int map_extra;
>  };

This one is also frozen, please don't change it.

>
>  enum libbpf_pin_type {
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index ed313fd491bd..12a9ecd45a78 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -2274,6 +2274,10 @@ int parse_btf_map_def(const char *map_name, struct btf *btf,
>                         }
>                         map_def->pinning = val;
>                         map_def->parts |= MAP_DEF_PINNING;
> +               } else if (strcmp(name, "map_extra") == 0) {
> +                       if (!get_map_field_int(map_name, btf, m, &map_def->map_extra))
> +                               return -EINVAL;
> +                       map_def->parts |= MAP_DEF_MAP_EXTRA;
>                 } else {
>                         if (strict) {
>                                 pr_warn("map '%s': unknown field '%s'.\n", map_name, name);
> @@ -2298,6 +2302,7 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
>         map->def.value_size = def->value_size;
>         map->def.max_entries = def->max_entries;
>         map->def.map_flags = def->map_flags;
> +       map->def.map_extra = def->map_extra;
>
>         map->numa_node = def->numa_node;
>         map->btf_key_type_id = def->key_type_id;
> @@ -2322,6 +2327,8 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
>                 pr_debug("map '%s': found max_entries = %u.\n", map->name, def->max_entries);
>         if (def->parts & MAP_DEF_MAP_FLAGS)
>                 pr_debug("map '%s': found map_flags = %u.\n", map->name, def->map_flags);
> +       if (def->parts & MAP_DEF_MAP_EXTRA)
> +               pr_debug("map '%s': found map_extra = %u.\n", map->name, def->map_extra);

Reading this now, I think map_flags should be emitted as %x. Can you
please update the map_flags format specifier and use %x for map_extra as
well?
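
I.e., something like:

	if (def->parts & MAP_DEF_MAP_FLAGS)
		pr_debug("map '%s': found map_flags = 0x%x.\n", map->name, def->map_flags);
	if (def->parts & MAP_DEF_MAP_EXTRA)
		pr_debug("map '%s': found map_extra = 0x%x.\n", map->name, def->map_extra);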

>         if (def->parts & MAP_DEF_PINNING)
>                 pr_debug("map '%s': found pinning = %u.\n", map->name, def->pinning);
>         if (def->parts & MAP_DEF_NUMA_NODE)
> @@ -4017,6 +4024,7 @@ int bpf_map__reuse_fd(struct bpf_map *map, int fd)
>         map->def.value_size = info.value_size;
>         map->def.max_entries = info.max_entries;
>         map->def.map_flags = info.map_flags;
> +       map->def.map_extra = info.map_extra;
>         map->btf_key_type_id = info.btf_key_type_id;
>         map->btf_value_type_id = info.btf_value_type_id;
>         map->reused = true;
> @@ -4534,7 +4542,8 @@ static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)
>                 map_info.key_size == map->def.key_size &&
>                 map_info.value_size == map->def.value_size &&
>                 map_info.max_entries == map->def.max_entries &&
> -               map_info.map_flags == map->def.map_flags);
> +               map_info.map_flags == map->def.map_flags &&
> +               map_info.map_extra == map->def.map_extra);
>  }
>
>  static int
> @@ -4631,6 +4640,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
>         create_attr.key_size = def->key_size;
>         create_attr.value_size = def->value_size;
>         create_attr.numa_node = map->numa_node;
> +       create_attr.map_extra = def->map_extra;
>
>         if (def->type == BPF_MAP_TYPE_PERF_EVENT_ARRAY && !def->max_entries) {
>                 int nr_cpus;
> @@ -8637,6 +8647,19 @@ int bpf_map__set_map_flags(struct bpf_map *map, __u32 flags)
>         return 0;
>  }
>
> +__u32 bpf_map__map_extra(const struct bpf_map *map)
> +{
> +       return map->def.map_extra;
> +}
> +
> +int bpf_map__set_map_extra(struct bpf_map *map, __u32 map_extra)
> +{
> +       if (map->fd >= 0)
> +               return libbpf_err(-EBUSY);
> +       map->def.map_extra = map_extra;
> +       return 0;
> +}
> +
>  __u32 bpf_map__numa_node(const struct bpf_map *map)
>  {
>         return map->numa_node;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 89ca9c83ed4e..55e8dfe6f3e1 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -486,6 +486,7 @@ struct bpf_map_def {
>         unsigned int value_size;
>         unsigned int max_entries;
>         unsigned int map_flags;
> +       unsigned int map_extra;
>  };

this struct is also frozen, please keep it as is

>
>  /**
> @@ -562,6 +563,9 @@ LIBBPF_API __u32 bpf_map__btf_value_type_id(const struct bpf_map *map);
>  /* get/set map if_index */
>  LIBBPF_API __u32 bpf_map__ifindex(const struct bpf_map *map);
>  LIBBPF_API int bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
> +/* get/set map map_extra flags */
> +LIBBPF_API __u32 bpf_map__map_extra(const struct bpf_map *map);
> +LIBBPF_API int bpf_map__set_map_extra(struct bpf_map *map, __u32 map_extra);
>
>  typedef void (*bpf_map_clear_priv_t)(struct bpf_map *, void *);
>  LIBBPF_API int bpf_map__set_priv(struct bpf_map *map, void *priv,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index f270d25e4af3..308378b3f20b 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -395,4 +395,6 @@ LIBBPF_0.6.0 {
>                 bpf_object__prev_program;
>                 btf__add_btf;
>                 btf__add_tag;
> +               bpf_map__map_extra;
> +               bpf_map__set_map_extra;

this list is alphabetically sorted, please keep it so

>  } LIBBPF_0.5.0;
> diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> index f7fd3944d46d..188db854d9c2 100644
> --- a/tools/lib/bpf/libbpf_internal.h
> +++ b/tools/lib/bpf/libbpf_internal.h
> @@ -193,8 +193,9 @@ enum map_def_parts {
>         MAP_DEF_NUMA_NODE       = 0x080,
>         MAP_DEF_PINNING         = 0x100,
>         MAP_DEF_INNER_MAP       = 0x200,
> +       MAP_DEF_MAP_EXTRA       = 0x400,
>
> -       MAP_DEF_ALL             = 0x3ff, /* combination of all above */
> +       MAP_DEF_ALL             = 0x7ff, /* combination of all above */
>  };
>
>  struct btf_map_def {
> @@ -208,6 +209,7 @@ struct btf_map_def {
>         __u32 map_flags;
>         __u32 numa_node;
>         __u32 pinning;
> +       __u32 map_extra;
>  };

this is currently the only (because internal) struct that can get
map_extra added :)

>
>  int parse_btf_map_def(const char *map_name, struct btf *btf,
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-08 23:05   ` Andrii Nakryiko
@ 2021-10-08 23:24     ` Zvi Effron
  2021-10-09  0:16       ` Martin KaFai Lau
  0 siblings, 1 reply; 34+ messages in thread
From: Zvi Effron @ 2021-10-08 23:24 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Joanne Koong, bpf, Kernel Team

On Fri, Oct 8, 2021 at 4:05 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@fb.com> wrote:
> >
> > This patch adds the kernel-side changes for the implementation of
> > a bitset map with bloom filter capabilities.
> >
> > The bitset map does not have keys, only values since it is a
> > non-associative data type. When the bitset map is created, it must
> > be created with a key_size of 0, and the max_entries value should be the
> > desired size of the bitset, in number of bits.
> >
> > The bitset map supports peek (determining whether a bit is set in the
> > map), push (setting a bit in the map), and pop (clearing a bit in the
> > map) operations. These operations are exposed to userspace applications
> > through the already existing syscalls in the following way:
> >
> > BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
> > BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
> > BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem
> >
> > For updates, the user will pass in a NULL key and the index of the
> > bit to set in the bitmap as the value. For lookups, the user will pass
> > in the index of the bit to check as the value. If the bit is set, 0
> > will be returned, else -ENOENT. For clearing the bit, the user will pass
> > in the index of the bit to clear as the value.
> >
> > Since we need to pass in the index of the bit to set/clear in the bitmap
>
> While I can buy that a Bloom filter doesn't have a key, a bitset clearly
> has one, and that key is the bit index. Are we all sure that this marriage of
> bitset and bloom filter is the best API design?
>

Adding on to this, as a user of the bpf helpers, peek, push, and pop have
meaning to me as operating on data structures for which those are well defined,
such as stacks and queues. For bitsets, "push"ing a bit doesn't make sense to
me as a concept, so needing to use bpf_map_push_elem to set a bit is not
intuitive to me. bpf_map_lookup_elem, bpf_map_update_elem, and
bpf_map_delete_elem make more sense to me (though if we have update, delete is
redundant).

> > whenever we do a lookup/clear, in the verifier layer this requires us to
> > modify the argument type of a bitset's BPF_FUNC_map_peek_elem and
> > BPF_FUNC_map_pop_elem calls to ARG_PTR_TO_MAP_VALUE; correspondingly, in
> > the syscall layer, we need to copy over the user value so that in
> > bpf_map_peek_elem and bpf_map_pop_elem, we have the specific
> > value to check.
> >
> > The bitset map may be used as an inner map.
> >
> > The bitset map may also have additional bloom filter capabilities. The
> > lower 4 bits of the map_extra flags for the bitset map denote how many
> > hash functions to use. If more than 0 hash functions are specified, the
> > bitset map will operate as a bloom filter where a set of bits are
> > set/checked where the bits are the results from the bloom filter
> > functions. Right now, jhash is the function used; in the future, this can be
> > expanded to use map_extra to specify which hash function to use.
>
> Please make sure that map_extra is forced to zero for all the existing
> maps, for future extensibility.
>
> >
> > A few things to additionally please take note of:
> >  * If there are any concurrent lookups + updates in the bloom filter, the
> > user is responsible for synchronizing this to ensure no false negative
> > lookups occur.
> >  * Deleting an element in the bloom filter map is not supported.
> >  * The benchmarks later in this patchset can help compare the performance
> > of using different number of hashes on different entry sizes. In general,
> > using more hashes increases the false positive rate of an element being
>
> probably meant "decreases" here?
>
> > detected in the bloom filter but decreases the speed of a lookup.
> >
> > Signed-off-by: Joanne Koong <joannekoong@fb.com>
> > ---
> >  include/linux/bpf.h            |   2 +
> >  include/linux/bpf_types.h      |   1 +
> >  include/uapi/linux/bpf.h       |   9 ++
> >  kernel/bpf/Makefile            |   2 +-
> >  kernel/bpf/bitset.c            | 256 +++++++++++++++++++++++++++++++++
> >  kernel/bpf/syscall.c           |  25 +++-
> >  kernel/bpf/verifier.c          |  10 +-
> >  tools/include/uapi/linux/bpf.h |   9 ++
> >  8 files changed, 308 insertions(+), 6 deletions(-)
> >  create mode 100644 kernel/bpf/bitset.c
> >
>
> [...]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-08 23:24     ` Zvi Effron
@ 2021-10-09  0:16       ` Martin KaFai Lau
  0 siblings, 0 replies; 34+ messages in thread
From: Martin KaFai Lau @ 2021-10-09  0:16 UTC (permalink / raw)
  To: Zvi Effron; +Cc: Andrii Nakryiko, Joanne Koong, bpf, Kernel Team

On Fri, Oct 08, 2021 at 04:24:58PM -0700, Zvi Effron wrote:
> On Fri, Oct 8, 2021 at 4:05 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@fb.com> wrote:
> > >
> > > This patch adds the kernel-side changes for the implementation of
> > > a bitset map with bloom filter capabilities.
> > >
> > > The bitset map does not have keys, only values since it is a
> > > non-associative data type. When the bitset map is created, it must
> > > be created with a key_size of 0, and the max_entries value should be the
> > > desired size of the bitset, in number of bits.
> > >
> > > The bitset map supports peek (determining whether a bit is set in the
> > > map), push (setting a bit in the map), and pop (clearing a bit in the
> > > map) operations. These operations are exposed to userspace applications
> > > through the already existing syscalls in the following way:
> > >
> > > BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
> > > BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
> > > BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem
> > >
> > > For updates, the user will pass in a NULL key and the index of the
> > > bit to set in the bitmap as the value. For lookups, the user will pass
> > > in the index of the bit to check as the value. If the bit is set, 0
> > > will be returned, else -ENOENT. For clearing the bit, the user will pass
> > > in the index of the bit to clear as the value.
> > >
> > > Since we need to pass in the index of the bit to set/clear in the bitmap
> >
> > While I can buy that a Bloom filter doesn't have a key, a bitset clearly
> > has one, and that key is the bit index. Are we all sure that this marriage of
> > bitset and bloom filter is the best API design?
> >
> 
> Adding on to this, as a user of the bpf helpers, peek, push, and pop have
> meaning to me as operating on data structures for which those are well defined,
> such as stacks and queues. For bitsets, "push"ing a bit doesn't make sense to
> me as a concept, so needing to use bpf_map_push_elem to set a bit is not
> intuitive to me. bpf_map_lookup_elem, bpf_map_update_elem, and
> bpf_map_delete_elem make more sense to me (though if we have update, delete is
> redundant).
The peek, push, and pop helpers could be given different names,
like test_bit, set_bit, and clear_bit, in bpf_helper_defs.h.  I think this
was also mentioned in the earlier discussion.
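
One way to get there without touching the helper IDs would be thin wrappers
along these lines (names purely illustrative):

/* sketch: aliases on top of the existing helpers, usable from BPF programs */
static __always_inline long bpf_bitset_set_bit(void *map, __u32 *bit)
{
	return bpf_map_push_elem(map, bit, 0);
}

static __always_inline long bpf_bitset_test_bit(void *map, __u32 *bit)
{
	return bpf_map_peek_elem(map, bit);
}

static __always_inline long bpf_bitset_clear_bit(void *map, __u32 *bit)
{
	return bpf_map_pop_elem(map, bit);
}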

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag
  2021-10-06 22:21 ` [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag Joanne Koong
  2021-10-08 23:19   ` Andrii Nakryiko
@ 2021-10-09  2:12   ` Andrii Nakryiko
  1 sibling, 0 replies; 34+ messages in thread
From: Andrii Nakryiko @ 2021-10-09  2:12 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Kernel Team

On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> This patch adds the libbpf infrastructure for supporting a
> per-map-type "map_extra" field, whose definition will be
> idiosyncratic depending on map type.
>
> For example, for the bitset map, the lower 4 bits of map_extra
> is used to denote the number of hash functions.
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> ---
>  include/uapi/linux/bpf.h        |  1 +
>  tools/include/uapi/linux/bpf.h  |  1 +
>  tools/lib/bpf/bpf.c             |  1 +
>  tools/lib/bpf/bpf.h             |  1 +
>  tools/lib/bpf/bpf_helpers.h     |  1 +
>  tools/lib/bpf/libbpf.c          | 25 ++++++++++++++++++++++++-
>  tools/lib/bpf/libbpf.h          |  4 ++++
>  tools/lib/bpf/libbpf.map        |  2 ++
>  tools/lib/bpf/libbpf_internal.h |  4 +++-
>  9 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b40fa1a72a75..a6f225e9c95a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5639,6 +5639,7 @@ struct bpf_map_info {
>         __u32 btf_id;
>         __u32 btf_key_type_id;
>         __u32 btf_value_type_id;
> +       __u32 map_extra;
>  } __attribute__((aligned(8)));
>
>  struct bpf_btf_info {
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index b40fa1a72a75..a6f225e9c95a 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5639,6 +5639,7 @@ struct bpf_map_info {
>         __u32 btf_id;
>         __u32 btf_key_type_id;
>         __u32 btf_value_type_id;
> +       __u32 map_extra;
>  } __attribute__((aligned(8)));
>
>  struct bpf_btf_info {
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 7d1741ceaa32..41e3e85e7789 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -97,6 +97,7 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
>         attr.btf_key_type_id = create_attr->btf_key_type_id;
>         attr.btf_value_type_id = create_attr->btf_value_type_id;
>         attr.map_ifindex = create_attr->map_ifindex;
> +       attr.map_extra = create_attr->map_extra;
>         if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
>                 attr.btf_vmlinux_value_type_id =
>                         create_attr->btf_vmlinux_value_type_id;
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 6fffb3cdf39b..c4049f2d63cc 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -50,6 +50,7 @@ struct bpf_create_map_attr {
>                 __u32 inner_map_fd;
>                 __u32 btf_vmlinux_value_type_id;
>         };
> +       __u32 map_extra;

btw, I think it might be better to use __u64 for map_extra in the kernel
UAPI for extensibility (e.g., some maps might specify that this is a
pointer to some extra data structure, just like we do for some
bpf_iter program types to specify extra parameters). In libbpf you
can't express the entire 64 bits with the __uint() macro, but eventually that
limitation might be lifted separately.
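
For reference, with the libbpf changes in this patch a BTF-defined map would
then set it along these lines (sizes and values are illustrative, and
__uint() caps this at 32 bits for now):

struct {
	__uint(type, BPF_MAP_TYPE_BITSET);
	__uint(max_entries, 100000);		/* number of bits */
	__uint(value_size, sizeof(__u32));	/* bit index / value to be hashed */
	__uint(map_extra, 5);			/* lower 4 bits: number of hash functions */
} bloom_map SEC(".maps");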

>  };
>
>  LIBBPF_API int
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index 963b1060d944..bce5a0090f3f 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -133,6 +133,7 @@ struct bpf_map_def {
>         unsigned int value_size;
>         unsigned int max_entries;
>         unsigned int map_flags;
> +       unsigned int map_extra;
>  };
>
>  enum libbpf_pin_type {
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index ed313fd491bd..12a9ecd45a78 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -2274,6 +2274,10 @@ int parse_btf_map_def(const char *map_name, struct btf *btf,
>                         }
>                         map_def->pinning = val;
>                         map_def->parts |= MAP_DEF_PINNING;
> +               } else if (strcmp(name, "map_extra") == 0) {
> +                       if (!get_map_field_int(map_name, btf, m, &map_def->map_extra))
> +                               return -EINVAL;
> +                       map_def->parts |= MAP_DEF_MAP_EXTRA;
>                 } else {
>                         if (strict) {
>                                 pr_warn("map '%s': unknown field '%s'.\n", map_name, name);
> @@ -2298,6 +2302,7 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
>         map->def.value_size = def->value_size;
>         map->def.max_entries = def->max_entries;
>         map->def.map_flags = def->map_flags;
> +       map->def.map_extra = def->map_extra;
>
>         map->numa_node = def->numa_node;
>         map->btf_key_type_id = def->key_type_id;
> @@ -2322,6 +2327,8 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
>                 pr_debug("map '%s': found max_entries = %u.\n", map->name, def->max_entries);
>         if (def->parts & MAP_DEF_MAP_FLAGS)
>                 pr_debug("map '%s': found map_flags = %u.\n", map->name, def->map_flags);
> +       if (def->parts & MAP_DEF_MAP_EXTRA)
> +               pr_debug("map '%s': found map_extra = %u.\n", map->name, def->map_extra);
>         if (def->parts & MAP_DEF_PINNING)
>                 pr_debug("map '%s': found pinning = %u.\n", map->name, def->pinning);
>         if (def->parts & MAP_DEF_NUMA_NODE)
> @@ -4017,6 +4024,7 @@ int bpf_map__reuse_fd(struct bpf_map *map, int fd)
>         map->def.value_size = info.value_size;
>         map->def.max_entries = info.max_entries;
>         map->def.map_flags = info.map_flags;
> +       map->def.map_extra = info.map_extra;
>         map->btf_key_type_id = info.btf_key_type_id;
>         map->btf_value_type_id = info.btf_value_type_id;
>         map->reused = true;
> @@ -4534,7 +4542,8 @@ static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)
>                 map_info.key_size == map->def.key_size &&
>                 map_info.value_size == map->def.value_size &&
>                 map_info.max_entries == map->def.max_entries &&
> -               map_info.map_flags == map->def.map_flags);
> +               map_info.map_flags == map->def.map_flags &&
> +               map_info.map_extra == map->def.map_extra);
>  }
>
>  static int
> @@ -4631,6 +4640,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
>         create_attr.key_size = def->key_size;
>         create_attr.value_size = def->value_size;
>         create_attr.numa_node = map->numa_node;
> +       create_attr.map_extra = def->map_extra;
>
>         if (def->type == BPF_MAP_TYPE_PERF_EVENT_ARRAY && !def->max_entries) {
>                 int nr_cpus;
> @@ -8637,6 +8647,19 @@ int bpf_map__set_map_flags(struct bpf_map *map, __u32 flags)
>         return 0;
>  }
>
> +__u32 bpf_map__map_extra(const struct bpf_map *map)
> +{
> +       return map->def.map_extra;
> +}
> +
> +int bpf_map__set_map_extra(struct bpf_map *map, __u32 map_extra)
> +{
> +       if (map->fd >= 0)
> +               return libbpf_err(-EBUSY);
> +       map->def.map_extra = map_extra;
> +       return 0;
> +}
> +
>  __u32 bpf_map__numa_node(const struct bpf_map *map)
>  {
>         return map->numa_node;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 89ca9c83ed4e..55e8dfe6f3e1 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -486,6 +486,7 @@ struct bpf_map_def {
>         unsigned int value_size;
>         unsigned int max_entries;
>         unsigned int map_flags;
> +       unsigned int map_extra;
>  };
>
>  /**
> @@ -562,6 +563,9 @@ LIBBPF_API __u32 bpf_map__btf_value_type_id(const struct bpf_map *map);
>  /* get/set map if_index */
>  LIBBPF_API __u32 bpf_map__ifindex(const struct bpf_map *map);
>  LIBBPF_API int bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
> +/* get/set map map_extra flags */
> +LIBBPF_API __u32 bpf_map__map_extra(const struct bpf_map *map);
> +LIBBPF_API int bpf_map__set_map_extra(struct bpf_map *map, __u32 map_extra);
>
>  typedef void (*bpf_map_clear_priv_t)(struct bpf_map *, void *);
>  LIBBPF_API int bpf_map__set_priv(struct bpf_map *map, void *priv,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index f270d25e4af3..308378b3f20b 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -395,4 +395,6 @@ LIBBPF_0.6.0 {
>                 bpf_object__prev_program;
>                 btf__add_btf;
>                 btf__add_tag;
> +               bpf_map__map_extra;
> +               bpf_map__set_map_extra;
>  } LIBBPF_0.5.0;
> diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> index f7fd3944d46d..188db854d9c2 100644
> --- a/tools/lib/bpf/libbpf_internal.h
> +++ b/tools/lib/bpf/libbpf_internal.h
> @@ -193,8 +193,9 @@ enum map_def_parts {
>         MAP_DEF_NUMA_NODE       = 0x080,
>         MAP_DEF_PINNING         = 0x100,
>         MAP_DEF_INNER_MAP       = 0x200,
> +       MAP_DEF_MAP_EXTRA       = 0x400,
>
> -       MAP_DEF_ALL             = 0x3ff, /* combination of all above */
> +       MAP_DEF_ALL             = 0x7ff, /* combination of all above */
>  };
>
>  struct btf_map_def {
> @@ -208,6 +209,7 @@ struct btf_map_def {
>         __u32 map_flags;
>         __u32 numa_node;
>         __u32 pinning;
> +       __u32 map_extra;
>  };
>
>  int parse_btf_map_def(const char *map_name, struct btf *btf,
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive
  2021-10-06 22:21 ` [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive Joanne Koong
  2021-10-06 22:35   ` Joanne Koong
@ 2021-10-09  2:39   ` Andrii Nakryiko
  1 sibling, 0 replies; 34+ messages in thread
From: Andrii Nakryiko @ 2021-10-09  2:39 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Kernel Team

On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> This patch adds benchmark tests for the throughput (for lookups + updates)
> and the false positive rate of bloom filter lookups, as well as some
> minor refactoring of the bash script for running the benchmarks.
>
> These benchmarks show that as the number of hash functions increases,
> the throughput and the false positive rate of the bloom filter decreases.
> From the benchmark data, the approximate average false-positive rates for
> 8-byte values are roughly as follows:
>
> 1 hash function = ~30%
> 2 hash functions = ~15%
> 3 hash functions = ~5%
> 4 hash functions = ~2.5%
> 5 hash functions = ~1%
> 6 hash functions = ~0.5%
> 7 hash functions  = ~0.35%
> 8 hash functions = ~0.15%
> 9 hash functions = ~0.1%
> 10 hash functions = ~0%
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> ---
>  tools/testing/selftests/bpf/Makefile          |   6 +-
>  tools/testing/selftests/bpf/bench.c           |  37 ++
>  tools/testing/selftests/bpf/bench.h           |   3 +
>  .../bpf/benchs/bench_bloom_filter_map.c       | 411 ++++++++++++++++++
>  .../bpf/benchs/run_bench_bloom_filter_map.sh  |  28 ++
>  .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
>  .../selftests/bpf/benchs/run_common.sh        |  48 ++
>  tools/testing/selftests/bpf/bpf_util.h        |  11 +
>  .../selftests/bpf/progs/bloom_filter_bench.c  | 146 +++++++
>  9 files changed, 690 insertions(+), 30 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
>  create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
>  create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
>  create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_bench.c
>

[...]

> +static struct ctx {
> +       struct bloom_filter_bench *skel;
> +
> +       int bloom_filter_fd;
> +       int hashmap_fd;
> +       int array_map_fd;
> +
> +       pthread_mutex_t map_done_mtx;
> +       pthread_cond_t map_done;
> +       bool map_prepare_err;
> +
> +       __u32 next_map_idx;
> +

nit: unnecessary empty line

> +} ctx = {
> +       .map_done_mtx = PTHREAD_MUTEX_INITIALIZER,
> +       .map_done = PTHREAD_COND_INITIALIZER,
> +};
> +

[...]

> +
> +static void populate_maps(void)
> +{
> +       unsigned int nr_cpus = bpf_num_possible_cpus();
> +       pthread_t map_thread;
> +       int i, err;
> +
> +       ctx.bloom_filter_fd = bpf_map__fd(ctx.skel->maps.bloom_filter_map);
> +       ctx.hashmap_fd = bpf_map__fd(ctx.skel->maps.hashmap);
> +       ctx.array_map_fd = bpf_map__fd(ctx.skel->maps.array_map);
> +
> +       for (i = 0; i < nr_cpus; i++) {
> +               err = pthread_create(&map_thread, NULL, map_prepare_thread,
> +                                    NULL);
> +               if (err) {
> +                       fprintf(stderr, "failed to create pthread: %d\n", -errno);
> +                       exit(1);
> +               }
> +       }
> +
> +       pthread_mutex_lock(&ctx.map_done_mtx);
> +       pthread_cond_wait(&ctx.map_done, &ctx.map_done_mtx);

This is a fragile way to use cond_wait. If the prepare step finishes before
you get to this cond_wait, you'll be stuck forever. Also, a
cond_var can wake up spuriously, if I remember correctly. So the
pattern is usually to check some condition in a loop (inside the locked
region) and, if the condition doesn't hold, cond_wait on it (I renamed
ctx.map_done into ctx.map_done_cv):

pthread_mutex_lock(&ctx.map_done_mtx);
while (!ctx.map_done /* this is bool now */)
    pthread_cond_wait(&ctx.map_done_cv, &ctx.map_done_mtx);
pthread_mutex_unlock(&ctx.map_done_mtx);
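
and the preparation thread would then signal under the same lock, something
like:

pthread_mutex_lock(&ctx.map_done_mtx);
ctx.map_done = true;
pthread_cond_signal(&ctx.map_done_cv);
pthread_mutex_unlock(&ctx.map_done_mtx);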


> +       pthread_mutex_unlock(&ctx.map_done_mtx);
> +
> +       if (ctx.map_prepare_err)
> +               exit(1);
> +}
> +
> +static struct bloom_filter_bench *setup_skeleton(bool hashmap_use_bloom_filter)
> +{
> +       struct bloom_filter_bench *skel;
> +       int err;
> +
> +       setup_libbpf();
> +
> +       skel = bloom_filter_bench__open();
> +       if (!skel) {
> +               fprintf(stderr, "failed to open skeleton\n");
> +               exit(1);
> +       }
> +
> +       skel->rodata->hashmap_use_bloom_filter = hashmap_use_bloom_filter;
> +
> +       /* Resize number of entries */
> +       err = bpf_map__resize(skel->maps.hashmap, args.nr_entries);
> +       if (err) {

These errors can't happen unless args.nr_entries is zero, so I'd just
drop them. But please use bpf_map__set_max_entries() instead;
bpf_map__resize() is going to be deprecated.
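
I.e. (sketch, dropping the error checks as suggested):

	bpf_map__set_max_entries(skel->maps.hashmap, args.nr_entries);
	bpf_map__set_max_entries(skel->maps.array_map, args.nr_entries);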

> +               fprintf(stderr, "failed to resize hashmap\n");
> +               exit(1);
> +       }
> +
> +       err = bpf_map__resize(skel->maps.array_map, args.nr_entries);
> +       if (err) {
> +               fprintf(stderr, "failed to resize array map\n");
> +               exit(1);
> +       }
> +
> +       err = bpf_map__resize(skel->maps.bloom_filter_map,
> +                             BPF_BLOOM_FILTER_BITSET_SZ(args.nr_entries,
> +                                                        args.nr_hash_funcs));
> +       if (err) {
> +               fprintf(stderr, "failed to resize bloom filter\n");
> +               exit(1);
> +       }
> +
> +       /* Set value size */
> +       err = bpf_map__set_value_size(skel->maps.array_map, args.value_size);
> +       if (err) {

same here, error can only happen if the map is already created in the
kernel, so be pragmatic and skip that (especially in benchmarks)

> +               fprintf(stderr, "failed to set array map value size\n");
> +               exit(1);
> +       }
> +
> +       err = bpf_map__set_value_size(skel->maps.bloom_filter_map, args.value_size);
> +       if (err) {
> +               fprintf(stderr, "failed to set bloom filter map value size\n");
> +               exit(1);
> +       }
> +

[...]

> diff --git a/tools/testing/selftests/bpf/bpf_util.h b/tools/testing/selftests/bpf/bpf_util.h
> index a3352a64c067..a260a963efda 100644
> --- a/tools/testing/selftests/bpf/bpf_util.h
> +++ b/tools/testing/selftests/bpf/bpf_util.h
> @@ -40,4 +40,15 @@ static inline unsigned int bpf_num_possible_cpus(void)
>         (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))
>  #endif
>
> +/* Helper macro for computing the optimal number of bits for a
> + * bloom filter map.
> + *
> + * Mathematically, the optimal bitset size that minimizes the
> + * false positive probability is n * k / ln(2) where n is the expected
> + * number of unique entries in the bloom filter and k is the number of
> + * hash functions. We use 7 / 5 to approximate 1 / ln(2).
> + */
> +#define BPF_BLOOM_FILTER_BITSET_SZ(nr_uniq_entries, nr_hash_funcs) \
> +       ((nr_uniq_entries) * (nr_hash_funcs) / 5 * 7)

hm.. I thought you were going to add this to include/uapi/linux/bpf.h;
why did you change your mind?
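
(For example, with nr_uniq_entries = 100000 and nr_hash_funcs = 5 the macro
gives 100000 * 5 / 5 * 7 = 700000 bits, vs. the exact n * k / ln(2) ≈ 721348,
so the 7/5 approximation undershoots slightly.)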

> +
>  #endif /* __BPF_UTIL__ */
> diff --git a/tools/testing/selftests/bpf/progs/bloom_filter_bench.c b/tools/testing/selftests/bpf/progs/bloom_filter_bench.c
> new file mode 100644
> index 000000000000..a44a47ddc4d7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/bloom_filter_bench.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 Facebook */
> +
> +#include <errno.h>
> +#include <linux/bpf.h>
> +#include <stdbool.h>
> +#include <bpf/bpf_helpers.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +struct bpf_map;
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_ARRAY);
> +       __uint(key_size, sizeof(__u32));
> +       /* max entries and value_size will be set programmatically.
> +        * They are configurable from the userspace bench program.
> +        */
> +} array_map SEC(".maps");
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_BITSET);
> +       /* max entries,  value_size, and # of hash functions will be set
> +        * programmatically. They are configurable from the userspace
> +        * bench program.
> +        */
> +} bloom_filter_map SEC(".maps");
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_HASH);
> +       /* max entries, key_size, and value_size, will be set
> +        * programmatically. They are configurable from the userspace
> +        * bench program.
> +        */
> +} hashmap SEC(".maps");
> +
> +struct callback_ctx {
> +       struct bpf_map *map;
> +       bool update;
> +};
> +
> +/* Tracks the number of hits, drops, and false hits */
> +struct {
> +       __u32 stats[3];
> +} __attribute__((__aligned__(256))) percpu_stats[256];
> +
> +__u8 value_sz_nr_u32s;
> +
> +const __u32 hit_key  = 0;
> +const __u32 drop_key  = 1;
> +const __u32 false_hit_key = 2;
> +
> +const volatile bool hashmap_use_bloom_filter = true;
> +
> +int error = 0;
> +
> +static __always_inline void log_result(__u32 key)
> +{
> +       __u32 cpu = bpf_get_smp_processor_id();
> +
> +       percpu_stats[cpu & 255].stats[key]++;
> +}
> +
> +static __u64
> +bloom_filter_callback(struct bpf_map *map, __u32 *key, void *val,
> +                     struct callback_ctx *data)
> +{
> +       int err;
> +
> +       if (data->update)
> +               err = bpf_map_push_elem(data->map, val, 0);
> +       else
> +               err = bpf_map_peek_elem(data->map, val);
> +
> +       if (err) {
> +               error |= 1;
> +               return 1; /* stop the iteration */
> +       }
> +
> +       log_result(hit_key);
> +
> +       return 0;
> +}
> +
> +SEC("fentry/__x64_sys_getpgid")
> +int prog_bloom_filter_lookup(void *ctx)
> +{
> +       struct callback_ctx data;
> +
> +       data.map = (struct bpf_map *)&bloom_filter_map;
> +       data.update = false;
> +
> +       bpf_for_each_map_elem(&array_map, bloom_filter_callback, &data, 0);
> +
> +       return 0;
> +}
> +
> +SEC("fentry/__x64_sys_getpgid")
> +int prog_bloom_filter_update(void *ctx)
> +{
> +       struct callback_ctx data;
> +
> +       data.map = (struct bpf_map *)&bloom_filter_map;
> +       data.update = true;
> +
> +       bpf_for_each_map_elem(&array_map, bloom_filter_callback, &data, 0);
> +
> +       return 0;
> +}
> +
> +SEC("fentry/__x64_sys_getpgid")
> +int prog_bloom_filter_hashmap_lookup(void *ctx)
> +{
> +       __u64 *result;
> +       int i, j, err;
> +
> +       __u32 val[64] = {0};
> +
> +       for (i = 0; i < 1024; i++) {
> +               for (j = 0; j < value_sz_nr_u32s && j < 64; j++)
> +                       val[j] = bpf_get_prandom_u32();
> +
> +               if (hashmap_use_bloom_filter) {

this is purely subjective, so take it for what it is worth. Using the full
"bloom_filter" everywhere is a bit of a mouthful and causes unnecessarily
long identifiers. I think "bloom" itself is very recognizable and
doesn't detract from readability (I'd claim it actually improves it).
When using the bench tool manually, having to type "bloom-filter-update"
when the equivalent "bloom-update" is just as good would get old pretty
fast for me.

Similarly for the program names above: why the "prog_" prefix? What does it
contribute except causing longer identifiers in the skeleton?

> +                       err = bpf_map_peek_elem(&bloom_filter_map, val);
> +                       if (err) {
> +                               if (err != -ENOENT) {
> +                                       error |= 3;
> +                                       return 0;
> +                               }
> +                               log_result(hit_key);
> +                               continue;
> +                       }
> +               }
> +
> +               result = bpf_map_lookup_elem(&hashmap, val);
> +               if (result) {
> +                       log_result(hit_key);
> +               } else {
> +                       if (hashmap_use_bloom_filter)
> +                               log_result(false_hit_key);
> +                       log_result(drop_key);
> +               }
> +       }
> +
> +       return 0;
> +}
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive
  2021-10-06 22:35   ` Joanne Koong
@ 2021-10-09  2:54     ` Andrii Nakryiko
  2021-10-15 23:35       ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Andrii Nakryiko @ 2021-10-09  2:54 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Kernel Team

On Wed, Oct 6, 2021 at 3:37 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> On 10/6/21 3:21 PM, Joanne Koong wrote:
>
> > This patch adds benchmark tests for the throughput (for lookups + updates)
> > and the false positive rate of bloom filter lookups, as well as some
> > minor refactoring of the bash script for running the benchmarks.
> >
> > These benchmarks show that as the number of hash functions increases,
> > the throughput and the false positive rate of the bloom filter decreases.
> >  From the benchmark data, the approximate average false-positive rates for
> > 8-byte values are roughly as follows:
> >
> > 1 hash function = ~30%
> > 2 hash functions = ~15%
> > 3 hash functions = ~5%
> > 4 hash functions = ~2.5%
> > 5 hash functions = ~1%
> > 6 hash functions = ~0.5%
> > 7 hash functions  = ~0.35%
> > 8 hash functions = ~0.15%
> > 9 hash functions = ~0.1%
> > 10 hash functions = ~0%
> >
> > Signed-off-by: Joanne Koong <joannekoong@fb.com>
> > ---
> >   tools/testing/selftests/bpf/Makefile          |   6 +-
> >   tools/testing/selftests/bpf/bench.c           |  37 ++
> >   tools/testing/selftests/bpf/bench.h           |   3 +
> >   .../bpf/benchs/bench_bloom_filter_map.c       | 411 ++++++++++++++++++
> >   .../bpf/benchs/run_bench_bloom_filter_map.sh  |  28 ++
> >   .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
> >   .../selftests/bpf/benchs/run_common.sh        |  48 ++
> >   tools/testing/selftests/bpf/bpf_util.h        |  11 +
> >   .../selftests/bpf/progs/bloom_filter_bench.c  | 146 +++++++
> >   9 files changed, 690 insertions(+), 30 deletions(-)
> >   create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
> >   create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
> >   create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
> >   create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_bench.c
> >

[...] (it's a good idea to trim irrelevant parts of the email to make it
easier to see the relevant parts)

> > +
> > +SEC("fentry/__x64_sys_getpgid")
> > +int prog_bloom_filter_hashmap_lookup(void *ctx)
> > +{
> > +     __u64 *result;
> > +     int i, j, err;
> > +
> > +     __u32 val[64] = {0};
> > +
> > +     for (i = 0; i < 1024; i++) {
> > +             for (j = 0; j < value_sz_nr_u32s && j < 64; j++)
> > +                     val[j] = bpf_get_prandom_u32();
> > +
> I tried out prepopulating these random values from the userspace side
> (where we generate a random sequence of 10000 bytes and put that
> in a bpf array map, then iterate through the bpf array map in the bpf
> program; when I tried implementing it using a global array, I saw
> verifier errors when indexing into the array).
>
> Additionally, this slows down the bench program as well, since we need
> to generate all of these random values in the setup() portion of the
> program.
> I'm not convinced that prepopulating the values ahead of time nets us
> anything - if the concern is that this slows down the bpf program,
> I think this slows down the program in both the hashmap with and without
> bloom filter cases; since we are mainly only interested in the delta between
> these two scenarios, I don't think this ends up mattering that much.

So imagine that a hashmap benchmark takes 10ms per iteration, and
bloom filter + hashmap takes 5ms. That's a 2x difference, right? Now
imagine that random value generation takes another 5ms, so actually
you measure 15ms vs 10ms run time. Now, suddenly, you have measured
just a 1.5x difference.

But it's ok, feel free to just keep the benchmark as is.

>
> > +             if (hashmap_use_bloom_filter) {
> > +                     err = bpf_map_peek_elem(&bloom_filter_map, val);
> > +                     if (err) {
> > +                             if (err != -ENOENT) {
> > +                                     error |= 3;
> > +                                     return 0;
> > +                             }
> > +                             log_result(hit_key);
> > +                             continue;
> > +                     }
> > +             }
> > +
> > +             result = bpf_map_lookup_elem(&hashmap, val);
> > +             if (result) {
> > +                     log_result(hit_key);
> > +             } else {
> > +                     if (hashmap_use_bloom_filter)
> > +                             log_result(false_hit_key);
> > +                     log_result(drop_key);
> > +             }
> > +     }
> > +
> > +     return 0;
> > +}

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-08 23:11         ` Andrii Nakryiko
@ 2021-10-09 13:10           ` Toke Høiland-Jørgensen
  2021-10-12  3:17             ` Andrii Nakryiko
  0 siblings, 1 reply; 34+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-10-09 13:10 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Joanne Koong, bpf, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Fri, Oct 8, 2021 at 2:58 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Joanne Koong <joannekoong@fb.com> writes:
>>
>> > On 10/7/21 7:20 AM, Toke Høiland-Jørgensen wrote:
>> >
>> >> Joanne Koong <joannekoong@fb.com> writes:
>> >>
>> >>> This patch adds the kernel-side changes for the implementation of
>> >>> a bitset map with bloom filter capabilities.
>> >>>
>> >>> The bitset map does not have keys, only values since it is a
>> >>> non-associative data type. When the bitset map is created, it must
>> >>> be created with a key_size of 0, and the max_entries value should be the
>> >>> desired size of the bitset, in number of bits.
>> >>>
>> >>> The bitset map supports peek (determining whether a bit is set in the
>> >>> map), push (setting a bit in the map), and pop (clearing a bit in the
>> >>> map) operations. These operations are exposed to userspace applications
>> >>> through the already existing syscalls in the following way:
>> >>>
>> >>> BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
>> >>> BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
>> >>> BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem
>> >>>
>> >>> For updates, the user will pass in a NULL key and the index of the
>> >>> bit to set in the bitmap as the value. For lookups, the user will pass
>> >>> in the index of the bit to check as the value. If the bit is set, 0
>> >>> will be returned, else -ENOENT. For clearing the bit, the user will pass
>> >>> in the index of the bit to clear as the value.
>> >> This is interesting, and I can see other uses of such a data structure.
>> >> However, a couple of questions (talking mostly about the 'raw' bitmap
>> >> without the bloom filter enabled):
>> >>
>> >> - How are you envisioning synchronisation to work? The code is using the
>> >>    atomic set_bit() operation, but there's no test_and_{set,clear}_bit().
>> >>    Any thoughts on how users would do this?
>> > I was thinking for users who are doing concurrent lookups + updates,
>> > they are responsible for synchronizing the operations through mutexes.
>> > Do you think this makes sense / is reasonable?
>>
>> Right, that is an option, of course, but it's a bit heavyweight. Atomic
>> bitops are nice light-weight synchronisation primitives.
>>
>> Hmm, looking at your code again, you're already using
>> test_and_clear_bit() in pop_elem(). So why not just mirror that to
>> test_and_set_bit() in push_elem(), and change the returns to EEXIST and
>> ENOENT if trying to set or clear a bit that is already set or cleared
>> (respectively)?
>>
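A minimal sketch of what that pairing could look like (the struct and field
names below are made up for illustration; they are not the ones from the
patch):

static int bitset_map_push_elem(struct bpf_map *map, void *value, u64 flags)
{
	struct bpf_bitset_map *bitset = container_of(map, struct bpf_bitset_map, map);
	u32 index = *(u32 *)value;

	if (index >= bitset->nr_bits)
		return -EINVAL;

	/* atomic, so concurrent updates need no extra locking */
	if (test_and_set_bit(index, bitset->bitmap))
		return -EEXIST;		/* bit was already set */
	return 0;
}

static int bitset_map_pop_elem(struct bpf_map *map, void *value)
{
	struct bpf_bitset_map *bitset = container_of(map, struct bpf_bitset_map, map);
	u32 index = *(u32 *)value;

	if (index >= bitset->nr_bits)
		return -EINVAL;

	if (!test_and_clear_bit(index, bitset->bitmap))
		return -ENOENT;		/* bit was already clear */
	return 0;
}
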
>> >> - It would be useful to expose the "find first set (ffs)" operation of
>> >>    the bitmap as well. This can be added later, but thinking about the
>> >>    API from the start would be good to avoid having to add a whole
>> >>    separate helper for this. My immediate thought is to reserve peek(-1)
>> >>    for this use - WDYT?
>> > I think using peek(-1) for "find first set" sounds like a great idea!
>>
>> Awesome!
>
> What's the intended use case for such an operation?

The 'find first set' operation is a single instruction on common
architectures, so it's an efficient way of finding the first non-empty
bucket if you index them in a bitmap; sch_qfq uses this, for instance.
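
As a quick illustration in plain C (assuming a 64-bucket scheme that keeps
one "non-empty" bit per bucket in a single word):

/* bit i of 'bitmap' is set iff bucket i is non-empty */
static int first_nonempty_bucket(unsigned long long bitmap)
{
	if (!bitmap)
		return -1;			/* all buckets are empty */
	/* __builtin_ffsll() is 1-based, buckets are 0-based */
	return __builtin_ffsll(bitmap) - 1;
}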

> But also searching just for the first set bit is not completely generic; I'd
> imagine that in practice (at least judging from bit operations done on
> u64s I've seen in the wild) you'd most likely need "first set bit
> after bit N", so peek(-N)?..

You're right as far as generality goes, but for my use case "after bit
N" is not so important (you enqueue into different buckets and always
need to find the lowest one). But there could be other use cases for
"find first set after N", of course. If using -N, the parameter would
have to change to explicitly signed, of course, but that's fine by me :)

> I think it's overkill to provide something like this, but even if
> we do, wouldn't BPF_MAP_GET_NEXT_KEY (except we now say it's a
> "value", not a "key", but that's nitpicking) be a sort of natural
> extension to provide this? Though it might only be available to
> user-space, right?

Yeah, I thought about get_next_key(), but as you say that doesn't work from
BPF. It would make sense to *also* expose this to userspace through
get_next_key(), though.

> Oh, and it won't work in Bloom filter "mode", right?

I expect not? But we could just return -EINVAL in that case?

-Toke


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-09 13:10           ` Toke Høiland-Jørgensen
@ 2021-10-12  3:17             ` Andrii Nakryiko
  2021-10-12 12:48               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 34+ messages in thread
From: Andrii Nakryiko @ 2021-10-12  3:17 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Joanne Koong, bpf, Kernel Team

On Sat, Oct 9, 2021 at 3:10 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Fri, Oct 8, 2021 at 2:58 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Joanne Koong <joannekoong@fb.com> writes:
> >>
> >> > On 10/7/21 7:20 AM, Toke Høiland-Jørgensen wrote:
> >> >
> >> >> Joanne Koong <joannekoong@fb.com> writes:
> >> >>
> >> >>> This patch adds the kernel-side changes for the implementation of
> >> >>> a bitset map with bloom filter capabilities.
> >> >>>
> >> >>> The bitset map does not have keys, only values since it is a
> >> >>> non-associative data type. When the bitset map is created, it must
> >> >>> be created with a key_size of 0, and the max_entries value should be the
> >> >>> desired size of the bitset, in number of bits.
> >> >>>
> >> >>> The bitset map supports peek (determining whether a bit is set in the
> >> >>> map), push (setting a bit in the map), and pop (clearing a bit in the
> >> >>> map) operations. These operations are exposed to userspace applications
> >> >>> through the already existing syscalls in the following way:
> >> >>>
> >> >>> BPF_MAP_UPDATE_ELEM -> bpf_map_push_elem
> >> >>> BPF_MAP_LOOKUP_ELEM -> bpf_map_peek_elem
> >> >>> BPF_MAP_LOOKUP_AND_DELETE_ELEM -> bpf_map_pop_elem
> >> >>>
> >> >>> For updates, the user will pass in a NULL key and the index of the
> >> >>> bit to set in the bitmap as the value. For lookups, the user will pass
> >> >>> in the index of the bit to check as the value. If the bit is set, 0
> >> >>> will be returned, else -ENOENT. For clearing the bit, the user will pass
> >> >>> in the index of the bit to clear as the value.
> >> >> This is interesting, and I can see other uses of such a data structure.
> >> >> However, a couple of questions (talking mostly about the 'raw' bitmap
> >> >> without the bloom filter enabled):
> >> >>
> >> >> - How are you envisioning synchronisation to work? The code is using the
> >> >>    atomic set_bit() operation, but there's no test_and_{set,clear}_bit().
> >> >>    Any thoughts on how users would do this?
> >> > I was thinking for users who are doing concurrent lookups + updates,
> >> > they are responsible for synchronizing the operations through mutexes.
> >> > Do you think this makes sense / is reasonable?
> >>
> >> Right, that is an option, of course, but it's a bit heavyweight. Atomic
> >> bitops are a nice light-weight synchronisation primitive.
> >>
> >> Hmm, looking at your code again, you're already using
> >> test_and_clear_bit() in pop_elem(). So why not just mirror that to
> >> test_and_set_bit() in push_elem(), and change the returns to EEXIST and
> >> ENOENT if trying to set or clear a bit that is already set or cleared
> >> (respectively)?
> >>
> >> >> - It would be useful to expose the "find first set (ffs)" operation of
> >> >>    the bitmap as well. This can be added later, but thinking about the
> >> >>    API from the start would be good to avoid having to add a whole
> >> >>    separate helper for this. My immediate thought is to reserve peek(-1)
> >> >>    for this use - WDYT?
> >> > I think using peek(-1) for "find first set" sounds like a great idea!
> >>
> >> Awesome!
> >
> > What's the intended use case for such an operation?
>
> The 'find first set' operation is a single instruction on common
> architectures, so it's an efficient way of finding the first non-empty
> bucket if you index them in a bitmap; sch_qfq uses this, for instance.

There is also the extremely useful popcnt() instruction; it would be great to
have that as well. There is also fls() (find largest set bit), which is
used extensively throughout the kernel. If we'd like to take this ad
absurdum, there are a lot of useful operations defined in
include/linux/bitops.h and include/linux/bitmap.h; I'm pretty sure one
can come up with a use case for every one of those.

The question is whether we should bloat the kernel with such helpers/operations.

I still think that we don't need a dedicated BITSET map. I'm not
talking about Bloom filters here, mind you. With the Bloom filter I
can't (in principle) beat the performance aspect, so I stopped trying
to argue against that. But for a BITSET map, there is next to zero
reason (in my view) to have it as a dedicated map vs. just implementing
it as an ARRAY map. Or even, for cases where it's feasible, as a
global variable array (avoiding BPF helper calls completely). Any
bitset operation is literally one bpf_map_lookup_elem() helper call
and one logical and/or/xor operation. And you can choose whether it
has to be atomic or not. And you don't even have to do power-of-2
sizing if you don't want to (though it will be a bit harder to prove
things to the BPF verifier).

Further, if one implements BITSET as just an ARRAY of u64s (for
example), you get all the power of bpf_for_each_map_elem() and you can
implement finding the first unset bit, the last unset bit, and anything
in between, all using already existing primitives. Yes, there is a
callback overhead, but with your custom ARRAY you can optimize
further by using something bigger than u64 as the "building block" of
your ARRAY. E.g., you can have u64[8] and reduce the number of
necessary callbacks by a factor of 8. But we are getting way too far
ahead. My claim is that ARRAY is better than a BITSET map and doesn't
lose out on performance (still one BPF helper call for basic operations).
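
A rough sketch of what I mean, as a BPF program snippet (names and sizing
are made up for illustration; assumes a recent LLVM/kernel for the atomic
or-with-fetch):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* a 1 Mbit bitset built out of a plain ARRAY map of u64 words */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, (1 << 20) / 64);	/* number of u64 words */
	__type(key, __u32);
	__type(value, __u64);
} bitset SEC(".maps");

static __always_inline int bitset_test_and_set(__u32 idx)
{
	__u32 word = idx / 64;
	__u64 mask = 1ULL << (idx & 63);
	__u64 *val;

	val = bpf_map_lookup_elem(&bitset, &word);
	if (!val)
		return -1;	/* out-of-range index: ARRAY lookup returns NULL */
	/* atomic; a plain *val |= mask works if atomicity isn't needed */
	return (__sync_fetch_and_or(val, mask) & mask) ? 1 : 0;
}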

I'd love to hear specific arguments in favor of dedicated BITSET, though.

>
> > But also searching just a first set bit is non completely generic, I'd
> > imagine that in practice (at least judging from bit operations done on
> > u64s I've seen in the wild) you'd most likely need "first set bit
> > after bit N", so peek(-N)?..
>
> You're right as far as generality goes, but for my use case "after bit
> N" is not so important (you enqueue into different buckets and always
> need to find the lowest one. But there could be other use cases for
> "find first set after N", of course. If using -N the parameter would
> have to change to explicitly signed, of course, but that's fine by me :)
>
> > I think it's an overkill to provide something like this, but even if
> > we do, wouldn't BPF_MAP_GET_NEXT_KEY (except we now say it's a
> > "value", not a "key", but that's nitpicking) be a sort of natural
> > extension to provide this? Though it might be only available to
> > user-space right?
>
> Yeah, thought about get_next_key() but as you say that doesn't work from
> BPF. It would make sense to *also* expose this to userspace through
> get_next_key(), though.
>
> > Oh, and it won't work in Bloom filter "mode", right?
>
> I expect not? But we could just return -EINVAL in that case?
>
> -Toke
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-12  3:17             ` Andrii Nakryiko
@ 2021-10-12 12:48               ` Toke Høiland-Jørgensen
  2021-10-12 22:46                 ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-10-12 12:48 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Joanne Koong, bpf, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

>>
>> The 'find first set' operation is a single instruction on common
>> architectures, so it's an efficient way of finding the first non-empty
>> bucket if you index them in a bitmap; sch_qfq uses this, for instance.
>
> There is also extremely useful popcnt() instruction, would be great to
> have that as well. There is also fls() (find largest set bit), it is
> used extensively throughout the kernel. If we'd like to take this ad
> absurdum, there are a lot of useful operations defined in
> include/linux/bitops.h and include/linux/bitmap.h, I'm pretty sure one
> can come up with a use case for every one of those.
>
> The question is whether we should bloat the kernel with such
> helpers/operations?

I agree, all of those are interesting bitwise operations that would be
useful to expose to BPF. But if we're not going to "bloat the kernel"
with them, what should we do? Introduce new BPF instructions?

> I'd love to hear specific arguments in favor of dedicated BITSET,
> though.

Mainly the above; given the right instructions, I totally buy your
assertion that one can build a bitmap using regular BPF arrays...

-Toke


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-12 12:48               ` Toke Høiland-Jørgensen
@ 2021-10-12 22:46                 ` Joanne Koong
  2021-10-12 23:25                   ` Zvi Effron
  2021-10-13  0:11                   ` Martin KaFai Lau
  0 siblings, 2 replies; 34+ messages in thread
From: Joanne Koong @ 2021-10-12 22:46 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Andrii Nakryiko; +Cc: bpf, Kernel Team

On 10/12/21 5:48 AM, Toke Høiland-Jørgensen wrote:

> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
>>> The 'find first set' operation is a single instruction on common
>>> architectures, so it's an efficient way of finding the first non-empty
>>> bucket if you index them in a bitmap; sch_qfq uses this, for instance.
>> There is also extremely useful popcnt() instruction, would be great to
>> have that as well. There is also fls() (find largest set bit), it is
>> used extensively throughout the kernel. If we'd like to take this ad
>> absurdum, there are a lot of useful operations defined in
>> include/linux/bitops.h and include/linux/bitmap.h, I'm pretty sure one
>> can come up with a use case for every one of those.
>>
>> The question is whether we should bloat the kernel with such
>> helpers/operations?
> I agree, all of those are interesting bitwise operations that would be
> useful to expose to BPF. But if we're not going to "bloat the kernel"
> with them, what should we do? Introduce new BPF instructions?
>
>> I'd love to hear specific arguments in favor of dedicated BITSET,
>> though.
> Mainly the above; given the right instructions, I totally buy your
> assertion that one can build a bitmap using regular BPF arrays...
>
> -Toke
I have the same opinion as Toke here - the most compelling reason I
see for the bitset map to be supported by the kernel is so we can
support a wider set of bit operations that wouldn't be available
strictly through bpf.

I'm also open to adding the bloom filter map and then in the
future, if/when there is a need for the bitset map, adding that as a
separate map. In that case, we could have the bitset map take in
both key and value where key = the bitset index and value = 0 or 1.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-12 22:46                 ` Joanne Koong
@ 2021-10-12 23:25                   ` Zvi Effron
  2021-10-13  1:17                     ` Joanne Koong
  2021-10-13  0:11                   ` Martin KaFai Lau
  1 sibling, 1 reply; 34+ messages in thread
From: Zvi Effron @ 2021-10-12 23:25 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Toke Høiland-Jørgensen, Andrii Nakryiko, bpf, Kernel Team

On Tue, Oct 12, 2021 at 3:47 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> On 10/12/21 5:48 AM, Toke Høiland-Jørgensen wrote:
>
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >
> >>> The 'find first set' operation is a single instruction on common
> >>> architectures, so it's an efficient way of finding the first non-empty
> >>> bucket if you index them in a bitmap; sch_qfq uses this, for instance.
> >> There is also extremely useful popcnt() instruction, would be great to
> >> have that as well. There is also fls() (find largest set bit), it is
> >> used extensively throughout the kernel. If we'd like to take this ad
> >> absurdum, there are a lot of useful operations defined in
> >> include/linux/bitops.h and include/linux/bitmap.h, I'm pretty sure one
> >> can come up with a use case for every one of those.
> >>
> >> The question is whether we should bloat the kernel with such
> >> helpers/operations?
> > I agree, all of those are interesting bitwise operations that would be
> > useful to expose to BPF. But if we're not going to "bloat the kernel"
> > with them, what should we do? Introduce new BPF instructions?
> >
> >> I'd love to hear specific arguments in favor of dedicated BITSET,
> >> though.
> > Mainly the above; given the right instructions, I totally buy your
> > assertion that one can build a bitmap using regular BPF arrays...
> >
> > -Toke
> I have the same opinion as Toke here - the most compelling reason I
> see for the bitset map to be supported by the kernel is so we can
> support a wider set of bit operations that wouldn't be available
> strictly through bpf.
>

Do we need a new map type to support those extra bit operations?
If we're not implementing them as helpers (or instructions), then I don't see
how the new map type helps bring those operations to eBPF.

If we are implementing them as helpers, do we need a new map type to do that?
Can't we make helpers that operate on data instead of a map?

A map feels like a pretty heavy-weight way to expose these operations to me. It
requires user-space to create the map just so eBPF programs can use the
operations. This feels, to me, like it mixes the "persistent storage"
capability of maps with the bit operations goal being discussed. Making helpers
that operate on data would still allow persistent storage in a map if that's
where the data lives, but would also allow using the operations on
non-persistent data if desired.

--Zvi

> I'm also open to adding the bloom filter map and then in the
> future, if/when there is a need for the bitset map, adding that as a
> separate map. In that case, we could have the bitset map take in
> both key and value where key = the bitset index and value = 0 or 1.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-12 22:46                 ` Joanne Koong
  2021-10-12 23:25                   ` Zvi Effron
@ 2021-10-13  0:11                   ` Martin KaFai Lau
  2021-10-13  4:41                     ` Alexei Starovoitov
  1 sibling, 1 reply; 34+ messages in thread
From: Martin KaFai Lau @ 2021-10-13  0:11 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Toke Høiland-Jørgensen, Andrii Nakryiko, bpf, Kernel Team

On Tue, Oct 12, 2021 at 03:46:47PM -0700, Joanne Koong wrote:
> I'm also open to adding the bloom filter map and then in the
> future, if/when there is a need for the bitset map, adding that as a
> separate map. In that case, we could have the bitset map take in
> both key and value where key = the bitset index and value = 0 or 1.
v4 uses nr_hash_funcs == 0 (i.e. map_extra is 0) to mean bitset
and non-zero to mean bloom filter.

Since the existing no-key API can be reused and works fine with the bloom
filter, my current thinking is that it can start with the bloom filter
first and require nr_hash_funcs to be non-zero.

If in the future a bitset map is needed and the bloom filter API can
be reused, the non-zero check can be relaxed.  If not, a separate
API/map needs to be created for bitset anyway.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-12 23:25                   ` Zvi Effron
@ 2021-10-13  1:17                     ` Joanne Koong
  2021-10-13  4:48                       ` Alexei Starovoitov
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2021-10-13  1:17 UTC (permalink / raw)
  To: Zvi Effron
  Cc: Toke Høiland-Jørgensen, Andrii Nakryiko, bpf, Kernel Team

On 10/12/21 4:25 PM, Zvi Effron wrote:

> On Tue, Oct 12, 2021 at 3:47 PM Joanne Koong <joannekoong@fb.com> wrote:
>> On 10/12/21 5:48 AM, Toke Høiland-Jørgensen wrote:
>>
>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>
>>>>> The 'find first set' operation is a single instruction on common
>>>>> architectures, so it's an efficient way of finding the first non-empty
>>>>> bucket if you index them in a bitmap; sch_qfq uses this, for instance.
>>>> There is also extremely useful popcnt() instruction, would be great to
>>>> have that as well. There is also fls() (find largest set bit), it is
>>>> used extensively throughout the kernel. If we'd like to take this ad
>>>> absurdum, there are a lot of useful operations defined in
>>>> include/linux/bitops.h and include/linux/bitmap.h, I'm pretty sure one
>>>> can come up with a use case for every one of those.
>>>>
>>>> The question is whether we should bloat the kernel with such
>>>> helpers/operations?
>>> I agree, all of those are interesting bitwise operations that would be
>>> useful to expose to BPF. But if we're not going to "bloat the kernel"
>>> with them, what should we do? Introduce new BPF instructions?
>>>
>>>> I'd love to hear specific arguments in favor of dedicated BITSET,
>>>> though.
>>> Mainly the above; given the right instructions, I totally buy your
>>> assertion that one can build a bitmap using regular BPF arrays...
>>>
>>> -Toke
>> I have the same opinion as Toke here - the most compelling reason I
>> see for the bitset map to be supported by the kernel is so we can
>> support a wider set of bit operations that wouldn't be available
>> strictly through bpf.
>>
> Do we need a new map type to support those extra bit operations?
> If we're not implementing them as helpers (or instructions), then I don't see
> how the new map type helps bring those operations to eBPF.
>
> If we are implementing them as helpers, do we need a new map type to do that?
> Can't we make helpers that operate on data instead of a map?
I'm presuming the bitset data would reside in the ARRAY map (to cover
map-in-map use cases, and to bypass the verifier out-of-bounds issues
that would (or might, I'm not 100% sure) arise from indexing into a
global array).
I think the cleanest way then to support a large number of special-case
bit operations would be to have one bpf helper function that takes in a
map and a "flags" argument, where "flags" indicates which type of
special-case bit operation to do. We could, if we wanted to, use the
ARRAY map for this, but to me it seems cleaner and safer to have the map
be a separate BITSET map where we can make guarantees about the map
(e.g. the bitset size can be enforced to reject out-of-bounds indices).

If the bitset data could reside in a global array in the bpf program, then I
agree - it seems like we could just make a helper function that takes in
an ARG_PTR_TO_MEM where we pass the data in as a ptr, instead of needing
a map.
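
To make the "helpers that operate on data" idea concrete, here is a purely
hypothetical sketch written as ordinary C (the names and flag values are
invented; no such helper exists today):

#include <stddef.h>

enum bit_op { BIT_OP_SET, BIT_OP_CLEAR, BIT_OP_TEST };

/* what a data-based (rather than map-based) bit helper might compute */
static long bit_op(unsigned long *data, size_t nr_bits, size_t idx, enum bit_op op)
{
	const size_t bits_per_long = sizeof(unsigned long) * 8;
	unsigned long mask;

	if (idx >= nr_bits)
		return -1;	/* reject out-of-bounds indices */

	mask = 1UL << (idx % bits_per_long);
	switch (op) {
	case BIT_OP_SET:
		data[idx / bits_per_long] |= mask;
		return 0;
	case BIT_OP_CLEAR:
		data[idx / bits_per_long] &= ~mask;
		return 0;
	case BIT_OP_TEST:
		return !!(data[idx / bits_per_long] & mask);
	}
	return -1;
}
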
> A map feels like a pretty heavy-weight way to expose these operations to me. It
> requires user-space to create the map just so eBPF programs can use the
> operations. This feels, to me, like it mixes the "persistent storage"
> capability of maps with the bit operations goal being discussed. Making helpers
> that operate on data would allow persistent storage in a map if that's where
> the data lives, but also using the operations on non-persistent data if
> desired.
>
> --Zvi
>
>> I'm also open to adding the bloom filter map and then in the
>> future, if/when there is a need for the bitset map, adding that as a
>> separate map. In that case, we could have the bitset map take in
>> both key and value where key = the bitset index and value = 0 or 1.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-13  0:11                   ` Martin KaFai Lau
@ 2021-10-13  4:41                     ` Alexei Starovoitov
  2021-10-19 23:53                       ` Andrii Nakryiko
  0 siblings, 1 reply; 34+ messages in thread
From: Alexei Starovoitov @ 2021-10-13  4:41 UTC (permalink / raw)
  To: Martin KaFai Lau, Joanne Koong
  Cc: Toke Høiland-Jørgensen, Andrii Nakryiko, bpf, Kernel Team

On 10/12/21 5:11 PM, Martin KaFai Lau wrote:
> On Tue, Oct 12, 2021 at 03:46:47PM -0700, Joanne Koong wrote:
>> I'm also open to adding the bloom filter map and then in the
>> future, if/when there is a need for the bitset map, adding that as a
>> separate map. In that case, we could have the bitset map take in
>> both key and value where key = the bitset index and value = 0 or 1.
> v4 uses nr_hash_funcs is 0 (i.e. map_extra is 0) to mean bitset
> and non-zero to mean bloom filter.
> 
> Since the existing no-key API can be reused and work fine with bloom
> filter, my current thinking is it can start with bloom filter first
> and limit the nr_hash_funcs to be non-zero.
> 
> If in the future a bitset map is needed and the bloom filter API can
> be reused, the non-zero check can be relaxed.  If not, a separate
> API/map needs to be created for bitset anyway.
> 

sounds good to me.
let's drop bitset for now since there doesn't seem to be a consensus.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-13  1:17                     ` Joanne Koong
@ 2021-10-13  4:48                       ` Alexei Starovoitov
  0 siblings, 0 replies; 34+ messages in thread
From: Alexei Starovoitov @ 2021-10-13  4:48 UTC (permalink / raw)
  To: Joanne Koong, Zvi Effron
  Cc: Toke Høiland-Jørgensen, Andrii Nakryiko, bpf, Kernel Team

On 10/12/21 6:17 PM, Joanne Koong wrote:
> If the bitset data could reside in a global array in the bpf program, 
> then I
> agree - it seems like we could just make a helper function that takes in
> an ARG_PTR_TO_MEM where we pass the data in as a ptr, instead of needing
> a map.

The main advantage of helpers for bit ops is ease of implementation.
We can add set/test/clear_bit along with ffs, fls and others
very quickly.
But programs that need to do atomic bit ops probably care
about performance, so a non-inlined helper might be too slow.
Doing the same via new instructions will take a ton more time,
but the long-term benefits of insns are worth considering.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive
  2021-10-09  2:54     ` Andrii Nakryiko
@ 2021-10-15 23:35       ` Joanne Koong
  2021-10-20  0:46         ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2021-10-15 23:35 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Kernel Team

On 10/8/21 7:54 PM, Andrii Nakryiko wrote:

> On Wed, Oct 6, 2021 at 3:37 PM Joanne Koong <joannekoong@fb.com> wrote:
>> On 10/6/21 3:21 PM, Joanne Koong wrote:
>>
>>> This patch adds benchmark tests for the throughput (for lookups + updates)
>>> and the false positive rate of bloom filter lookups, as well as some
>>> minor refactoring of the bash script for running the benchmarks.
>>>
>>> These benchmarks show that as the number of hash functions increases,
>>> the throughput and the false positive rate of the bloom filter decreases.
>>>   From the benchmark data, the approximate average false-positive rates for
>>> 8-byte values are roughly as follows:
>>>
>>> 1 hash function = ~30%
>>> 2 hash functions = ~15%
>>> 3 hash functions = ~5%
>>> 4 hash functions = ~2.5%
>>> 5 hash functions = ~1%
>>> 6 hash functions = ~0.5%
>>> 7 hash functions  = ~0.35%
>>> 8 hash functions = ~0.15%
>>> 9 hash functions = ~0.1%
>>> 10 hash functions = ~0%
>>>
>>> Signed-off-by: Joanne Koong <joannekoong@fb.com>
>>> ---
>>>    tools/testing/selftests/bpf/Makefile          |   6 +-
>>>    tools/testing/selftests/bpf/bench.c           |  37 ++
>>>    tools/testing/selftests/bpf/bench.h           |   3 +
>>>    .../bpf/benchs/bench_bloom_filter_map.c       | 411 ++++++++++++++++++
>>>    .../bpf/benchs/run_bench_bloom_filter_map.sh  |  28 ++
>>>    .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
>>>    .../selftests/bpf/benchs/run_common.sh        |  48 ++
>>>    tools/testing/selftests/bpf/bpf_util.h        |  11 +
>>>    .../selftests/bpf/progs/bloom_filter_bench.c  | 146 +++++++
>>>    9 files changed, 690 insertions(+), 30 deletions(-)
>>>    create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
>>>    create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
>>>    create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
>>>    create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_bench.c
>>>
[...]

>>> +
>>> +SEC("fentry/__x64_sys_getpgid")
>>> +int prog_bloom_filter_hashmap_lookup(void *ctx)
>>> +{
>>> +     __u64 *result;
>>> +     int i, j, err;
>>> +
>>> +     __u32 val[64] = {0};
>>> +
>>> +     for (i = 0; i < 1024; i++) {
>>> +             for (j = 0; j < value_sz_nr_u32s && j < 64; j++)
>>> +                     val[j] = bpf_get_prandom_u32();
>>> +
>> I tried out prepopulating these random values from the userspace side
>> (where we generate a random sequence of 10000 bytes and put that
>> in a bpf array map, then iterate through the bpf array map in the bpf
>> program; when I tried implementing it using a global array, I saw
>> verifier errors when indexing into the array).
>>
>> Additionally, this slows down the bench program as well, since we need
>> to generate all of these random values in the setup() portion of the
>> program.
>> I'm not convinced that prepopulating the values ahead of time nets us
>> anything - if the concern is that this slows down the bpf program,
>> I think this slows down the program in both the hashmap with and without
>> bloom filter cases; since we are mainly only interested in the delta between
>> these two scenarios, I don't think this ends up mattering that much.
> So imagine that a hashmap benchmark takes 10ms per iteration, and
> bloom filter + hashmap takes 5ms. That's a 2x difference, right? Now
> imagine that random values generation takes another 5ms, so actually
> you measure 15ms vs 10ms run time. Now, suddenly, you have measured
> just a 1.5x difference.
Yeah, you're right - in this case, the calls to bpf_get_prandom_u32()
are time-intensive enough to have a measurable impact on the difference. I
guess we could say that the delta is a conservative lower bound estimate -
that this shows the bloom filter is at least X% faster on average,
but I agree that it'd be great to get a more accurate estimate of the
speed improvement.

What do you think about expanding the benchmark framework to let the
user program control when an iteration is finished? Right now, every
iteration runs for 1 sec, and we calculate how many hits+drops have
occurred within that 1 sec. This doesn't work great when we need to
prepopulate the data in advance, since we don't know how much data is
needed (e.g. 1 million entries might be enough on some servers, while
other more powerful servers will finish going through the 1 million
entries before the timer is triggered - one option is to keep reusing
the same data until the timer triggers, but this runs into potential
issues where the hits/drops stats can overflow, especially since they
monotonically increase between iterations); we could err on the side
of overpopulating to make sure there will always be enough entries,
but then this leads to irritatingly long setup times.

A more ideal setup would be something where we prepopulate the data to
1 million entries, then in each iteration of the benchmark go through
the 1 million entries, timing how long it takes to run through them
with vs. without the bloom filter. This also leads to slightly more
accurate results, since now we also don't have to spend time logging
each hit/drop in the corresponding per-cpu stats. I'm thinking about
this mostly in the context of the bloom filter, but maybe having this
option of running benchmarks this way could be useful for other maps
in the future as well.

What are your thoughts?


>
> But it's ok, feel free to just keep the benchmark as is.
>
[...]


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities
  2021-10-13  4:41                     ` Alexei Starovoitov
@ 2021-10-19 23:53                       ` Andrii Nakryiko
  0 siblings, 0 replies; 34+ messages in thread
From: Andrii Nakryiko @ 2021-10-19 23:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Joanne Koong, Toke Høiland-Jørgensen,
	bpf, Kernel Team

On Tue, Oct 12, 2021 at 9:41 PM Alexei Starovoitov <ast@fb.com> wrote:
>
> On 10/12/21 5:11 PM, Martin KaFai Lau wrote:
> > On Tue, Oct 12, 2021 at 03:46:47PM -0700, Joanne Koong wrote:
> >> I'm also open to adding the bloom filter map and then in the
> >> future, if/when there is a need for the bitset map, adding that as a
> >> separate map. In that case, we could have the bitset map take in
> >> both key and value where key = the bitset index and value = 0 or 1.
> > v4 uses nr_hash_funcs is 0 (i.e. map_extra is 0) to mean bitset
> > and non-zero to mean bloom filter.
> >
> > Since the existing no-key API can be reused and work fine with bloom
> > filter, my current thinking is it can start with bloom filter first
> > and limit the nr_hash_funcs to be non-zero.
> >
> > If in the future a bitset map is needed and the bloom filter API can
> > be reused, the non-zero check can be relaxed.  If not, a separate
> > API/map needs to be created for bitset anyway.
> >
>
> sounds good to me.
> let's drop bitset for now since there doesn't seem to be a consensus.

sgtm too

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive
  2021-10-15 23:35       ` Joanne Koong
@ 2021-10-20  0:46         ` Joanne Koong
  0 siblings, 0 replies; 34+ messages in thread
From: Joanne Koong @ 2021-10-20  0:46 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Kernel Team

On 10/15/21 4:35 PM, Joanne Koong wrote:

> On 10/8/21 7:54 PM, Andrii Nakryiko wrote:
>
>> On Wed, Oct 6, 2021 at 3:37 PM Joanne Koong <joannekoong@fb.com> wrote:
>>> On 10/6/21 3:21 PM, Joanne Koong wrote:
>>>
>>>> This patch adds benchmark tests for the throughput (for lookups + 
>>>> updates)
>>>> and the false positive rate of bloom filter lookups, as well as some
>>>> minor refactoring of the bash script for running the benchmarks.
>>>>
>>>> These benchmarks show that as the number of hash functions increases,
>>>> the throughput and the false positive rate of the bloom filter 
>>>> decreases.
>>>>   From the benchmark data, the approximate average false-positive 
>>>> rates for
>>>> 8-byte values are roughly as follows:
>>>>
>>>> 1 hash function = ~30%
>>>> 2 hash functions = ~15%
>>>> 3 hash functions = ~5%
>>>> 4 hash functions = ~2.5%
>>>> 5 hash functions = ~1%
>>>> 6 hash functions = ~0.5%
>>>> 7 hash functions  = ~0.35%
>>>> 8 hash functions = ~0.15%
>>>> 9 hash functions = ~0.1%
>>>> 10 hash functions = ~0%
>>>>
>>>> Signed-off-by: Joanne Koong <joannekoong@fb.com>
>>>> ---
>>>>    tools/testing/selftests/bpf/Makefile          |   6 +-
>>>>    tools/testing/selftests/bpf/bench.c           |  37 ++
>>>>    tools/testing/selftests/bpf/bench.h           |   3 +
>>>>    .../bpf/benchs/bench_bloom_filter_map.c       | 411 
>>>> ++++++++++++++++++
>>>>    .../bpf/benchs/run_bench_bloom_filter_map.sh  |  28 ++
>>>>    .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
>>>>    .../selftests/bpf/benchs/run_common.sh        |  48 ++
>>>>    tools/testing/selftests/bpf/bpf_util.h        |  11 +
>>>>    .../selftests/bpf/progs/bloom_filter_bench.c  | 146 +++++++
>>>>    9 files changed, 690 insertions(+), 30 deletions(-)
>>>>    create mode 100644 
>>>> tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
>>>>    create mode 100755 
>>>> tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
>>>>    create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
>>>>    create mode 100644 
>>>> tools/testing/selftests/bpf/progs/bloom_filter_bench.c
>>>>
> [...]
>
>>>> +
>>>> +SEC("fentry/__x64_sys_getpgid")
>>>> +int prog_bloom_filter_hashmap_lookup(void *ctx)
>>>> +{
>>>> +     __u64 *result;
>>>> +     int i, j, err;
>>>> +
>>>> +     __u32 val[64] = {0};
>>>> +
>>>> +     for (i = 0; i < 1024; i++) {
>>>> +             for (j = 0; j < value_sz_nr_u32s && j < 64; j++)
>>>> +                     val[j] = bpf_get_prandom_u32();
>>>> +
>>> I tried out prepopulating these random values from the userspace side
>>> (where we generate a random sequence of 10000 bytes and put that
>>> in a bpf array map, then iterate through the bpf array map in the bpf
>>> program; when I tried implementing it using a global array, I saw
>>> verifier errors when indexing into the array).
>>>
>>> Additionally, this slows down the bench program as well, since we need
>>> to generate all of these random values in the setup() portion of the
>>> program.
>>> I'm not convinced that prepopulating the values ahead of time nets us
>>> anything - if the concern is that this slows down the bpf program,
>>> I think this slows down the program in both the hashmap with and 
>>> without
>>> bloom filter cases; since we are mainly only interested in the delta 
>>> between
>>> these two scenarios, I don't think this ends up mattering that much.
>> So imagine that a hashmap benchmark takes 10ms per iteration, and
>> bloom filter + hashmap takes 5ms. That's a 2x difference, right? Now
>> imagine that random values generation takes another 5ms, so actually
>> you measure 15ms vs 10ms run time. Now, suddenly, you have measured
>> just a 1.5x difference.
> Yeah, you're right - in this case, the calls to bpf_get_prandom_u32()
> are time-intensive enough to have a measurable impact on the 
> difference. I
> guess we could say that the delta is a conservative lower bound 
> estimate -
> that this shows the bloom filter is at least X% faster on average,
> but I agree that it'd be great to get a more accurate estimate of the
> speed improvement.
>
> What do you think about expanding the benchmark framework to let the
> user program control when an iteration is finished? Right now, every 
> iteration
> runs for 1 sec, and we calculate how many hits+drops have occurred within
> that 1 sec. This doesn't work great for when we need to prepopulate 
> the data in
> advance since we don't know how much data is needed (eg 1 million entries
> might be enough on some servers while on other more powerful servers, 
> it'll
> finish going through the 1 million entries before the timer is 
> triggered -
> one option is to keep reusing the same data until the timer triggers, but
> this runs into potential issues where the hits/drops stats can overflow,
> especially since they monotonically increase between iterations); we 
> could err
> on overpopulating to make sure there will always be enough entries, but
> then this leads to irritatingly long setup times.
>
> A more ideal setup would be something where we prepopulate the data to
> 1 million entries, then in each iteration of the benchmark go through the
> 1 million entries timing how long it takes to run through it with vs. 
> without
> the bloom filter. This also leads to slightly more accurate results 
> since now
> we also don't have to spend time logging each hit/drop in the 
> corresponding
> per-cpu stats. I'm thinking about this mostly in the context of the bloom
> filter, but maybe having this option of running benchmarks this way 
> could be
> useful for other maps in the future as well.
>
> What are your thoughts?
>
Andrii and I had a great discussion about this offline and, in the next
revision (aka v5), the values will be prepopulated ahead of time and
recycled. We don't have to worry about hits/drops overflowing a u64;
a u64 is large enough to handle this.
>
>>
>> But it's ok, feel free to just keep the benchmark as is.
>>
> [...]
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag
  2021-10-08 23:19   ` Andrii Nakryiko
@ 2021-10-20 21:08     ` Joanne Koong
  2021-10-20 21:21       ` Andrii Nakryiko
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2021-10-20 21:08 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Kernel Team

On 10/8/21 4:19 PM, Andrii Nakryiko wrote:

> On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@fb.com> wrote:
>> This patch adds the libbpf infrastructure for supporting a
>> per-map-type "map_extra" field, whose definition will be
>> idiosyncratic depending on map type.
>>
>> For example, for the bitset map, the lower 4 bits of map_extra
>> is used to denote the number of hash functions.
>>
>> Signed-off-by: Joanne Koong <joannekoong@fb.com>
>> ---
>>   include/uapi/linux/bpf.h        |  1 +
>>   tools/include/uapi/linux/bpf.h  |  1 +
>>   tools/lib/bpf/bpf.c             |  1 +
>>   tools/lib/bpf/bpf.h             |  1 +
>>   tools/lib/bpf/bpf_helpers.h     |  1 +
>>   tools/lib/bpf/libbpf.c          | 25 ++++++++++++++++++++++++-
>>   tools/lib/bpf/libbpf.h          |  4 ++++
>>   tools/lib/bpf/libbpf.map        |  2 ++
>>   tools/lib/bpf/libbpf_internal.h |  4 +++-
>>   9 files changed, 38 insertions(+), 2 deletions(-)
>>
>> [...]
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 7d1741ceaa32..41e3e85e7789 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -97,6 +97,7 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
>>          attr.btf_key_type_id = create_attr->btf_key_type_id;
>>          attr.btf_value_type_id = create_attr->btf_value_type_id;
>>          attr.map_ifindex = create_attr->map_ifindex;
>> +       attr.map_extra = create_attr->map_extra;
>>          if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
>>                  attr.btf_vmlinux_value_type_id =
>>                          create_attr->btf_vmlinux_value_type_id;
>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
>> index 6fffb3cdf39b..c4049f2d63cc 100644
>> --- a/tools/lib/bpf/bpf.h
>> +++ b/tools/lib/bpf/bpf.h
>> @@ -50,6 +50,7 @@ struct bpf_create_map_attr {
>>                  __u32 inner_map_fd;
>>                  __u32 btf_vmlinux_value_type_id;
>>          };
>> +       __u32 map_extra;
> this struct is frozen, we can't change it. It's fine to not allow
> passing map_extra in libbpf APIs. We have libbpf 1.0 task to revamp
> low-level APIs like map creation in a way that will allow good
> extensibility. You don't have to worry about that in this patch set.
I see! From my understanding, without "map_extra" added to the
bpf_create_map_attr struct, it's not possible in the subsequent
bloom filter benchmark tests to set the map_extra flag, which
means we can't set the number of hash functions. (The entrypoint
for propagating the flags to the kernel at map creation time is
in the function "bpf_create_map_xattr", which takes in a
struct bpf_create_map_attr).

1) To get the benchmark numbers for different # of hash functions, I'll
test using a modified version of the code where the map_extra flag
gets propagated to the kernel. I'll add a TODO to the benchmarks
saying that the specified # of hash functions will get propagated for real
once libbpf's map creation supports map_extra.


2) Should I drop this libbpf patch altogether from this patchset, and add
it when we do the libbpf 1.0 task to revamp the map creation APIs? Since
without extending map creation to include map_extra, these map_extra
libbpf changes don't have much effect right now.

What are your thoughts?

> [...]
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag
  2021-10-20 21:08     ` Joanne Koong
@ 2021-10-20 21:21       ` Andrii Nakryiko
  2021-10-21 20:14         ` Joanne Koong
  0 siblings, 1 reply; 34+ messages in thread
From: Andrii Nakryiko @ 2021-10-20 21:21 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Kernel Team

On Wed, Oct 20, 2021 at 2:09 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> On 10/8/21 4:19 PM, Andrii Nakryiko wrote:
>
> > On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@fb.com> wrote:
> >> This patch adds the libbpf infrastructure for supporting a
> >> per-map-type "map_extra" field, whose definition will be
> >> idiosyncratic depending on map type.
> >>
> >> For example, for the bitset map, the lower 4 bits of map_extra
> >> is used to denote the number of hash functions.
> >>
> >> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> >> ---
> >>   include/uapi/linux/bpf.h        |  1 +
> >>   tools/include/uapi/linux/bpf.h  |  1 +
> >>   tools/lib/bpf/bpf.c             |  1 +
> >>   tools/lib/bpf/bpf.h             |  1 +
> >>   tools/lib/bpf/bpf_helpers.h     |  1 +
> >>   tools/lib/bpf/libbpf.c          | 25 ++++++++++++++++++++++++-
> >>   tools/lib/bpf/libbpf.h          |  4 ++++
> >>   tools/lib/bpf/libbpf.map        |  2 ++
> >>   tools/lib/bpf/libbpf_internal.h |  4 +++-
> >>   9 files changed, 38 insertions(+), 2 deletions(-)
> >>
> >> [...]
> >>
> >> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> >> index 7d1741ceaa32..41e3e85e7789 100644
> >> --- a/tools/lib/bpf/bpf.c
> >> +++ b/tools/lib/bpf/bpf.c
> >> @@ -97,6 +97,7 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
> >>          attr.btf_key_type_id = create_attr->btf_key_type_id;
> >>          attr.btf_value_type_id = create_attr->btf_value_type_id;
> >>          attr.map_ifindex = create_attr->map_ifindex;
> >> +       attr.map_extra = create_attr->map_extra;
> >>          if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
> >>                  attr.btf_vmlinux_value_type_id =
> >>                          create_attr->btf_vmlinux_value_type_id;
> >> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> >> index 6fffb3cdf39b..c4049f2d63cc 100644
> >> --- a/tools/lib/bpf/bpf.h
> >> +++ b/tools/lib/bpf/bpf.h
> >> @@ -50,6 +50,7 @@ struct bpf_create_map_attr {
> >>                  __u32 inner_map_fd;
> >>                  __u32 btf_vmlinux_value_type_id;
> >>          };
> >> +       __u32 map_extra;
> > this struct is frozen, we can't change it. It's fine to not allow
> > passing map_extra in libbpf APIs. We have libbpf 1.0 task to revamp
> > low-level APIs like map creation in a way that will allow good
> > extensibility. You don't have to worry about that in this patch set.
> I see! From my understanding, without "map_extra" added to the
> bpf_create_map_attr struct, it's not possible in the subsequent
> bloom filter benchmark tests to set the map_extra flag, which

Didn't you add the bpf_map__set_map_extra() setter for that? Also, one can
always do a direct bpf syscall (see sys_bpf in tools/lib/bpf/bpf.c) if
absolutely necessary. But the set_map_extra() setter is the way to go for
the benchmark, I think.
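
Roughly, in the benchmark's setup path it could look something like the
following (the skeleton and map names are placeholders, and this assumes
map_extra actually gets passed through at map creation time):

	struct bloom_filter_bench *skel;	/* placeholder skeleton name */

	skel = bloom_filter_bench__open();
	if (!skel)
		exit(1);

	/* lower 4 bits of map_extra = number of hash functions */
	bpf_map__set_map_extra(skel->maps.bloom_filter_map, args.nr_hash_funcs);

	if (bloom_filter_bench__load(skel))
		exit(1);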

> means we can't set the number of hash functions. (The entrypoint
> for propagating the flags to the kernel at map creation time is
> in the function "bpf_create_map_xattr", which takes in a
> struct bpf_create_map_attr).
>
> 1) To get the benchmark numbers for different # of hash functions, I'll
> test using a modified version of the code where the map_extra flags
> gets propagated to the kernel. I'll add a TODO to the benchmarks
> saying that the specified # of hash functions will get propagated for real
> once libbpf's map creation supports map_extra.
>
>
> 2) Should I  drop this libbpf patch altogether from this patchset, and add
> it when we do the libbpf 1.0 task to revamp the map creation APIs? Since
> without extending map creation to include the map_extra, these map_extra
> libbpf changes don't have much effect right now

No, getter/setter API is good to have, please keep them.

>
> What are your thoughts?
>
> > [...]
> >> --
> >> 2.30.2
> >>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag
  2021-10-20 21:21       ` Andrii Nakryiko
@ 2021-10-21 20:14         ` Joanne Koong
  2021-10-21 21:41           ` Andrii Nakryiko
  0 siblings, 1 reply; 34+ messages in thread
From: Joanne Koong @ 2021-10-21 20:14 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, Kernel Team

On 10/20/21 2:21 PM, Andrii Nakryiko wrote:

> On Wed, Oct 20, 2021 at 2:09 PM Joanne Koong <joannekoong@fb.com> wrote:
>> On 10/8/21 4:19 PM, Andrii Nakryiko wrote:
>>
>>> On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@fb.com> wrote:
>>>> This patch adds the libbpf infrastructure for supporting a
>>>> per-map-type "map_extra" field, whose definition will be
>>>> idiosyncratic depending on map type.
>>>>
>>>> For example, for the bitset map, the lower 4 bits of map_extra
>>>> is used to denote the number of hash functions.
>>>>
>>>> Signed-off-by: Joanne Koong <joannekoong@fb.com>
>>>> ---
>>>>    include/uapi/linux/bpf.h        |  1 +
>>>>    tools/include/uapi/linux/bpf.h  |  1 +
>>>>    tools/lib/bpf/bpf.c             |  1 +
>>>>    tools/lib/bpf/bpf.h             |  1 +
>>>>    tools/lib/bpf/bpf_helpers.h     |  1 +
>>>>    tools/lib/bpf/libbpf.c          | 25 ++++++++++++++++++++++++-
>>>>    tools/lib/bpf/libbpf.h          |  4 ++++
>>>>    tools/lib/bpf/libbpf.map        |  2 ++
>>>>    tools/lib/bpf/libbpf_internal.h |  4 +++-
>>>>    9 files changed, 38 insertions(+), 2 deletions(-)
>>>>
>>>> [...]
>>>>
>>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>>>> index 7d1741ceaa32..41e3e85e7789 100644
>>>> --- a/tools/lib/bpf/bpf.c
>>>> +++ b/tools/lib/bpf/bpf.c
>>>> @@ -97,6 +97,7 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
>>>>           attr.btf_key_type_id = create_attr->btf_key_type_id;
>>>>           attr.btf_value_type_id = create_attr->btf_value_type_id;
>>>>           attr.map_ifindex = create_attr->map_ifindex;
>>>> +       attr.map_extra = create_attr->map_extra;
>>>>           if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
>>>>                   attr.btf_vmlinux_value_type_id =
>>>>                           create_attr->btf_vmlinux_value_type_id;
>>>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
>>>> index 6fffb3cdf39b..c4049f2d63cc 100644
>>>> --- a/tools/lib/bpf/bpf.h
>>>> +++ b/tools/lib/bpf/bpf.h
>>>> @@ -50,6 +50,7 @@ struct bpf_create_map_attr {
>>>>                   __u32 inner_map_fd;
>>>>                   __u32 btf_vmlinux_value_type_id;
>>>>           };
>>>> +       __u32 map_extra;
>>> this struct is frozen, we can't change it. It's fine to not allow
>>> passing map_extra in libbpf APIs. We have libbpf 1.0 task to revamp
>>> low-level APIs like map creation in a way that will allow good
>>> extensibility. You don't have to worry about that in this patch set.
>> I see! From my understanding, without "map_extra" added to the
>> bpf_create_map_attr struct, it's not possible in the subsequent
>> bloom filter benchmark tests to set the map_extra flag, which
> Didn't you add bpf_map__set_map_extra() setter for that? Also one can
> always do direct bpf syscall (see sys_bpf in tools/lib/bpf/bpf.c), if
> absolutely necessary. But set_map_extra() setter is the way to go for
> benchmark, I think.
bpf_map__set_map_extra() sets the map_extra field for the bpf_map
struct, but that field can't get propagated through to the kernel
when the BPF_MAP_CREATE syscall is called in bpf_create_map_xattr.
This is because bpf_create_map_xattr takes in a "bpf_create_map_attr"
struct to instantiate the "bpf_attr" struct it passes to the kernel, but
map_extra is not part of the "bpf_create_map_attr" struct and can't be
added since the struct is frozen.

I don't think doing a direct bpf syscall in the userspace program,
and then passing the "int bloom_map_fd" to the bpf program
through the skeleton works either. This is because in the bpf program,
we can't call bpf_map_peek/push since these only take in a
"struct bpf_map *", and not an fd. We can't go from fd -> struct bpf_map *
either with something like

struct fd f = fdget(bloom_map_fd);
struct bpf_map *map = __bpf_map_get(f);

since both "__bpf_map_get" and "fdget" are not functions bpf programs
can call.


>> means we can't set the number of hash functions. (The entrypoint
>> for propagating the flags to the kernel at map creation time is
>> in the function "bpf_create_map_xattr", which takes in a
>> struct bpf_create_map_attr).
>>
>> 1) To get the benchmark numbers for different # of hash functions, I'll
>> test using a modified version of the code where the map_extra flags
>> gets propagated to the kernel. I'll add a TODO to the benchmarks
>> saying that the specified # of hash functions will get propagated for real
>> once libbpf's map creation supports map_extra.
>>
>>
>> 2) Should I  drop this libbpf patch altogether from this patchset, and add
>> it when we do the libbpf 1.0 task to revamp the map creation APIs? Since
>> without extending map creation to include the map_extra, these map_extra
>> libbpf changes don't have much effect right now
> No, getter/setter API is good to have, please keep them.
>
>>> [...]
>>>> --
>>>> 2.30.2
>>>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag
  2021-10-21 20:14         ` Joanne Koong
@ 2021-10-21 21:41           ` Andrii Nakryiko
  0 siblings, 0 replies; 34+ messages in thread
From: Andrii Nakryiko @ 2021-10-21 21:41 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf, Kernel Team

On Thu, Oct 21, 2021 at 1:14 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> On 10/20/21 2:21 PM, Andrii Nakryiko wrote:
>
> > On Wed, Oct 20, 2021 at 2:09 PM Joanne Koong <joannekoong@fb.com> wrote:
> >> On 10/8/21 4:19 PM, Andrii Nakryiko wrote:
> >>
> >>> On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@fb.com> wrote:
> >>>> This patch adds the libbpf infrastructure for supporting a
> >>>> per-map-type "map_extra" field, whose definition will be
> >>>> idiosyncratic depending on map type.
> >>>>
> >>>> For example, for the bitset map, the lower 4 bits of map_extra
> >>>> is used to denote the number of hash functions.
> >>>>
> >>>> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> >>>> ---
> >>>>    include/uapi/linux/bpf.h        |  1 +
> >>>>    tools/include/uapi/linux/bpf.h  |  1 +
> >>>>    tools/lib/bpf/bpf.c             |  1 +
> >>>>    tools/lib/bpf/bpf.h             |  1 +
> >>>>    tools/lib/bpf/bpf_helpers.h     |  1 +
> >>>>    tools/lib/bpf/libbpf.c          | 25 ++++++++++++++++++++++++-
> >>>>    tools/lib/bpf/libbpf.h          |  4 ++++
> >>>>    tools/lib/bpf/libbpf.map        |  2 ++
> >>>>    tools/lib/bpf/libbpf_internal.h |  4 +++-
> >>>>    9 files changed, 38 insertions(+), 2 deletions(-)
> >>>>
> >>>> [...]
> >>>>
> >>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> >>>> index 7d1741ceaa32..41e3e85e7789 100644
> >>>> --- a/tools/lib/bpf/bpf.c
> >>>> +++ b/tools/lib/bpf/bpf.c
> >>>> @@ -97,6 +97,7 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
> >>>>           attr.btf_key_type_id = create_attr->btf_key_type_id;
> >>>>           attr.btf_value_type_id = create_attr->btf_value_type_id;
> >>>>           attr.map_ifindex = create_attr->map_ifindex;
> >>>> +       attr.map_extra = create_attr->map_extra;
> >>>>           if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
> >>>>                   attr.btf_vmlinux_value_type_id =
> >>>>                           create_attr->btf_vmlinux_value_type_id;
> >>>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> >>>> index 6fffb3cdf39b..c4049f2d63cc 100644
> >>>> --- a/tools/lib/bpf/bpf.h
> >>>> +++ b/tools/lib/bpf/bpf.h
> >>>> @@ -50,6 +50,7 @@ struct bpf_create_map_attr {
> >>>>                   __u32 inner_map_fd;
> >>>>                   __u32 btf_vmlinux_value_type_id;
> >>>>           };
> >>>> +       __u32 map_extra;
> >>> this struct is frozen, we can't change it. It's fine to not allow
> >>> passing map_extra in libbpf APIs. We have libbpf 1.0 task to revamp
> >>> low-level APIs like map creation in a way that will allow good
> >>> extensibility. You don't have to worry about that in this patch set.
> >> I see! From my understanding, without "map_extra" added to the
> >> bpf_create_map_attr struct, it's not possible in the subsequent
> >> bloom filter benchmark tests to set the map_extra flag, which
> > Didn't you add bpf_map__set_map_extra() setter for that? Also one can
> > always do direct bpf syscall (see sys_bpf in tools/lib/bpf/bpf.c), if
> > absolutely necessary. But set_map_extra() setter is the way to go for
> > benchmark, I think.
> bpf_map__set_map_extra() sets the map_extra field on the bpf_map
> struct, but that field can't be propagated through to the kernel
> when the BPF_MAP_CREATE syscall is made in bpf_create_map_xattr.
> This is because bpf_create_map_xattr takes a "bpf_create_map_attr"
> struct to populate the "bpf_attr" struct it passes to the kernel, but
> map_extra is not part of the "bpf_create_map_attr" struct and can't be
> added since that struct is frozen.

Oh, that's where the problem is. Libbpf internally doesn't have to use
bpf_create_map_xattr(). We are going to revamp all these low-level
interfaces to be extensible, but until then, I think it will be fine
to just create an internal helper that lets us create maps without the
restrictions of maintaining API compatibility. See what we did with
libbpf__bpf_prog_load().
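
For illustration, a minimal sketch of what such an internal helper might
look like, assuming it sits next to bpf_create_map_xattr() in
tools/lib/bpf/bpf.c (so sys_bpf() and the needed headers are already in
scope). The libbpf__bpf_create_map() name is hypothetical, only a subset
of the attrs is copied over for brevity, and the __u32 width of map_extra
mirrors this patch rather than any final uapi:

static int libbpf__bpf_create_map(const struct bpf_create_map_attr *create_attr,
				  __u32 map_extra)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = create_attr->map_type;
	attr.key_size = create_attr->key_size;
	attr.value_size = create_attr->value_size;
	attr.max_entries = create_attr->max_entries;
	attr.map_flags = create_attr->map_flags;
	/* the new field from patch 1; not part of the frozen public struct */
	attr.map_extra = map_extra;

	/* sys_bpf() is the raw syscall wrapper in tools/lib/bpf/bpf.c */
	return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}

Since it is libbpf-internal (like libbpf__bpf_prog_load()), it can grow
extra parameters freely without breaking the frozen bpf_create_map_attr
struct.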

>
> I don't think doing a direct bpf syscall in the userspace program,
> and then passing the "int bloom_map_fd" to the bpf program
> through the skeleton works either. This is because in the bpf program,
> we can't call bpf_map_peek/push since these only take in a
> "struct bpf_map *", and not an fd. We can't go from fd -> struct bpf_map *
> either with something like
>
> struct fd f = fdget(bloom_map_fd);
> struct bpf_map *map = __bpf_map_get(f);
>
> since both "__bpf_map_get" and "fdget" are not functions bpf programs
> can call.

On the BPF side there is no "struct bpf_map", actually. bpf_map_peek()
takes just a "void *", which is a reference to the variable that
represents the map (and the BPF verifier does the right thing during
program load, passing the correct kernel address of the map). On the
user-space side, though, the user can use bpf_map__reuse_fd() to set
everything up, if they create the map with their own custom logic.
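
To make the reuse_fd route concrete, here is a rough user-space sketch.
The object file name, the map name, and the create_bitset_map_fd() helper
are illustrative assumptions (the helper stands in for whatever custom
logic issues the raw BPF_MAP_CREATE syscall with map_extra filled in),
not code from this series:

#include <errno.h>
#include <bpf/libbpf.h>

/* hypothetical: creates the bitset/bloom map via a raw syscall,
 * encoding nr_hash_funcs in the lower bits of map_extra
 */
int create_bitset_map_fd(unsigned int nr_hash_funcs);

static int setup_bloom_map(struct bpf_object **out_obj)
{
	struct bpf_object *obj;
	struct bpf_map *map;
	int map_fd, err;

	map_fd = create_bitset_map_fd(5 /* nr_hash_funcs */);
	if (map_fd < 0)
		return map_fd;

	obj = bpf_object__open_file("bloom_filter_bench.bpf.o", NULL);
	err = libbpf_get_error(obj);
	if (err)
		return err;

	map = bpf_object__find_map_by_name(obj, "bloom_map");
	if (!map)
		return -ENOENT;

	/* make the object's map refer to the fd we created ourselves */
	err = bpf_map__reuse_fd(map, map_fd);
	if (!err)
		err = bpf_object__load(obj);
	if (err) {
		bpf_object__close(obj);
		return err;
	}

	*out_obj = obj;
	return 0;
}

On the BPF side the program keeps passing &bloom_map (the map variable,
not an fd) to bpf_map_push_elem()/bpf_map_peek_elem(), and the verifier
resolves that into the kernel's map address at load time.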

But we are getting too much into the weeds. Let's just copy/paste
bpf_create_map_xattr() for now and add map_extra support there. And
pretty soon we'll have a nicer set of low-level APIs, at which point
we'll switch to using them internally as well.

>
>
> >> means we can't set the number of hash functions. (The entrypoint
> >> for propagating the flags to the kernel at map creation time is
> >> in the function "bpf_create_map_xattr", which takes in a
> >> struct bpf_create_map_attr).
> >>
> >> 1) To get the benchmark numbers for different # of hash functions, I'll
> >> test using a modified version of the code where the map_extra flag
> >> gets propagated to the kernel. I'll add a TODO to the benchmarks
> >> saying that the specified # of hash functions will get propagated for real
> >> once libbpf's map creation supports map_extra.
> >>
> >>
> >> 2) Should I drop this libbpf patch altogether from this patchset, and add
> >> it when we do the libbpf 1.0 task to revamp the map creation APIs? Without
> >> extending map creation to include map_extra, these map_extra libbpf
> >> changes don't have much effect right now.
> > No, getter/setter API is good to have, please keep them.
> >
> >>> [...]
> >>>> --
> >>>> 2.30.2
> >>>>


end of thread (newest: 2021-10-21 21:41 UTC)

Thread overview: 34+ messages
2021-10-06 22:20 [PATCH bpf-next v4 0/5] Implement bitset maps, with bloom filter Joanne Koong
2021-10-06 22:20 ` [PATCH bpf-next v4 1/5] bpf: Add bitset map with bloom filter capabilities Joanne Koong
2021-10-07 14:20   ` Toke Høiland-Jørgensen
2021-10-07 21:59     ` Joanne Koong
2021-10-08 21:57       ` Toke Høiland-Jørgensen
2021-10-08 23:11         ` Andrii Nakryiko
2021-10-09 13:10           ` Toke Høiland-Jørgensen
2021-10-12  3:17             ` Andrii Nakryiko
2021-10-12 12:48               ` Toke Høiland-Jørgensen
2021-10-12 22:46                 ` Joanne Koong
2021-10-12 23:25                   ` Zvi Effron
2021-10-13  1:17                     ` Joanne Koong
2021-10-13  4:48                       ` Alexei Starovoitov
2021-10-13  0:11                   ` Martin KaFai Lau
2021-10-13  4:41                     ` Alexei Starovoitov
2021-10-19 23:53                       ` Andrii Nakryiko
2021-10-08 23:05   ` Andrii Nakryiko
2021-10-08 23:24     ` Zvi Effron
2021-10-09  0:16       ` Martin KaFai Lau
2021-10-06 22:21 ` [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag Joanne Koong
2021-10-08 23:19   ` Andrii Nakryiko
2021-10-20 21:08     ` Joanne Koong
2021-10-20 21:21       ` Andrii Nakryiko
2021-10-21 20:14         ` Joanne Koong
2021-10-21 21:41           ` Andrii Nakryiko
2021-10-09  2:12   ` Andrii Nakryiko
2021-10-06 22:21 ` [PATCH bpf-next v4 3/5] selftests/bpf: Add bitset map test cases Joanne Koong
2021-10-06 22:21 ` [PATCH bpf-next v4 4/5] bpf/benchs: Add benchmark tests for bloom filter throughput + false positive Joanne Koong
2021-10-06 22:35   ` Joanne Koong
2021-10-09  2:54     ` Andrii Nakryiko
2021-10-15 23:35       ` Joanne Koong
2021-10-20  0:46         ` Joanne Koong
2021-10-09  2:39   ` Andrii Nakryiko
2021-10-06 22:21 ` [PATCH bpf-next v4 5/5] bpf/benchs: Add benchmarks for comparing hashmap lookups w/ vs. w/out bloom filter Joanne Koong
