All of lore.kernel.org
* [PATCH bpf-next 0/5] Implement bloom filter map
@ 2021-08-31 22:50 Joanne Koong
  2021-08-31 22:50 ` [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation Joanne Koong
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Joanne Koong @ 2021-08-31 22:50 UTC (permalink / raw)
  To: bpf; +Cc: Joanne Koong

This patchset adds a new kind of bpf map: the bloom filter map.
Bloom filters are a space-efficient probabilistic data structure
used to quickly test whether an element exists in a set.
In a bloom filter, false positives are possible whereas false
negatives are not.

This patchset includes benchmarks for a configurable number of hashes and
entries in the bloom filter. The benchmarks show that, on average, using 3
hash functions is close to optimal. Comparing hashmap lookups that first
consult a 3-hash bloom filter against plain hashmap lookups, the bloom
filter makes lookups roughly 15% faster for 50k entries, 25% faster for
100k entries, 180% faster for 500k entries, and 200% faster for 1 million
entries.

Joanne Koong (5):
  bpf: Add bloom filter map implementation
  libbpf: Allow the number of hashes in bloom filter maps to be
    configurable
  selftests/bpf: Add bloom filter map test cases
  bpf/benchs: Add benchmark test for bloom filter maps
  bpf/benchs: Add benchmarks for comparing hashmap lookups with vs.
    without bloom filter

 include/linux/bpf.h                           |   3 +-
 include/linux/bpf_types.h                     |   1 +
 include/uapi/linux/bpf.h                      |   3 +
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/bloom_filter.c                     | 171 ++++++++
 kernel/bpf/syscall.c                          |  20 +-
 kernel/bpf/verifier.c                         |  19 +-
 tools/include/uapi/linux/bpf.h                |   3 +
 tools/lib/bpf/bpf.c                           |   2 +
 tools/lib/bpf/bpf.h                           |   1 +
 tools/lib/bpf/libbpf.c                        |  32 +-
 tools/lib/bpf/libbpf.h                        |   3 +
 tools/lib/bpf/libbpf.map                      |   2 +
 tools/lib/bpf/libbpf_internal.h               |   4 +-
 tools/testing/selftests/bpf/Makefile          |   4 +-
 tools/testing/selftests/bpf/bench.c           |  57 ++-
 tools/testing/selftests/bpf/bench.h           |   3 +
 .../bpf/benchs/bench_bloom_filter_map.c       | 383 ++++++++++++++++++
 .../bpf/benchs/run_bench_bloom_filter_map.sh  |  43 ++
 .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
 .../selftests/bpf/benchs/run_common.sh        |  60 +++
 .../bpf/prog_tests/bloom_filter_map.c         | 123 ++++++
 .../selftests/bpf/progs/bloom_filter_map.c    | 123 ++++++
 23 files changed, 1048 insertions(+), 44 deletions(-)
 create mode 100644 kernel/bpf/bloom_filter.c
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
 create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_map.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-08-31 22:50 [PATCH bpf-next 0/5] Implement bloom filter map Joanne Koong
@ 2021-08-31 22:50 ` Joanne Koong
  2021-09-01  2:55   ` Alexei Starovoitov
                     ` (2 more replies)
  2021-08-31 22:50 ` [PATCH bpf-next 2/5] libbpf: Allow the number of hashes in bloom filter maps to be configurable Joanne Koong
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 23+ messages in thread
From: Joanne Koong @ 2021-08-31 22:50 UTC (permalink / raw)
  To: bpf; +Cc: Joanne Koong

Bloom filters are a space-efficient probabilistic data structure
used to quickly test whether an element exists in a set.
In a bloom filter, false positives are possible whereas false
negatives are not.

This patch adds a bloom filter map for bpf programs.
The bloom filter map supports peek (determining whether an element
is present in the map) and push (adding an element to the map)
operations. These operations are exposed to userspace applications
through the already existing syscalls in the following way:

BPF_MAP_LOOKUP_ELEM -> peek
BPF_MAP_UPDATE_ELEM -> push

The bloom filter map does not have keys, only values. In light of
this, the bloom filter map's API matches that of the queue/stack maps:
user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM,
which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
APIs to query or add an element to the bloom filter map. The bloom
filter map must be created with a key_size of 0.

For updates, the user passes in the element to add to the map
as the value, with a NULL key. For lookups, the user passes in the
element to query as the value. In the verifier layer, this
requires us to modify the argument type of a bloom filter's
BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
the syscall layer, we need to copy over the user value so that in
bpf_map_peek_elem, we know which specific value to query.

The maximum number of entries in the bloom filter is not enforced; if
the user inserts more entries into the bloom filter than the
max_entries it was created with, that is permitted, but the bloom
filter will have a higher false positive rate.

The number of hashes to use for the bloom filter is configurable from
userspace. The benchmarks later in this patchset help compare the
performance of different numbers of hashes at different entry counts.
In general, using more hashes makes a lookup slower but lowers the
false positive rate, since the bit array is sized proportionally to
the number of hashes.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 include/linux/bpf.h            |   3 +-
 include/linux/bpf_types.h      |   1 +
 include/uapi/linux/bpf.h       |   3 +
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/bloom_filter.c      | 171 +++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |  20 +++-
 kernel/bpf/verifier.c          |  19 +++-
 tools/include/uapi/linux/bpf.h |   3 +
 8 files changed, 214 insertions(+), 8 deletions(-)
 create mode 100644 kernel/bpf/bloom_filter.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f4c16f19f83e..2abaa1052096 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -181,7 +181,8 @@ struct bpf_map {
 	u32 btf_vmlinux_value_type_id;
 	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
-	/* 22 bytes hole */
+	u32 nr_hashes; /* used for bloom filter maps */
+	/* 18 bytes hole */
 
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 9c81724e4b98..c4424ac2fa02 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -125,6 +125,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
 
 BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 791f31dd0abe..c2acb0a510fe 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -906,6 +906,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_RINGBUF,
 	BPF_MAP_TYPE_INODE_STORAGE,
 	BPF_MAP_TYPE_TASK_STORAGE,
+	BPF_MAP_TYPE_BLOOM_FILTER,
 };
 
 /* Note that tracing related programs such as
@@ -1274,6 +1275,7 @@ union bpf_attr {
 						   * struct stored as the
 						   * map value
 						   */
+		__u32	nr_hashes;      /* used for configuring bloom filter maps */
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -5594,6 +5596,7 @@ struct bpf_map_info {
 	__u32 btf_id;
 	__u32 btf_key_type_id;
 	__u32 btf_value_type_id;
+	__u32 nr_hashes; /* used for bloom filter maps */
 } __attribute__((aligned(8)));
 
 struct bpf_btf_info {
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 7f33098ca63f..cf6ca339f3cd 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -7,7 +7,7 @@ endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
-obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
+obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
new file mode 100644
index 000000000000..3ae799ab3747
--- /dev/null
+++ b/kernel/bpf/bloom_filter.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <linux/bitmap.h>
+#include <linux/bpf.h>
+#include <linux/err.h>
+#include <linux/jhash.h>
+#include <linux/random.h>
+#include <linux/spinlock.h>
+
+#define BLOOM_FILTER_CREATE_FLAG_MASK \
+	(BPF_F_NUMA_NODE | BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK)
+
+struct bpf_bloom_filter {
+	struct bpf_map map;
+	u32 bit_array_mask;
+	u32 hash_seed;
+	/* Used for synchronizing parallel writes to the bit array */
+	spinlock_t spinlock;
+	unsigned long bit_array[];
+};
+
+static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
+{
+	struct bpf_bloom_filter *bloom_filter =
+		container_of(map, struct bpf_bloom_filter, map);
+	u32 i, hash;
+
+	for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
+		hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
+			bloom_filter->bit_array_mask;
+		if (!test_bit(hash, bloom_filter->bit_array))
+			return -ENOENT;
+	}
+
+	return 0;
+}
+
+static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
+{
+	int numa_node = bpf_map_attr_numa_node(attr);
+	u32 nr_bits, bit_array_bytes, bit_array_mask;
+	struct bpf_bloom_filter *bloom_filter;
+
+	if (!bpf_capable())
+		return ERR_PTR(-EPERM);
+
+	if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
+	    attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags))
+		return ERR_PTR(-EINVAL);
+
+	/* For the bloom filter, the optimal bit array size that minimizes the
+	 * false positive probability is n * k / ln(2) where n is the number of
+	 * expected entries in the bloom filter and k is the number of hash
+	 * functions. We use 7 / 5 to approximate 1 / ln(2).
+	 *
+	 * We round this up to the nearest power of two to enable more efficient
+	 * hashing using bitmasks. The bitmask will be the bit array size - 1.
+	 *
+	 * If this overflows a u32, the bit array size will have 2^32 (4
+	 * GB) bits.
+	 */
+	if (unlikely(check_mul_overflow(attr->max_entries, attr->nr_hashes, &nr_bits)) ||
+	    unlikely(check_mul_overflow(nr_bits / 5, (u32)7, &nr_bits)) ||
+	    unlikely(nr_bits > (1UL << 31))) {
+		/* The bit array size is 2^32 bits but to avoid overflowing the
+		 * u32, we use BITS_TO_BYTES(U32_MAX), which will round up to the
+		 * equivalent number of bytes
+		 */
+		bit_array_bytes = BITS_TO_BYTES(U32_MAX);
+		bit_array_mask = U32_MAX;
+	} else {
+		if (nr_bits <= BITS_PER_LONG)
+			nr_bits = BITS_PER_LONG;
+		else
+			nr_bits = roundup_pow_of_two(nr_bits);
+		bit_array_bytes = BITS_TO_BYTES(nr_bits);
+		bit_array_mask = nr_bits - 1;
+	}
+
+	bit_array_bytes = roundup(bit_array_bytes, sizeof(unsigned long));
+	bloom_filter = bpf_map_area_alloc(sizeof(*bloom_filter) + bit_array_bytes,
+					  numa_node);
+
+	if (!bloom_filter)
+		return ERR_PTR(-ENOMEM);
+
+	bpf_map_init_from_attr(&bloom_filter->map, attr);
+	bloom_filter->map.nr_hashes = attr->nr_hashes;
+
+	bloom_filter->bit_array_mask = bit_array_mask;
+	spin_lock_init(&bloom_filter->spinlock);
+
+	if (!(attr->map_flags & BPF_F_ZERO_SEED))
+		bloom_filter->hash_seed = get_random_int();
+
+	return &bloom_filter->map;
+}
+
+static void bloom_filter_map_free(struct bpf_map *map)
+{
+	struct bpf_bloom_filter *bloom_filter =
+		container_of(map, struct bpf_bloom_filter, map);
+
+	bpf_map_area_free(bloom_filter);
+}
+
+static int bloom_filter_map_push_elem(struct bpf_map *map, void *value,
+				      u64 flags)
+{
+	struct bpf_bloom_filter *bloom_filter =
+		container_of(map, struct bpf_bloom_filter, map);
+	unsigned long spinlock_flags;
+	u32 i, hash;
+
+	if (flags != BPF_ANY)
+		return -EINVAL;
+
+	spin_lock_irqsave(&bloom_filter->spinlock, spinlock_flags);
+
+	for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
+		hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
+			bloom_filter->bit_array_mask;
+		bitmap_set(bloom_filter->bit_array, hash, 1);
+	}
+
+	spin_unlock_irqrestore(&bloom_filter->spinlock, spinlock_flags);
+
+	return 0;
+}
+
+static void *bloom_filter_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	/* The eBPF program should use map_peek_elem instead */
+	return ERR_PTR(-EINVAL);
+}
+
+static int bloom_filter_map_update_elem(struct bpf_map *map, void *key,
+					void *value, u64 flags)
+{
+	/* The eBPF program should use map_push_elem instead */
+	return -EINVAL;
+}
+
+static int bloom_filter_map_delete_elem(struct bpf_map *map, void *key)
+{
+	return -EOPNOTSUPP;
+}
+
+static int bloom_filter_map_get_next_key(struct bpf_map *map, void *key,
+					 void *next_key)
+{
+	return -EOPNOTSUPP;
+}
+
+static int bloom_filter_map_btf_id;
+const struct bpf_map_ops bloom_filter_map_ops = {
+	.map_meta_equal = bpf_map_meta_equal,
+	.map_alloc = bloom_filter_map_alloc,
+	.map_free = bloom_filter_map_free,
+	.map_push_elem = bloom_filter_map_push_elem,
+	.map_peek_elem = bloom_filter_map_peek_elem,
+	.map_lookup_elem = bloom_filter_map_lookup_elem,
+	.map_update_elem = bloom_filter_map_update_elem,
+	.map_delete_elem = bloom_filter_map_delete_elem,
+	.map_get_next_key = bloom_filter_map_get_next_key,
+	.map_check_btf = map_check_no_btf,
+	.map_btf_name = "bpf_bloom_filter",
+	.map_btf_id = &bloom_filter_map_btf_id,
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e50c0bfdb7d..b80bdda26fbf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -199,7 +199,8 @@ static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
 		err = bpf_fd_reuseport_array_update_elem(map, key, value,
 							 flags);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
+		   map->map_type == BPF_MAP_TYPE_STACK ||
+		   map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
 		err = map->ops->map_push_elem(map, value, flags);
 	} else {
 		rcu_read_lock();
@@ -238,7 +239,8 @@ static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
 	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
 		err = bpf_fd_reuseport_array_lookup_elem(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
+		   map->map_type == BPF_MAP_TYPE_STACK ||
+		   map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
 		err = map->ops->map_peek_elem(map, value);
 	} else if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS) {
 		/* struct_ops map requires directly updating "value" */
@@ -810,7 +812,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }
 
-#define BPF_MAP_CREATE_LAST_FIELD btf_vmlinux_value_type_id
+#define BPF_MAP_CREATE_LAST_FIELD nr_hashes
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
@@ -831,6 +833,9 @@ static int map_create(union bpf_attr *attr)
 		return -EINVAL;
 	}
 
+	if (attr->nr_hashes != 0 && attr->map_type != BPF_MAP_TYPE_BLOOM_FILTER)
+		return -EINVAL;
+
 	f_flags = bpf_get_file_flag(attr->map_flags);
 	if (f_flags < 0)
 		return f_flags;
@@ -1080,6 +1085,14 @@ static int map_lookup_elem(union bpf_attr *attr)
 	if (!value)
 		goto free_key;
 
+	if (map->map_type == BPF_MAP_TYPE_BLOOM_FILTER) {
+		if (copy_from_user(value, uvalue, value_size))
+			err = -EFAULT;
+		else
+			err = bpf_map_copy_value(map, key, value, attr->flags);
+		goto free_value;
+	}
+
 	err = bpf_map_copy_value(map, key, value, attr->flags);
 	if (err)
 		goto free_value;
@@ -3872,6 +3885,7 @@ static int bpf_map_get_info_by_fd(struct file *file,
 	info.max_entries = map->max_entries;
 	info.map_flags = map->map_flags;
 	memcpy(info.name, map->name, sizeof(map->name));
+	info.nr_hashes = map->nr_hashes;
 
 	if (map->btf) {
 		info.btf_id = btf_obj_id(map->btf);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 047ac4b4703b..5cbcff4c2222 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4813,7 +4813,10 @@ static int resolve_map_arg_type(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 		break;
-
+	case BPF_MAP_TYPE_BLOOM_FILTER:
+		if (meta->func_id == BPF_FUNC_map_peek_elem)
+			*arg_type = ARG_PTR_TO_MAP_VALUE;
+		break;
 	default:
 		break;
 	}
@@ -5388,6 +5391,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    func_id != BPF_FUNC_task_storage_delete)
 			goto error;
 		break;
+	case BPF_MAP_TYPE_BLOOM_FILTER:
+		if (func_id != BPF_FUNC_map_push_elem &&
+		    func_id != BPF_FUNC_map_peek_elem)
+			goto error;
+		break;
 	default:
 		break;
 	}
@@ -5455,13 +5463,18 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    map->map_type != BPF_MAP_TYPE_SOCKHASH)
 			goto error;
 		break;
-	case BPF_FUNC_map_peek_elem:
 	case BPF_FUNC_map_pop_elem:
-	case BPF_FUNC_map_push_elem:
 		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
 		    map->map_type != BPF_MAP_TYPE_STACK)
 			goto error;
 		break;
+	case BPF_FUNC_map_push_elem:
+	case BPF_FUNC_map_peek_elem:
+		if (map->map_type != BPF_MAP_TYPE_QUEUE &&
+		    map->map_type != BPF_MAP_TYPE_STACK &&
+		    map->map_type != BPF_MAP_TYPE_BLOOM_FILTER)
+			goto error;
+		break;
 	case BPF_FUNC_sk_storage_get:
 	case BPF_FUNC_sk_storage_delete:
 		if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 791f31dd0abe..26b814a7d61a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -906,6 +906,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_RINGBUF,
 	BPF_MAP_TYPE_INODE_STORAGE,
 	BPF_MAP_TYPE_TASK_STORAGE,
+	BPF_MAP_TYPE_BLOOM_FILTER,
 };
 
 /* Note that tracing related programs such as
@@ -1274,6 +1275,7 @@ union bpf_attr {
 						   * struct stored as the
 						   * map value
 						   */
+		__u32	nr_hashes;	/* used for configuring bloom filter maps */
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -5594,6 +5596,7 @@ struct bpf_map_info {
 	__u32 btf_id;
 	__u32 btf_key_type_id;
 	__u32 btf_value_type_id;
+	__u32 nr_hashes; /* used for bloom filter maps */
 } __attribute__((aligned(8)));
 
 struct bpf_btf_info {
-- 
2.30.2



* [PATCH bpf-next 2/5] libbpf: Allow the number of hashes in bloom filter maps to be configurable
  2021-08-31 22:50 [PATCH bpf-next 0/5] Implement bloom filter map Joanne Koong
  2021-08-31 22:50 ` [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation Joanne Koong
@ 2021-08-31 22:50 ` Joanne Koong
  2021-09-02  3:30   ` Andrii Nakryiko
  2021-08-31 22:50 ` [PATCH bpf-next 3/5] selftests/bpf: Add bloom filter map test cases Joanne Koong
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2021-08-31 22:50 UTC (permalink / raw)
  To: bpf; +Cc: Joanne Koong

This patch adds the libbpf infrastructure that will allow the user to
specify the number of hash functions to use for the bloom filter map.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 tools/lib/bpf/bpf.c             |  2 ++
 tools/lib/bpf/bpf.h             |  1 +
 tools/lib/bpf/libbpf.c          | 32 +++++++++++++++++++++++++++++++-
 tools/lib/bpf/libbpf.h          |  3 +++
 tools/lib/bpf/libbpf.map        |  2 ++
 tools/lib/bpf/libbpf_internal.h |  4 +++-
 6 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 2401fad090c5..cc928c5b92a4 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -100,6 +100,8 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
 	if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
 		attr.btf_vmlinux_value_type_id =
 			create_attr->btf_vmlinux_value_type_id;
+	else if (attr.map_type == BPF_MAP_TYPE_BLOOM_FILTER)
+		attr.nr_hashes = create_attr->nr_hashes;
 	else
 		attr.inner_map_fd = create_attr->inner_map_fd;
 
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 6fffb3cdf39b..ea29d6647e20 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -49,6 +49,7 @@ struct bpf_create_map_attr {
 	union {
 		__u32 inner_map_fd;
 		__u32 btf_vmlinux_value_type_id;
+		__u32 nr_hashes; /* used for bloom filter maps */
 	};
 };
 
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 88d8825fc6f6..ac03404aeb57 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -377,6 +377,7 @@ struct bpf_map {
 	char *pin_path;
 	bool pinned;
 	bool reused;
+	__u32 nr_hashes; /* used for bloom filter maps */
 };
 
 enum extern_type {
@@ -1290,6 +1291,11 @@ static bool bpf_map_type__is_map_in_map(enum bpf_map_type type)
 	return false;
 }
 
+static bool bpf_map_type__is_bloom_filter(enum bpf_map_type type)
+{
+	return type == BPF_MAP_TYPE_BLOOM_FILTER;
+}
+
 int bpf_object__section_size(const struct bpf_object *obj, const char *name,
 			     __u32 *size)
 {
@@ -2080,6 +2086,10 @@ int parse_btf_map_def(const char *map_name, struct btf *btf,
 			if (!get_map_field_int(map_name, btf, m, &map_def->map_flags))
 				return -EINVAL;
 			map_def->parts |= MAP_DEF_MAP_FLAGS;
+		} else if (strcmp(name, "nr_hashes") == 0) {
+			if (!get_map_field_int(map_name, btf, m, &map_def->nr_hashes))
+				return -EINVAL;
+			map_def->parts |= MAP_DEF_NR_HASHES;
 		} else if (strcmp(name, "numa_node") == 0) {
 			if (!get_map_field_int(map_name, btf, m, &map_def->numa_node))
 				return -EINVAL;
@@ -2264,6 +2274,7 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
 	map->numa_node = def->numa_node;
 	map->btf_key_type_id = def->key_type_id;
 	map->btf_value_type_id = def->value_type_id;
+	map->nr_hashes = def->nr_hashes;
 
 	if (def->parts & MAP_DEF_MAP_TYPE)
 		pr_debug("map '%s': found type = %u.\n", map->name, def->map_type);
@@ -2288,6 +2299,8 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
 		pr_debug("map '%s': found pinning = %u.\n", map->name, def->pinning);
 	if (def->parts & MAP_DEF_NUMA_NODE)
 		pr_debug("map '%s': found numa_node = %u.\n", map->name, def->numa_node);
+	if (def->parts & MAP_DEF_NR_HASHES)
+		pr_debug("map '%s': found nr_hashes = %u.\n", map->name, def->nr_hashes);
 
 	if (def->parts & MAP_DEF_INNER_MAP)
 		pr_debug("map '%s': found inner map definition.\n", map->name);
@@ -3979,6 +3992,7 @@ int bpf_map__reuse_fd(struct bpf_map *map, int fd)
 	map->btf_key_type_id = info.btf_key_type_id;
 	map->btf_value_type_id = info.btf_value_type_id;
 	map->reused = true;
+	map->nr_hashes = info.nr_hashes;
 
 	return 0;
 
@@ -4473,7 +4487,8 @@ static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd)
 		map_info.key_size == map->def.key_size &&
 		map_info.value_size == map->def.value_size &&
 		map_info.max_entries == map->def.max_entries &&
-		map_info.map_flags == map->def.map_flags);
+		map_info.map_flags == map->def.map_flags &&
+		map_info.nr_hashes == map->nr_hashes);
 }
 
 static int
@@ -4611,6 +4626,8 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b
 		}
 		if (map->inner_map_fd >= 0)
 			create_attr.inner_map_fd = map->inner_map_fd;
+	} else if (bpf_map_type__is_bloom_filter(def->type)) {
+		create_attr.nr_hashes = map->nr_hashes;
 	}
 
 	if (obj->gen_loader) {
@@ -8560,6 +8577,19 @@ int bpf_map__set_numa_node(struct bpf_map *map, __u32 numa_node)
 	return 0;
 }
 
+__u32 bpf_map__nr_hashes(const struct bpf_map *map)
+{
+	return map->nr_hashes;
+}
+
+int bpf_map__set_nr_hashes(struct bpf_map *map, __u32 nr_hashes)
+{
+	if (map->fd >= 0)
+		return libbpf_err(-EBUSY);
+	map->nr_hashes = nr_hashes;
+	return 0;
+}
+
 __u32 bpf_map__key_size(const struct bpf_map *map)
 {
 	return map->def.key_size;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index f177d897c5f7..497b84772be8 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -538,6 +538,9 @@ LIBBPF_API __u32 bpf_map__btf_value_type_id(const struct bpf_map *map);
 /* get/set map if_index */
 LIBBPF_API __u32 bpf_map__ifindex(const struct bpf_map *map);
 LIBBPF_API int bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
+/* get/set nr_hashes. used for bloom filter maps */
+LIBBPF_API __u32 bpf_map__nr_hashes(const struct bpf_map *map);
+LIBBPF_API int bpf_map__set_nr_hashes(struct bpf_map *map, __u32 nr_hashes);
 
 typedef void (*bpf_map_clear_priv_t)(struct bpf_map *, void *);
 LIBBPF_API int bpf_map__set_priv(struct bpf_map *map, void *priv,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index bbc53bb25f68..372c2478274f 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -385,4 +385,6 @@ LIBBPF_0.5.0 {
 		btf__load_vmlinux_btf;
 		btf_dump__dump_type_data;
 		libbpf_set_strict_mode;
+		bpf_map__nr_hashes;
+		bpf_map__set_nr_hashes;
 } LIBBPF_0.4.0;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 533b0211f40a..501ae042980d 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -171,8 +171,9 @@ enum map_def_parts {
 	MAP_DEF_NUMA_NODE	= 0x080,
 	MAP_DEF_PINNING		= 0x100,
 	MAP_DEF_INNER_MAP	= 0x200,
+	MAP_DEF_NR_HASHES	= 0x400,
 
-	MAP_DEF_ALL		= 0x3ff, /* combination of all above */
+	MAP_DEF_ALL		= 0x7ff, /* combination of all above */
 };
 
 struct btf_map_def {
@@ -186,6 +187,7 @@ struct btf_map_def {
 	__u32 map_flags;
 	__u32 numa_node;
 	__u32 pinning;
+	__u32 nr_hashes; /* used for bloom filter maps */
 };
 
 int parse_btf_map_def(const char *map_name, struct btf *btf,
-- 
2.30.2



* [PATCH bpf-next 3/5] selftests/bpf: Add bloom filter map test cases
  2021-08-31 22:50 [PATCH bpf-next 0/5] Implement bloom filter map Joanne Koong
  2021-08-31 22:50 ` [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation Joanne Koong
  2021-08-31 22:50 ` [PATCH bpf-next 2/5] libbpf: Allow the number of hashes in bloom filter maps to be configurable Joanne Koong
@ 2021-08-31 22:50 ` Joanne Koong
  2021-09-01  2:55   ` Alexei Starovoitov
  2021-08-31 22:50 ` [PATCH bpf-next 4/5] bpf/benchs: Add benchmark test for bloom filter maps Joanne Koong
  2021-08-31 22:50 ` [PATCH bpf-next 5/5] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter Joanne Koong
  4 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2021-08-31 22:50 UTC (permalink / raw)
  To: bpf; +Cc: Joanne Koong

This patch adds test cases for bpf bloom filter maps. They
include tests for invalid operations from userspace, checks
against false negatives, and a bpf program that queries the
bloom filter map for values added by userspace.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 .../bpf/prog_tests/bloom_filter_map.c         | 123 ++++++++++++++++++
 .../selftests/bpf/progs/bloom_filter_map.c    |  49 +++++++
 2 files changed, 172 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bloom_filter_map.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c b/tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
new file mode 100644
index 000000000000..6b41a8cfec6c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bloom_filter_map.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <sys/syscall.h>
+#include <test_progs.h>
+#include "bloom_filter_map.skel.h"
+
+static void test_bloom_filter_map_fail(void)
+{
+	struct bpf_create_map_attr xattr = {
+		.name = "bloom_filter_map",
+		.map_type = BPF_MAP_TYPE_BLOOM_FILTER,
+		.max_entries = 100,
+		.value_size = sizeof(__u32),
+		.nr_hashes = 3,
+	};
+	__u32 value;
+	int fd, err;
+
+	/* Invalid key size */
+	xattr.key_size = 4;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid key size"))
+		close(fd);
+	xattr.key_size = 0;
+
+	/* Invalid value size */
+	xattr.value_size = 0;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid value size"))
+		close(fd);
+	xattr.value_size = sizeof(__u32);
+
+	/* Invalid max entries size */
+	xattr.max_entries = 0;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid max entries size"))
+		close(fd);
+	xattr.max_entries = 100;
+
+	/* Invalid number of hashes */
+	xattr.nr_hashes = 0;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid number of hashes"))
+		close(fd);
+	xattr.nr_hashes = 3;
+
+	/* Bloom filter maps do not support BPF_F_NO_PREALLOC */
+	xattr.map_flags = BPF_F_NO_PREALLOC;
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_LT(fd, 0, "bpf_create_map bloom filter invalid flags"))
+		close(fd);
+	xattr.map_flags = 0;
+
+	fd = bpf_create_map_xattr(&xattr);
+	if (!ASSERT_GE(fd, 0, "bpf_create_map bloom filter"))
+		return;
+
+	/* Test invalid flags */
+	err = bpf_map_update_elem(fd, NULL, &value, -1);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bloom filter invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, BPF_EXIST);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bloom filter invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, BPF_F_LOCK);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bloom filter invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, BPF_NOEXIST);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bloom filter invalid flags");
+
+	err = bpf_map_update_elem(fd, NULL, &value, 10000);
+	ASSERT_EQ(err, -EINVAL, "bpf_map_update_elem bloom filter invalid flags");
+
+	close(fd);
+}
+
+static void bloom_filter_map(struct bloom_filter_map *skel)
+{
+	const int map_size = bpf_map__max_entries(skel->maps.map_random_data);
+	int err, map_random_data_fd, map_bloom_filter_fd, i;
+	__u64 val;
+
+	map_random_data_fd = bpf_map__fd(skel->maps.map_random_data);
+	map_bloom_filter_fd = bpf_map__fd(skel->maps.map_bloom_filter);
+
+	/* Generate random values and add them to the maps */
+	for (i = 0; i < map_size; i++) {
+		val = rand();
+		err = bpf_map_update_elem(map_random_data_fd, &i, &val, BPF_ANY);
+		if (!ASSERT_OK(err, "Add random value to map_random_data"))
+			continue;
+
+		err = bpf_map_update_elem(map_bloom_filter_fd, NULL, &val, 0);
+		if (!ASSERT_OK(err, "Add random value to map_bloom_filter"))
+			return;
+	}
+
+	skel->links.prog_bloom_filter =
+		bpf_program__attach_trace(skel->progs.prog_bloom_filter);
+	if (!ASSERT_OK_PTR(skel->links.prog_bloom_filter, "link"))
+		return;
+
+	syscall(SYS_getpgid);
+
+	ASSERT_EQ(skel->bss->error, 0, "error");
+}
+
+void test_bloom_filter_map(void)
+{
+	struct bloom_filter_map *skel;
+
+	test_bloom_filter_map_fail();
+
+	skel = bloom_filter_map__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "bloom_filter_map__open_and_load"))
+		goto cleanup;
+
+	bloom_filter_map(skel);
+
+cleanup:
+	bloom_filter_map__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/bloom_filter_map.c b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
new file mode 100644
index 000000000000..2d9c43a30246
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct bpf_map;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1000);
+	__type(key, __u32);
+	__type(value, __u64);
+} map_random_data SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_BLOOM_FILTER);
+	__uint(key_size, 0);
+	__uint(value_size, sizeof(__u64));
+	__uint(max_entries, 1000);
+	__uint(nr_hashes, 2);
+} map_bloom_filter SEC(".maps");
+
+int error = 0;
+
+static __u64
+check_elem(struct bpf_map *map, __u32 *key, __u64 *val,
+	   void *data)
+{
+	int err;
+
+	err = bpf_map_peek_elem(&map_bloom_filter, val);
+	if (err) {
+		error |= 1;
+		return 1; /* stop the iteration */
+	}
+
+	return 0;
+}
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bloom_filter(void *ctx)
+{
+	bpf_for_each_map_elem(&map_random_data, check_elem, NULL, 0);
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 4/5] bpf/benchs: Add benchmark test for bloom filter maps
  2021-08-31 22:50 [PATCH bpf-next 0/5] Implement bloom filter map Joanne Koong
                   ` (2 preceding siblings ...)
  2021-08-31 22:50 ` [PATCH bpf-next 3/5] selftests/bpf: Add bloom filter map test cases Joanne Koong
@ 2021-08-31 22:50 ` Joanne Koong
  2021-09-02  3:35   ` Andrii Nakryiko
  2021-08-31 22:50 ` [PATCH bpf-next 5/5] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter Joanne Koong
  4 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2021-08-31 22:50 UTC (permalink / raw)
  To: bpf; +Cc: Joanne Koong

This patch adds benchmark tests for the throughput and false positive
rate of bloom filter map lookups for a given number of entries and a
given number of hash functions.

These benchmarks show that as the number of hash functions increases,
both the throughput and the false-positive rate of the bloom filter map
decrease. From the benchmark data, the average false-positive rates
are roughly as follows:

1 hash function = ~30%
2 hash functions = ~15%
3 hash functions = ~5%
4 hash functions = ~2.5%
5 hash functions = ~1%
6 hash functions = ~0.5%
7 hash functions = ~0.35%
8 hash functions = ~0.15%
9 hash functions = ~0.1%
10 hash functions = ~0%

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 tools/testing/selftests/bpf/Makefile          |   4 +-
 tools/testing/selftests/bpf/bench.c           |  35 ++
 tools/testing/selftests/bpf/bench.h           |   3 +
 .../bpf/benchs/bench_bloom_filter_map.c       | 344 ++++++++++++++++++
 .../bpf/benchs/run_bench_bloom_filter_map.sh  |  28 ++
 .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
 .../selftests/bpf/benchs/run_common.sh        |  48 +++
 .../selftests/bpf/progs/bloom_filter_map.c    |  74 ++++
 8 files changed, 537 insertions(+), 29 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
 create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 866531c08e4f..3576fdff117c 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -519,13 +519,15 @@ $(OUTPUT)/bench_rename.o: $(OUTPUT)/test_overhead.skel.h
 $(OUTPUT)/bench_trigger.o: $(OUTPUT)/trigger_bench.skel.h
 $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
 			    $(OUTPUT)/perfbuf_bench.skel.h
+$(OUTPUT)/bench_bloom_filter_map.o: $(OUTPUT)/bloom_filter_map.skel.h
 $(OUTPUT)/bench.o: bench.h testing_helpers.h
 $(OUTPUT)/bench: LDLIBS += -lm
 $(OUTPUT)/bench: $(OUTPUT)/bench.o $(OUTPUT)/testing_helpers.o \
 		 $(OUTPUT)/bench_count.o \
 		 $(OUTPUT)/bench_rename.o \
 		 $(OUTPUT)/bench_trigger.o \
-		 $(OUTPUT)/bench_ringbufs.o
+		 $(OUTPUT)/bench_ringbufs.o \
+		 $(OUTPUT)/bench_bloom_filter_map.o
 	$(call msg,BINARY,,$@)
 	$(Q)$(CC) $(LDFLAGS) -o $@ $(filter %.a %.o,$^) $(LDLIBS)
 
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 6ea15b93a2f8..0bcbdb4405a3 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -51,6 +51,35 @@ void setup_libbpf()
 		fprintf(stderr, "failed to increase RLIMIT_MEMLOCK: %d", err);
 }
 
+void false_hits_report_progress(int iter, struct bench_res *res, long delta_ns)
+{
+	long total = res->false_hits + res->hits + res->drops;
+
+	printf("Iter %3d (%7.3lfus): ",
+	       iter, (delta_ns - 1000000000) / 1000.0);
+
+	printf("%ld false hits of %ld total operations. Percentage = %2.2f %%\n",
+	       res->false_hits, total, ((float)res->false_hits / total) * 100);
+}
+
+void false_hits_report_final(struct bench_res res[], int res_cnt)
+{
+	long total_hits = 0, total_drops = 0, total_false_hits = 0, total_ops = 0;
+	int i;
+
+	for (i = 0; i < res_cnt; i++) {
+		total_hits += res[i].hits;
+		total_false_hits += res[i].false_hits;
+		total_drops += res[i].drops;
+	}
+	total_ops = total_hits + total_false_hits + total_drops;
+
+	printf("Summary: %ld false hits of %ld total operations. ",
+	       total_false_hits, total_ops);
+	printf("Percentage = %2.2f %%\n",
+	       ((float)total_false_hits / total_ops) * 100);
+}
+
 void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns)
 {
 	double hits_per_sec, drops_per_sec;
@@ -132,9 +161,11 @@ static const struct argp_option opts[] = {
 };
 
 extern struct argp bench_ringbufs_argp;
+extern struct argp bench_bloom_filter_map_argp;
 
 static const struct argp_child bench_parsers[] = {
 	{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
+	{ &bench_bloom_filter_map_argp, 0, "Bloom filter map benchmark", 0 },
 	{},
 };
 
@@ -323,6 +354,8 @@ extern const struct bench bench_rb_libbpf;
 extern const struct bench bench_rb_custom;
 extern const struct bench bench_pb_libbpf;
 extern const struct bench bench_pb_custom;
+extern const struct bench bench_bloom_filter_map;
+extern const struct bench bench_bloom_filter_false_positive;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
@@ -344,6 +377,8 @@ static const struct bench *benchs[] = {
 	&bench_rb_custom,
 	&bench_pb_libbpf,
 	&bench_pb_custom,
+	&bench_bloom_filter_map,
+	&bench_bloom_filter_false_positive,
 };
 
 static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/bench.h b/tools/testing/selftests/bpf/bench.h
index c1f48a473b02..624c6b11501f 100644
--- a/tools/testing/selftests/bpf/bench.h
+++ b/tools/testing/selftests/bpf/bench.h
@@ -33,6 +33,7 @@ struct env {
 struct bench_res {
 	long hits;
 	long drops;
+	long false_hits;
 };
 
 struct bench {
@@ -56,6 +57,8 @@ extern const struct bench *bench;
 void setup_libbpf();
 void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns);
 void hits_drops_report_final(struct bench_res res[], int res_cnt);
+void false_hits_report_progress(int iter, struct bench_res *res, long delta_ns);
+void false_hits_report_final(struct bench_res res[], int res_cnt);
 
 static inline __u64 get_time_ns() {
 	struct timespec t;
diff --git a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
new file mode 100644
index 000000000000..dadca2a6e900
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
@@ -0,0 +1,344 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include <argp.h>
+#include <pthread.h>
+#include "bench.h"
+#include "bloom_filter_map.skel.h"
+#include "bpf_util.h"
+
+static struct ctx {
+	struct bloom_filter_map *skel;
+	pthread_mutex_t map_done_mtx;
+	pthread_cond_t map_done;
+	bool map_prepare_err;
+	__u32 next_map_idx;
+} ctx = {
+	.map_done_mtx = PTHREAD_MUTEX_INITIALIZER,
+	.map_done = PTHREAD_COND_INITIALIZER,
+};
+
+static struct {
+	__u64 nr_entries;
+	__u64 nr_hashes;
+} args = {
+	.nr_entries = 1000,
+	.nr_hashes = 1,
+};
+
+enum {
+	ARG_NR_ENTRIES = 3000,
+	ARG_NR_HASHES = 3001,
+};
+
+static const struct argp_option opts[] = {
+	{ "nr_entries", ARG_NR_ENTRIES, "NR_ENTRIES", 0,
+		"Set number of entries in the bloom filter map"},
+	{ "nr_hashes", ARG_NR_HASHES, "NR_HASHES", 0,
+		"Set number of hashes in the bloom filter map"},
+	{},
+};
+
+static error_t parse_arg(int key, char *arg, struct argp_state *state)
+{
+	switch (key) {
+	case ARG_NR_ENTRIES:
+		args.nr_entries = strtol(arg, NULL, 10);
+		if (args.nr_entries == 0) {
+			fprintf(stderr, "Invalid nr_entries count.\n");
+			argp_usage(state);
+		}
+		break;
+	case ARG_NR_HASHES:
+		args.nr_hashes = strtol(arg, NULL, 10);
+		if (args.nr_hashes == 0) {
+			fprintf(stderr, "Invalid nr_hashes count.\n");
+			argp_usage(state);
+		}
+		break;
+	default:
+		return ARGP_ERR_UNKNOWN;
+	}
+
+	return 0;
+}
+
+/* exported into benchmark runner */
+const struct argp bench_bloom_filter_map_argp = {
+	.options = opts,
+	.parser = parse_arg,
+};
+
+static void validate(void)
+{
+	if (env.consumer_cnt != 1) {
+		fprintf(stderr, "bloom filter map benchmark doesn't support multi-consumer!\n");
+		exit(1);
+	}
+}
+
+static inline void trigger_bpf_program(void)
+{
+	syscall(__NR_getpgid);
+}
+
+static void *producer(void *input)
+{
+	while (true)
+		trigger_bpf_program();
+
+	return NULL;
+}
+
+static void *map_prepare_thread(void *arg)
+{
+	int err, random_data_fd, bloom_filter_fd, hashmap_fd;
+	__u64 i, val;
+
+	bloom_filter_fd = bpf_map__fd(ctx.skel->maps.map_bloom_filter);
+	random_data_fd = bpf_map__fd(ctx.skel->maps.map_random_data);
+	hashmap_fd = bpf_map__fd(ctx.skel->maps.hashmap);
+
+	while (true) {
+		i = __atomic_add_fetch(&ctx.next_map_idx, 1, __ATOMIC_RELAXED);
+		if (i > args.nr_entries)
+			break;
+again:
+		err = syscall(__NR_getrandom, &val, sizeof(val), 0);
+		if (err != sizeof(val)) {
+			ctx.map_prepare_err = true;
+			fprintf(stderr, "failed to get random value\n");
+			break;
+		}
+		err = bpf_map_update_elem(hashmap_fd, &val, &val, BPF_NOEXIST);
+		if (err) {
+			if (err != -EEXIST) {
+				ctx.map_prepare_err = true;
+				fprintf(stderr, "failed to add elem to hashmap: %d\n", -errno);
+				break;
+			}
+			goto again;
+		}
+
+		i--;
+		err = bpf_map_update_elem(random_data_fd, &i, &val, 0);
+		if (err) {
+			ctx.map_prepare_err = true;
+			fprintf(stderr, "failed to add elem to array: %d\n", -errno);
+			break;
+		}
+
+		err = bpf_map_update_elem(bloom_filter_fd, NULL, &val, 0);
+		if (err) {
+			ctx.map_prepare_err = true;
+			fprintf(stderr, "failed to add elem to bloom_filter: %d\n", -errno);
+			break;
+		}
+	}
+
+	pthread_mutex_lock(&ctx.map_done_mtx);
+	pthread_cond_signal(&ctx.map_done);
+	pthread_mutex_unlock(&ctx.map_done_mtx);
+
+	return NULL;
+}
+
+static void populate_maps(void)
+{
+	unsigned int nr_cpus = bpf_num_possible_cpus();
+	pthread_t map_thread;
+	int i, err;
+
+	for (i = 0; i < nr_cpus; i++) {
+		err = pthread_create(&map_thread, NULL, map_prepare_thread,
+				     NULL);
+		if (err) {
+			fprintf(stderr, "failed to create pthread: %d\n", err);
+			exit(1);
+		}
+	}
+
+	pthread_mutex_lock(&ctx.map_done_mtx);
+	pthread_cond_wait(&ctx.map_done, &ctx.map_done_mtx);
+	pthread_mutex_unlock(&ctx.map_done_mtx);
+
+	if (ctx.map_prepare_err)
+		exit(1);
+}
+
+static struct bloom_filter_map *setup_skeleton(void)
+{
+	struct bloom_filter_map *skel;
+	int err;
+
+	setup_libbpf();
+
+	skel = bloom_filter_map__open();
+	if (!skel) {
+		fprintf(stderr, "failed to open skeleton\n");
+		exit(1);
+	}
+
+	err = bpf_map__resize(skel->maps.map_random_data, args.nr_entries);
+	if (err) {
+		fprintf(stderr, "failed to resize map_random_data\n");
+		exit(1);
+	}
+
+	err = bpf_map__resize(skel->maps.hashmap, args.nr_entries);
+	if (err) {
+		fprintf(stderr, "failed to resize hashmap\n");
+		exit(1);
+	}
+
+	err = bpf_map__resize(skel->maps.map_bloom_filter, args.nr_entries);
+	if (err) {
+		fprintf(stderr, "failed to resize bloom filter\n");
+		exit(1);
+	}
+
+	err = bpf_map__set_nr_hashes(skel->maps.map_bloom_filter, args.nr_hashes);
+	if (err) {
+		fprintf(stderr, "failed to set n hashes\n");
+		exit(1);
+	}
+
+	if (bloom_filter_map__load(skel)) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
+	return skel;
+}
+
+static void bloom_filter_map_setup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton();
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
+static void hashmap_lookup_setup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton();
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_hashmap_lookup);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
+static void measure(struct bench_res *res)
+{
+	long total_hits = 0, total_drops = 0, total_false_hits = 0;
+	unsigned int nr_cpus = bpf_num_possible_cpus();
+	BPF_DECLARE_PERCPU(__u64, zeroed_values);
+	BPF_DECLARE_PERCPU(__u64, false_hits);
+	BPF_DECLARE_PERCPU(__u64, drops);
+	BPF_DECLARE_PERCPU(__u64, hits);
+	int err, i, percpu_array_fd;
+	__u32 key;
+
+	if (ctx.skel->bss->error != 0) {
+		fprintf(stderr, "error (%d) when searching the bloom filter\n",
+			ctx.skel->bss->error);
+		exit(1);
+	}
+
+	key = ctx.skel->rodata->hit_key;
+	percpu_array_fd = bpf_map__fd(ctx.skel->maps.percpu_array);
+	err = bpf_map_lookup_elem(percpu_array_fd, &key, hits);
+	if (err) {
+		fprintf(stderr, "lookup in the percpu array for 'hits' failed: %d\n",
+			-errno);
+		exit(1);
+	}
+
+	key = ctx.skel->rodata->drop_key;
+	err = bpf_map_lookup_elem(percpu_array_fd, &key, drops);
+	if (err) {
+		fprintf(stderr, "lookup in the percpu array for 'drops' failed: %d\n",
+			-errno);
+		exit(1);
+	}
+
+	key = ctx.skel->rodata->false_hit_key;
+	err = bpf_map_lookup_elem(percpu_array_fd, &key, false_hits);
+	if (err) {
+		fprintf(stderr, "lookup in the percpu array for 'false hits' failed: %d\n",
+			-errno);
+		exit(1);
+	}
+
+	for (i = 0; i < nr_cpus; i++) {
+		total_hits += bpf_percpu(hits, i);
+		total_drops += bpf_percpu(drops, i);
+		total_false_hits += bpf_percpu(false_hits, i);
+	}
+
+	res->hits = total_hits;
+	res->drops = total_drops;
+	res->false_hits = total_false_hits;
+
+	memset(zeroed_values, 0, sizeof(zeroed_values));
+
+	/* zero out the percpu array */
+	key = ctx.skel->rodata->hit_key;
+	err = bpf_map_update_elem(percpu_array_fd, &key, zeroed_values, BPF_ANY);
+	if (err) {
+		fprintf(stderr, "zeroing the percpu array failed: %d\n", -errno);
+		exit(1);
+	}
+	key = ctx.skel->rodata->drop_key;
+	err = bpf_map_update_elem(percpu_array_fd, &key, zeroed_values, BPF_ANY);
+	if (err) {
+		fprintf(stderr, "zeroing the percpu array failed: %d\n", -errno);
+		exit(1);
+	}
+	key = ctx.skel->rodata->false_hit_key;
+	err = bpf_map_update_elem(percpu_array_fd, &key, zeroed_values, BPF_ANY);
+	if (err) {
+		fprintf(stderr, "zeroing the percpu array failed: %d\n", -errno);
+		exit(1);
+	}
+}
+
+static void *consumer(void *input)
+{
+	return NULL;
+}
+
+const struct bench bench_bloom_filter_map = {
+	.name = "bloom-filter-map",
+	.validate = validate,
+	.setup = bloom_filter_map_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_bloom_filter_false_positive = {
+	.name = "bloom-filter-false-positive",
+	.validate = validate,
+	.setup = hashmap_lookup_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = false_hits_report_progress,
+	.report_final = false_hits_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
new file mode 100755
index 000000000000..8f2de6e39313
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source ./benchs/run_common.sh
+
+set -eufo pipefail
+
+header "Bloom filter map"
+for t in 1 4 8; do
+for h in {1..10}; do
+subtitle "# threads: $t, # hashes: $h"
+	for e in 10000 50000 75000 100000 250000 500000 750000 1000000 2500000 5000000; do
+		printf "%'d entries -\n" $e
+		printf "\t"
+		summarize "Total operations: " \
+			"$($RUN_BENCH -p $t --nr_hashes $h --nr_entries $e bloom-filter-map)"
+		printf "\t"
+		summarize_percentage "False positive rate: " \
+			"$($RUN_BENCH -p $t --nr_hashes $h --nr_entries $e bloom-filter-false-positive)"
+	done
+	printf "\n"
+done
+done
+
+header "Bloom filter map, multi-producer contention"
+for t in 1 2 3 4 8 12 16 20 24 28 32 36 40 44 48 52; do
+	summarize "$t threads - " "$($RUN_BENCH -p $t bloom-filter-map)"
+done
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
index af4aa04caba6..ada028aa9007 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_ringbufs.sh
@@ -1,34 +1,8 @@
 #!/bin/bash
 
-set -eufo pipefail
-
-RUN_BENCH="sudo ./bench -w3 -d10 -a"
-
-function hits()
-{
-	echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
-}
-
-function drops()
-{
-	echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
-}
+source ./benchs/run_common.sh
 
-function header()
-{
-	local len=${#1}
-
-	printf "\n%s\n" "$1"
-	for i in $(seq 1 $len); do printf '='; done
-	printf '\n'
-}
-
-function summarize()
-{
-	bench="$1"
-	summary=$(echo $2 | tail -n1)
-	printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
-}
+set -eufo pipefail
 
 header "Single-producer, parallel producer"
 for b in rb-libbpf rb-custom pb-libbpf pb-custom; do
diff --git a/tools/testing/selftests/bpf/benchs/run_common.sh b/tools/testing/selftests/bpf/benchs/run_common.sh
new file mode 100644
index 000000000000..670f23b037c4
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_common.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+RUN_BENCH="sudo ./bench -w3 -d10 -a"
+
+function header()
+{
+	local len=${#1}
+
+	printf "\n%s\n" "$1"
+	for i in $(seq 1 $len); do printf '='; done
+	printf '\n'
+}
+
+function subtitle()
+{
+	local len=${#1}
+	printf "\t%s\n" "$1"
+}
+
+function hits()
+{
+	echo "$*" | sed -E "s/.*hits\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
+function drops()
+{
+	echo "$*" | sed -E "s/.*drops\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
+function percentage()
+{
+	echo "$*" | sed -E "s/.*Percentage\s=\s+([0-9]+\.[0-9]+).*/\1/"
+}
+
+function summarize()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s (drops %s)\n" "$bench" "$(hits $summary)" "$(drops $summary)"
+}
+
+function summarize_percentage()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s%%\n" "$bench" "$(percentage $summary)"
+}
diff --git a/tools/testing/selftests/bpf/progs/bloom_filter_map.c b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
index 2d9c43a30246..1b139689219e 100644
--- a/tools/testing/selftests/bpf/progs/bloom_filter_map.c
+++ b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
@@ -1,7 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2021 Facebook */
 
+#include <errno.h>
 #include <linux/bpf.h>
+#include <stdbool.h>
 #include <bpf/bpf_helpers.h>
 
 char _license[] SEC("license") = "GPL";
@@ -23,8 +25,38 @@ struct {
 	__uint(nr_hashes, 2);
 } map_bloom_filter SEC(".maps");
 
+/* Tracks the number of hits, drops, and false hits */
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, 3);
+	__type(key, __u32);
+	__type(value, __u64);
+} percpu_array SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 1000);
+	__type(key, __u64);
+	__type(value, __u64);
+} hashmap SEC(".maps");
+
+const __u32 hit_key  = 0;
+const __u32 drop_key  = 1;
+const __u32 false_hit_key = 2;
+
+bool hashmap_use_bloom_filter = true;
+
 int error = 0;
 
+static __always_inline void log_result(__u32 key)
+{
+	__u64 *count;
+
+	count = bpf_map_lookup_elem(&percpu_array, &key);
+	if (count)
+		*count += 1;
+}
+
 static __u64
 check_elem(struct bpf_map *map, __u32 *key, __u64 *val,
 	   void *data)
@@ -37,6 +69,8 @@ check_elem(struct bpf_map *map, __u32 *key, __u64 *val,
 		return 1; /* stop the iteration */
 	}
 
+	log_result(hit_key);
+
 	return 0;
 }
 
@@ -47,3 +81,43 @@ int prog_bloom_filter(void *ctx)
 
 	return 0;
 }
+
+SEC("fentry/__x64_sys_getpgid")
+int prog_bloom_filter_hashmap_lookup(void *ctx)
+{
+	__u64 *result;
+	int i, err;
+
+	union {
+		__u64 data64;
+		__u32 data32[2];
+	} val;
+
+	for (i = 0; i < 512; i++) {
+		val.data32[0] = bpf_get_prandom_u32();
+		val.data32[1] = bpf_get_prandom_u32();
+
+		if (hashmap_use_bloom_filter) {
+			err = bpf_map_peek_elem(&map_bloom_filter, &val);
+			if (err) {
+				if (err != -ENOENT) {
+					error |= 2;
+					return 0;
+				}
+				log_result(drop_key);
+				continue;
+			}
+		}
+
+		result = bpf_map_lookup_elem(&hashmap, &val);
+		if (result) {
+			log_result(hit_key);
+		} else {
+			if (hashmap_use_bloom_filter)
+				log_result(false_hit_key);
+			log_result(drop_key);
+		}
+	}
+
+	return 0;
+}
-- 
2.30.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH bpf-next 5/5] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter
  2021-08-31 22:50 [PATCH bpf-next 0/5] Implement bloom filter map Joanne Koong
                   ` (3 preceding siblings ...)
  2021-08-31 22:50 ` [PATCH bpf-next 4/5] bpf/benchs: Add benchmark test for bloom filter maps Joanne Koong
@ 2021-08-31 22:50 ` Joanne Koong
  4 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2021-08-31 22:50 UTC (permalink / raw)
  To: bpf; +Cc: Joanne Koong

This patch adds some benchmarks for comparing the performance of hashmap
lookups without the bloom filter vs. hashmap lookups with the bloom filter.

Checking the bloom filter first for whether the element exists should
increase overall hashmap lookup throughput, since if the element does not
exist in the bloom filter, we can skip a costly lookup in the hashmap.

The benchmark results (running on 8 threads on a machine with one numa
node, and taking the average of 3 runs) were as follows:
1 hash function -
	10k entries: 15% slower
	50k entries: 25% faster
	100k entries: 30% faster
	500k entries: 50% faster
	1 million entries: 55% faster
	5 million entries: 40% faster

2 hash functions -
	10k entries: 20% slower
	50k entries: 40% faster
	100k entries: 50% faster
	500k entries: 90% faster
	1 million entries: 115% faster
	5 million entries: 70% faster

3 hash functions -
	10k entries: 5% slower
	50k entries: 15% faster
	100k entries: 25% faster
	500k entries: 180% faster
	1 million entries: 200% faster
	5 million entries: 140% faster

4 hash functions -
	10k entries: 20% slower
	50k entries: 50% faster
	100k entries: 60% faster
	500k entries: 128% faster
	1 million entries: 150% faster
	5 million entries: 105% faster

5 hash functions -
	10k entries: 10% faster
	50k entries: 25% faster
	100k entries: 50% faster
	500k entries: 100% faster
	1 million entries: 120% faster
	5 million entries: 170% faster

6 hash functions -
	10k entries: 1.5% faster
	50k entries: 20% faster
	100k entries: 25% faster
	500k entries: 165% faster
	1 million entries: 170% faster
	5 million entries: 140% faster

7 hash functions -
	10k entries: 7% slower
	50k entries: 5% faster
	100k entries: 15% faster
	500k entries: 145% faster
	1 million entries: 155% faster
	5 million entries: 120% faster

8 hash functions -
	10k entries: 15% slower
	50k entries: 50% faster
	100k entries: 60% faster
	500k entries: 1305% faster
	1 million entries: 135% faster
	5 million entries: 105% faster

9 hash functions -
	10k entries: 25% slower
	50k entries: 35% faster
	100k entries: 60% faster
	500k entries: 105% faster
	1 million entries: 130% faster
	5 million entries: 90% faster

10 hash functions -
	10k entries: 10% faster
	50k entries: 30% faster
	100k entries: 45% faster
	500k entries: 90% faster
	1 million entries: 105% faster
	5 million entries: 140% faster

The benchmark comparisons indicate that, on average, using 3 hash functions
is generally one of the best choices.

Signed-off-by: Joanne Koong <joannekoong@fb.com>
---
 tools/testing/selftests/bpf/bench.c           | 22 ++++++++---
 .../bpf/benchs/bench_bloom_filter_map.c       | 39 +++++++++++++++++++
 .../bpf/benchs/run_bench_bloom_filter_map.sh  | 15 +++++++
 .../selftests/bpf/benchs/run_common.sh        | 12 ++++++
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 0bcbdb4405a3..7da1589a9fe0 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -92,20 +92,21 @@ void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns)
 	printf("Iter %3d (%7.3lfus): ",
 	       iter, (delta_ns - 1000000000) / 1000.0);
 
-	printf("hits %8.3lfM/s (%7.3lfM/prod), drops %8.3lfM/s\n",
-	       hits_per_sec, hits_per_prod, drops_per_sec);
+	printf("hits %8.3lfM/s (%7.3lfM/prod), drops %8.3lfM/s, total operations %8.3lfM/s\n",
+	       hits_per_sec, hits_per_prod, drops_per_sec, hits_per_sec + drops_per_sec);
 }
 
 void hits_drops_report_final(struct bench_res res[], int res_cnt)
 {
 	int i;
-	double hits_mean = 0.0, drops_mean = 0.0;
-	double hits_stddev = 0.0, drops_stddev = 0.0;
+	double hits_mean = 0.0, drops_mean = 0.0, total_ops_mean = 0.0;
+	double hits_stddev = 0.0, drops_stddev = 0.0, total_ops_stddev = 0.0;
 
 	for (i = 0; i < res_cnt; i++) {
 		hits_mean += res[i].hits / 1000000.0 / (0.0 + res_cnt);
 		drops_mean += res[i].drops / 1000000.0 / (0.0 + res_cnt);
 	}
+	total_ops_mean = hits_mean + drops_mean;
 
 	if (res_cnt > 1)  {
 		for (i = 0; i < res_cnt; i++) {
@@ -115,14 +116,21 @@ void hits_drops_report_final(struct bench_res res[], int res_cnt)
 			drops_stddev += (drops_mean - res[i].drops / 1000000.0) *
 					(drops_mean - res[i].drops / 1000000.0) /
 					(res_cnt - 1.0);
+			total_ops_stddev += (total_ops_mean -
+					(res[i].hits + res[i].drops) / 1000000.0) *
+					(total_ops_mean - (res[i].hits + res[i].drops) / 1000000.0)
+					/ (res_cnt - 1.0);
 		}
 		hits_stddev = sqrt(hits_stddev);
 		drops_stddev = sqrt(drops_stddev);
+		total_ops_stddev = sqrt(total_ops_stddev);
 	}
 	printf("Summary: hits %8.3lf \u00B1 %5.3lfM/s (%7.3lfM/prod), ",
 	       hits_mean, hits_stddev, hits_mean / env.producer_cnt);
-	printf("drops %8.3lf \u00B1 %5.3lfM/s\n",
+	printf("drops %8.3lf \u00B1 %5.3lfM/s, ",
 	       drops_mean, drops_stddev);
+	printf("total operations %8.3lf \u00B1 %5.3lfM/s\n",
+	       total_ops_mean, total_ops_stddev);
 }
 
 const char *argp_program_version = "benchmark";
@@ -356,6 +364,8 @@ extern const struct bench bench_pb_libbpf;
 extern const struct bench bench_pb_custom;
 extern const struct bench bench_bloom_filter_map;
 extern const struct bench bench_bloom_filter_false_positive;
+extern const struct bench bench_hashmap_without_bloom_filter;
+extern const struct bench bench_hashmap_with_bloom_filter;
 
 static const struct bench *benchs[] = {
 	&bench_count_global,
@@ -379,6 +389,8 @@ static const struct bench *benchs[] = {
 	&bench_pb_custom,
 	&bench_bloom_filter_map,
 	&bench_bloom_filter_false_positive,
+	&bench_hashmap_without_bloom_filter,
+	&bench_hashmap_with_bloom_filter,
 };
 
 static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
index dadca2a6e900..39b64116dfc3 100644
--- a/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
+++ b/tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
@@ -241,6 +241,23 @@ static void hashmap_lookup_setup(void)
 	}
 }
 
+static void hashmap_no_bloom_filter_setup(void)
+{
+	struct bpf_link *link;
+
+	ctx.skel = setup_skeleton();
+
+	ctx.skel->data->hashmap_use_bloom_filter = false;
+
+	populate_maps();
+
+	link = bpf_program__attach(ctx.skel->progs.prog_bloom_filter_hashmap_lookup);
+	if (!link) {
+		fprintf(stderr, "failed to attach program!\n");
+		exit(1);
+	}
+}
+
 static void measure(struct bench_res *res)
 {
 	long total_hits = 0, total_drops = 0, total_false_hits = 0;
@@ -342,3 +359,25 @@ const struct bench bench_bloom_filter_false_positive = {
 	.report_progress = false_hits_report_progress,
 	.report_final = false_hits_report_final,
 };
+
+const struct bench bench_hashmap_without_bloom_filter = {
+	.name = "hashmap-without-bloom-filter",
+	.validate = validate,
+	.setup = hashmap_no_bloom_filter_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
+const struct bench bench_hashmap_with_bloom_filter = {
+	.name = "hashmap-with-bloom-filter",
+	.validate = validate,
+	.setup = hashmap_lookup_setup,
+	.producer_thread = producer,
+	.consumer_thread = consumer,
+	.measure = measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
index 8f2de6e39313..53c14da00a3b 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
@@ -26,3 +26,18 @@ header "Bloom filter map, multi-producer contention"
 for t in 1 2 3 4 8 12 16 20 24 28 32 36 40 44 48 52; do
 	summarize "$t threads - " "$($RUN_BENCH -p $t bloom-filter-map)"
 done
+
+header "Hashmap without bloom filter vs. hashmap with bloom filter (throughput, 8 threads)"
+for h in {1..10}; do
+subtitle "# hashes: $h"
+	for e in 10000 50000 75000 100000 250000 500000 750000 1000000 2500000 5000000; do
+		printf "%'d entries -\n" $e
+		printf "\t"
+		summarize_total "Hashmap without bloom filter: " \
+			"$($RUN_BENCH --nr_hashes $h --nr_entries $e -p 8 hashmap-without-bloom-filter)"
+		printf "\t"
+		summarize_total "Hashmap with bloom filter: " \
+			"$($RUN_BENCH --nr_hashes $h --nr_entries $e -p 8 hashmap-with-bloom-filter)"
+	done
+	printf "\n"
+done
diff --git a/tools/testing/selftests/bpf/benchs/run_common.sh b/tools/testing/selftests/bpf/benchs/run_common.sh
index 670f23b037c4..9a16be78b180 100644
--- a/tools/testing/selftests/bpf/benchs/run_common.sh
+++ b/tools/testing/selftests/bpf/benchs/run_common.sh
@@ -33,6 +33,11 @@ function percentage()
 	echo "$*" | sed -E "s/.*Percentage\s=\s+([0-9]+\.[0-9]+).*/\1/"
 }
 
+function total()
+{
+	echo "$*" | sed -E "s/.*total operations\s+([0-9]+\.[0-9]+ ± [0-9]+\.[0-9]+M\/s).*/\1/"
+}
+
 function summarize()
 {
 	bench="$1"
@@ -46,3 +51,10 @@ function summarize_percentage()
 	summary=$(echo $2 | tail -n1)
 	printf "%-20s %s%%\n" "$bench" "$(percentage $summary)"
 }
+
+function summarize_total()
+{
+	bench="$1"
+	summary=$(echo $2 | tail -n1)
+	printf "%-20s %s\n" "$bench" "$(total $summary)"
+}
-- 
2.30.2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-08-31 22:50 ` [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation Joanne Koong
@ 2021-09-01  2:55   ` Alexei Starovoitov
  2021-09-02 19:11     ` Joanne Koong
  2021-09-02  1:44   ` Andrii Nakryiko
  2021-09-02  3:16   ` John Fastabend
  2 siblings, 1 reply; 23+ messages in thread
From: Alexei Starovoitov @ 2021-09-01  2:55 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf

On Tue, Aug 31, 2021 at 03:50:01PM -0700, Joanne Koong wrote:
> +static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
> +{
> +	struct bpf_bloom_filter *bloom_filter =
> +		container_of(map, struct bpf_bloom_filter, map);
> +	u32 i, hash;
> +
> +	for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
> +		hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
> +			bloom_filter->bit_array_mask;
> +		if (!test_bit(hash, bloom_filter->bit_array))
> +			return -ENOENT;
> +	}

I'm curious what bloom filter theory says about concurrent lookups
and updates when nr_hashes > 1, in terms of false negatives.
Two concurrent updates race is protected by spin_lock,
but what about peek and update?
The update might set one bit, but not the other.
That shouldn't trigger a false negative lookup, right?

Is bloom filter supported as inner map?
Hash and lru maps are often used as inner maps.
The lookups from them would be pre-filtered by bloom filter
map that would have to be (in some cases) inner map.
I suspect one bloom filter for all inner maps might be
a reasonable workaround in some cases too.
The delete is not supported in bloom filter, of course.
Would be good to mention it in the commit log.
Since there is no delete the users would likely need
to replace the whole bloom filter. So map-in-map would
become necessary.
Do you think 'clear-all' operation might be useful for bloom filter?
It feels that if map-in-map is supported then clear-all is probably
not that useful, since atomic replacement and delete of the map
would work better. 'clear-all' will have issues with
lookup, since it cannot be done in parallel.
Would be good to document all these ideas and restrictions.

Could you collect 'perf annotate' data for the above performance
critical loop?
I wonder whether using jhash2 and forcing u32 value size could speed it up.
Probably not, but would be good to check, since restricting value_size
later would be problematic due to backward compatibility.

The recommended nr_hashes=3 was computed with value_size=8, right?
I wonder whether nr_hashes would be different for value_size=16 and =4
(ipv6/ipv4 addresses), and for value_size=40
(an approximation of a networking n-tuple).

> +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
> +{
> +	int numa_node = bpf_map_attr_numa_node(attr);
> +	u32 nr_bits, bit_array_bytes, bit_array_mask;
> +	struct bpf_bloom_filter *bloom_filter;
> +
> +	if (!bpf_capable())
> +		return ERR_PTR(-EPERM);
> +
> +	if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
> +	    attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||

Would it make sense to default to nr_hashes=3 if zero is passed?
This way the libbpf changes for nr_hashes will become 'optional'.
Most users wouldn't have to specify it explicitly.

Overall looks great!
Performance numbers are impressive.


* Re: [PATCH bpf-next 3/5] selftests/bpf: Add bloom filter map test cases
  2021-08-31 22:50 ` [PATCH bpf-next 3/5] selftests/bpf: Add bloom filter map test cases Joanne Koong
@ 2021-09-01  2:55   ` Alexei Starovoitov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexei Starovoitov @ 2021-09-01  2:55 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf

On Tue, Aug 31, 2021 at 03:50:03PM -0700, Joanne Koong wrote:
> +++ b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
> @@ -0,0 +1,49 @@
> +// SPDX-License-Identifier: GPL-3.0

typo :)


* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-08-31 22:50 ` [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation Joanne Koong
  2021-09-01  2:55   ` Alexei Starovoitov
@ 2021-09-02  1:44   ` Andrii Nakryiko
  2021-09-02  5:11     ` John Fastabend
  2021-09-02  3:16   ` John Fastabend
  2 siblings, 1 reply; 23+ messages in thread
From: Andrii Nakryiko @ 2021-09-02  1:44 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf

On Tue, Aug 31, 2021 at 3:51 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> Bloom filters are a space-efficient probabilistic data structure
> used to quickly test whether an element exists in a set.
> In a bloom filter, false positives are possible whereas false
> negatives are not.
>
> This patch adds a bloom filter map for bpf programs.
> The bloom filter map supports peek (determining whether an element
> is present in the map) and push (adding an element to the map)
> operations. These operations are exposed to userspace applications
> through the already existing syscalls in the following way:
>
> BPF_MAP_LOOKUP_ELEM -> peek
> BPF_MAP_UPDATE_ELEM -> push
>
> The bloom filter map does not have keys, only values. In light of
> this, the bloom filter map's API matches that of queue stack maps:
> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> APIs to query or add an element to the bloom filter map. When the
> bloom filter map is created, it must be created with a key_size of 0.
>
> For updates, the user will pass in the element to add to the map
> as the value, with a NULL key. For lookups, the user will pass in the
> element to query in the map as the value. In the verifier layer, this
> requires us to modify the argument type of a bloom filter's
> BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
> the syscall layer, we need to copy over the user value so that in
> bpf_map_peek_elem, we know which specific value to query.
>
> The maximum number of entries in the bloom filter is not enforced; if
> the user wishes to insert more entries into the bloom filter than they
> specified as the max entries size of the bloom filter, that is permitted
> but the performance of their bloom filter will have a higher false
> positive rate.
>
> The number of hashes to use for the bloom filter is configurable from
> userspace. The benchmarks later in this patchset can help compare the
> performance of different numbers of hashes for different numbers of
> entries. In general, using more hashes decreases the speed of a lookup,
> but also decreases the false positive rate of the bloom filter.
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> ---

This looks nice and simple. I left a few comments below.

But one high-level point I wanted to discuss was that bloom filter
logic is actually simple enough to be implementable by pure BPF
program logic. The only problematic part is generic hashing of a piece
of memory. Regardless of implementing bloom filter as kernel-provided
BPF map or implementing it with custom BPF program logic, having BPF
helper for hashing a piece of memory seems extremely useful and very
generic. I can't recall if we ever discussed adding such helpers, but
maybe we should?

It would be a really interesting experiment to implement the same
logic in pure BPF logic and run it as another benchmark, along the
Bloom filter map. BPF has both spinlocks and atomic operations, so we
can compare and contrast. We only miss hashing BPF helper.

Being able to do this in pure BPF code has a bunch of advantages.
Depending on specific application, users can decide to:
  - speed up the operation by ditching spinlock or atomic operation,
if the logic can tolerate losing some bit updates;
  - decide on optimal size, which might not be a power of 2, depending
on the memory vs. CPU trade-off in any particular case;
  - it's also possible to implement a more general Counting Bloom
filter, all without modifying the kernel.

We could go further, and start implementing other simple data
structures relying on hashing, like HyperLogLog. And all with no
kernel modifications. Map-in-map is no issue as well, because there is
a choice of using either fixed global data arrays for maximum
performance, or using BPF_MAP_TYPE_ARRAY maps that can go inside
map-in-map.

Basically, regardless of having this map in the kernel or not, let's
have a "universal" hashing function as a BPF helper as well.

Thoughts?
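For illustration, a rough userspace C sketch of what that pure-BPF variant could look like. The hash helper is hypothetical (FNV-1a stands in for it here), and the sizes are arbitrary; a real BPF program would use a global array or BPF_MAP_TYPE_ARRAY value for the bit set:

```c
#include <stdint.h>

/* Hypothetical stand-in for a "hash a piece of memory" BPF helper;
 * FNV-1a is used purely for illustration. */
static uint32_t hash_mem(const void *data, uint32_t len, uint32_t seed)
{
	const uint8_t *p = data;
	uint32_t h = 2166136261u ^ seed;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

#define NR_BITS  (1u << 16)	/* power of two, so the mask is NR_BITS - 1 */
#define NR_HASHES 3

/* In a BPF program this would be a global array or a BPF_MAP_TYPE_ARRAY
 * value, which also makes map-in-map composition straightforward. */
static uint64_t bitset[NR_BITS / 64];

static void bloom_push(const void *value, uint32_t size)
{
	uint32_t i, bit;

	for (i = 0; i < NR_HASHES; i++) {
		bit = hash_mem(value, size, i) & (NR_BITS - 1);
		/* atomic or: the lock-free variant needs no spinlock */
		__sync_fetch_and_or(&bitset[bit / 64], 1ull << (bit & 63));
	}
}

static int bloom_peek(const void *value, uint32_t size)
{
	uint32_t i, bit;

	for (i = 0; i < NR_HASHES; i++) {
		bit = hash_mem(value, size, i) & (NR_BITS - 1);
		if (!(bitset[bit / 64] & (1ull << (bit & 63))))
			return 0;	/* definitely absent */
	}
	return 1;			/* probably present */
}
```

Dropping the atomic or keeping a non-power-of-two size are then per-application choices, as noted above.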

>  include/linux/bpf.h            |   3 +-
>  include/linux/bpf_types.h      |   1 +
>  include/uapi/linux/bpf.h       |   3 +
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/bloom_filter.c      | 171 +++++++++++++++++++++++++++++++++
>  kernel/bpf/syscall.c           |  20 +++-
>  kernel/bpf/verifier.c          |  19 +++-
>  tools/include/uapi/linux/bpf.h |   3 +
>  8 files changed, 214 insertions(+), 8 deletions(-)
>  create mode 100644 kernel/bpf/bloom_filter.c
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f4c16f19f83e..2abaa1052096 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -181,7 +181,8 @@ struct bpf_map {
>         u32 btf_vmlinux_value_type_id;
>         bool bypass_spec_v1;
>         bool frozen; /* write-once; write-protected by freeze_mutex */
> -       /* 22 bytes hole */
> +       u32 nr_hashes; /* used for bloom filter maps */
> +       /* 18 bytes hole */
>
>         /* The 3rd and 4th cacheline with misc members to avoid false sharing
>          * particularly with refcounting.
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index 9c81724e4b98..c4424ac2fa02 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -125,6 +125,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
>  BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
>  #endif
>  BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
> +BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
>
>  BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
>  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 791f31dd0abe..c2acb0a510fe 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -906,6 +906,7 @@ enum bpf_map_type {
>         BPF_MAP_TYPE_RINGBUF,
>         BPF_MAP_TYPE_INODE_STORAGE,
>         BPF_MAP_TYPE_TASK_STORAGE,
> +       BPF_MAP_TYPE_BLOOM_FILTER,
>  };
>
>  /* Note that tracing related programs such as
> @@ -1274,6 +1275,7 @@ union bpf_attr {
>                                                    * struct stored as the
>                                                    * map value
>                                                    */
> +               __u32   nr_hashes;      /* used for configuring bloom filter maps */

This feels like a bit too one-off property that won't be ever reused
by any other type of map. Also consider that we should probably limit
nr_hashes to some pretty small sane value (<16? <64?) to prevent easy
DoS from inside BPF programs (e.g., setting nr_hashes to 2 billion would
make each operation extremely slow and CPU intensive). So with that,
maybe let's provide number of hashes as part of map_flags? And as
Alexei proposed, zero would mean some recommended value (2 or 3,
right?). This would also mean that libbpf won't need to know about
one-off map property in parsing BTF map definitions.
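One possible shape of such an encoding, purely as a sketch: the macro names and bit positions below are invented for illustration and are not part of any actual UAPI:

```c
#include <stdint.h>

/* Illustrative only: these names and bit positions are made up for the
 * sketch, not part of the actual proposal. */
#define BPF_F_NR_HASHES_SHIFT 28
#define BPF_F_NR_HASHES_MASK  (0xfu << BPF_F_NR_HASHES_SHIFT) /* caps k at 15 */

static uint32_t encode_nr_hashes(uint32_t map_flags, uint32_t nr_hashes)
{
	return (map_flags & ~BPF_F_NR_HASHES_MASK) |
	       ((nr_hashes << BPF_F_NR_HASHES_SHIFT) & BPF_F_NR_HASHES_MASK);
}

static uint32_t decode_nr_hashes(uint32_t map_flags)
{
	uint32_t k = (map_flags & BPF_F_NR_HASHES_MASK) >> BPF_F_NR_HASHES_SHIFT;

	return k ? k : 3;	/* 0 selects the recommended default */
}
```

A few spare flag bits would both bound nr_hashes and spare libbpf from learning a new map property.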

>         };
>
>         struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
> @@ -5594,6 +5596,7 @@ struct bpf_map_info {
>         __u32 btf_id;
>         __u32 btf_key_type_id;
>         __u32 btf_value_type_id;
> +       __u32 nr_hashes; /* used for bloom filter maps */
>  } __attribute__((aligned(8)));
>
>  struct bpf_btf_info {
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index 7f33098ca63f..cf6ca339f3cd 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -7,7 +7,7 @@ endif
>  CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
>
>  obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
> -obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
> +obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
>  obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
>  obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
>  obj-${CONFIG_BPF_LSM}    += bpf_inode_storage.o
> diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
> new file mode 100644
> index 000000000000..3ae799ab3747
> --- /dev/null
> +++ b/kernel/bpf/bloom_filter.c
> @@ -0,0 +1,171 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 Facebook */
> +
> +#include <linux/bitmap.h>
> +#include <linux/bpf.h>
> +#include <linux/err.h>
> +#include <linux/jhash.h>
> +#include <linux/random.h>
> +#include <linux/spinlock.h>
> +
> +#define BLOOM_FILTER_CREATE_FLAG_MASK \
> +       (BPF_F_NUMA_NODE | BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK)
> +
> +struct bpf_bloom_filter {
> +       struct bpf_map map;
> +       u32 bit_array_mask;
> +       u32 hash_seed;
> +       /* Used for synchronizing parallel writes to the bit array */
> +       spinlock_t spinlock;
> +       unsigned long bit_array[];
> +};
> +
> +static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
> +{
> +       struct bpf_bloom_filter *bloom_filter =
> +               container_of(map, struct bpf_bloom_filter, map);
> +       u32 i, hash;
> +
> +       for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
> +               hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
> +                       bloom_filter->bit_array_mask;
> +               if (!test_bit(hash, bloom_filter->bit_array))
> +                       return -ENOENT;
> +       }
> +
> +       return 0;
> +}
> +
> +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
> +{
> +       int numa_node = bpf_map_attr_numa_node(attr);
> +       u32 nr_bits, bit_array_bytes, bit_array_mask;
> +       struct bpf_bloom_filter *bloom_filter;
> +
> +       if (!bpf_capable())
> +               return ERR_PTR(-EPERM);
> +
> +       if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
> +           attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
> +           !bpf_map_flags_access_ok(attr->map_flags))
> +               return ERR_PTR(-EINVAL);
> +
> +       /* For the bloom filter, the optimal bit array size that minimizes the
> +        * false positive probability is n * k / ln(2) where n is the number of
> +        * expected entries in the bloom filter and k is the number of hash
> +        * functions. We use 7 / 5 to approximate 1 / ln(2).
> +        *
> +        * We round this up to the nearest power of two to enable more efficient
> +        * hashing using bitmasks. The bitmask will be the bit array size - 1.
> +        *
> +        * If this overflows a u32, the bit array size will have 2^32 (4
> +        * GB) bits.
> +        */
> +       if (unlikely(check_mul_overflow(attr->max_entries, attr->nr_hashes, &nr_bits)) ||
> +           unlikely(check_mul_overflow(nr_bits / 5, (u32)7, &nr_bits)) ||
> +           unlikely(nr_bits > (1UL << 31))) {

nit: map_alloc is not performance-critical (because it's infrequent),
so unlikely() are probably unnecessary?

> +               /* The bit array size is 2^32 bits but to avoid overflowing the
> +                * u32, we use BITS_TO_BYTES(U32_MAX), which will round up to the
> +                * equivalent number of bytes
> +                */
> +               bit_array_bytes = BITS_TO_BYTES(U32_MAX);
> +               bit_array_mask = U32_MAX;
> +       } else {
> +               if (nr_bits <= BITS_PER_LONG)
> +                       nr_bits = BITS_PER_LONG;
> +               else
> +                       nr_bits = roundup_pow_of_two(nr_bits);
> +               bit_array_bytes = BITS_TO_BYTES(nr_bits);
> +               bit_array_mask = nr_bits - 1;
> +       }
> +
> +       bit_array_bytes = roundup(bit_array_bytes, sizeof(unsigned long));
> +       bloom_filter = bpf_map_area_alloc(sizeof(*bloom_filter) + bit_array_bytes,
> +                                         numa_node);
> +
> +       if (!bloom_filter)
> +               return ERR_PTR(-ENOMEM);
> +
> +       bpf_map_init_from_attr(&bloom_filter->map, attr);
> +       bloom_filter->map.nr_hashes = attr->nr_hashes;
> +
> +       bloom_filter->bit_array_mask = bit_array_mask;
> +       spin_lock_init(&bloom_filter->spinlock);
> +
> +       if (!(attr->map_flags & BPF_F_ZERO_SEED))
> +               bloom_filter->hash_seed = get_random_int();
> +
> +       return &bloom_filter->map;
> +}
> +
> +static void bloom_filter_map_free(struct bpf_map *map)
> +{
> +       struct bpf_bloom_filter *bloom_filter =
> +               container_of(map, struct bpf_bloom_filter, map);
> +
> +       bpf_map_area_free(bloom_filter);
> +}
> +
> +static int bloom_filter_map_push_elem(struct bpf_map *map, void *value,
> +                                     u64 flags)
> +{
> +       struct bpf_bloom_filter *bloom_filter =
> +               container_of(map, struct bpf_bloom_filter, map);
> +       unsigned long spinlock_flags;
> +       u32 i, hash;
> +
> +       if (flags != BPF_ANY)
> +               return -EINVAL;
> +
> +       spin_lock_irqsave(&bloom_filter->spinlock, spinlock_flags);
> +

If value_size is pretty big, hashing might take a noticeable amount of
CPU, during which we'll be holding the spinlock. With what I said above
about sane number of hashes, if we bound it to small reasonable number
(e.g., 16), we can have a local 16-element array with hashes
calculated before we take lock. That way spinlock will be held only
for few bit flips.

Also, I wonder if ditching spinlock in favor of atomic bit set
operation would improve performance in typical scenarios. Seems like
set_bit() is an atomic operation, so it should be easy to test. Do you
mind running benchmarks with spinlock and with set_bit()?
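A rough userspace sketch of that restructuring, with a pthread mutex standing in for the kernel spinlock and FNV-1a standing in for jhash (the cap of 16 is an assumption from the discussion above):

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_NR_HASHES 16	/* assumed sane cap, per the discussion */
#define NR_BITS (1u << 12)

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t bit_array[NR_BITS / 64];
static uint32_t nr_hashes = 3;

/* stand-in for jhash(value, size, seed) */
static uint32_t hash_mem(const void *data, uint32_t len, uint32_t seed)
{
	const uint8_t *p = data;
	uint32_t h = 2166136261u ^ seed;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

static void bloom_push(const void *value, uint32_t size)
{
	uint32_t hashes[MAX_NR_HASHES];
	uint32_t i;

	/* hashing is the expensive part: do it before taking the lock */
	for (i = 0; i < nr_hashes; i++)
		hashes[i] = hash_mem(value, size, i) & (NR_BITS - 1);

	/* the lock is now held only for a few bit flips */
	pthread_mutex_lock(&lock);
	for (i = 0; i < nr_hashes; i++)
		bit_array[hashes[i] / 64] |= 1ull << (hashes[i] & 63);
	pthread_mutex_unlock(&lock);
}

static int all_bits_set(const void *value, uint32_t size)
{
	uint32_t i, bit;

	for (i = 0; i < nr_hashes; i++) {
		bit = hash_mem(value, size, i) & (NR_BITS - 1);
		if (!(bit_array[bit / 64] & (1ull << (bit & 63))))
			return 0;
	}
	return 1;
}
```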

> +       for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
> +               hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
> +                       bloom_filter->bit_array_mask;
> +               bitmap_set(bloom_filter->bit_array, hash, 1);
> +       }
> +
> +       spin_unlock_irqrestore(&bloom_filter->spinlock, spinlock_flags);
> +
> +       return 0;
> +}
> +

[...]


* RE: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-08-31 22:50 ` [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation Joanne Koong
  2021-09-01  2:55   ` Alexei Starovoitov
  2021-09-02  1:44   ` Andrii Nakryiko
@ 2021-09-02  3:16   ` John Fastabend
  2021-09-02  3:28     ` Andrii Nakryiko
  2 siblings, 1 reply; 23+ messages in thread
From: John Fastabend @ 2021-09-02  3:16 UTC (permalink / raw)
  To: Joanne Koong, bpf; +Cc: Joanne Koong

Joanne Koong wrote:
> Bloom filters are a space-efficient probabilistic data structure
> used to quickly test whether an element exists in a set.
> In a bloom filter, false positives are possible whereas false
> negatives are not.
> 
> This patch adds a bloom filter map for bpf programs.
> The bloom filter map supports peek (determining whether an element
> is present in the map) and push (adding an element to the map)
> operations. These operations are exposed to userspace applications
> through the already existing syscalls in the following way:
> 
> BPF_MAP_LOOKUP_ELEM -> peek
> BPF_MAP_UPDATE_ELEM -> push
> 
> The bloom filter map does not have keys, only values. In light of
> this, the bloom filter map's API matches that of queue stack maps:
> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> APIs to query or add an element to the bloom filter map. When the
> bloom filter map is created, it must be created with a key_size of 0.
> 
> For updates, the user will pass in the element to add to the map
> as the value, with a NULL key. For lookups, the user will pass in the
> element to query in the map as the value. In the verifier layer, this
> requires us to modify the argument type of a bloom filter's
> BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
> the syscall layer, we need to copy over the user value so that in
> bpf_map_peek_elem, we know which specific value to query.
> 
> The maximum number of entries in the bloom filter is not enforced; if
> the user wishes to insert more entries into the bloom filter than they
> specified as the max entries size of the bloom filter, that is permitted
> but the performance of their bloom filter will have a higher false
> positive rate.

hmm I'm wondering if this means the memory footprint can grow without
bounds? Typically maps have an upper bound on memory established at
alloc time.

In queue_stack_map_alloc() we have,

 queue_size = sizeof(*qs) + size * attr->value_size);
 bpf_map_area_alloc(queue_size, numa_node) 

In hashmap (not preallocated) we have alloc_htab_elem(), which will
give us an upper bound.

Is there a practical value in allowing these to grow endlessly? And
should we be charging the value memory against something? In
bpf_map_kmalloc_node() we call set_active_memcg(), for example.
 
I'll review code as well, but think above is worth some thought.

> 
> The number of hashes to use for the bloom filter is configurable from
> userspace. The benchmarks later in this patchset can help compare the
> performance of different numbers of hashes for different numbers of
> entries. In general, using more hashes decreases the speed of a lookup,
> but also decreases the false positive rate of the bloom filter.
> 
> Signed-off-by: Joanne Koong <joannekoong@fb.com>


* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-02  3:16   ` John Fastabend
@ 2021-09-02  3:28     ` Andrii Nakryiko
  2021-09-02  4:40       ` John Fastabend
  0 siblings, 1 reply; 23+ messages in thread
From: Andrii Nakryiko @ 2021-09-02  3:28 UTC (permalink / raw)
  To: John Fastabend; +Cc: Joanne Koong, bpf

On Wed, Sep 1, 2021 at 8:18 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Joanne Koong wrote:
> > Bloom filters are a space-efficient probabilistic data structure
> > used to quickly test whether an element exists in a set.
> > In a bloom filter, false positives are possible whereas false
> > negatives are not.
> >
> > This patch adds a bloom filter map for bpf programs.
> > The bloom filter map supports peek (determining whether an element
> > is present in the map) and push (adding an element to the map)
> > operations. These operations are exposed to userspace applications
> > through the already existing syscalls in the following way:
> >
> > BPF_MAP_LOOKUP_ELEM -> peek
> > BPF_MAP_UPDATE_ELEM -> push
> >
> > The bloom filter map does not have keys, only values. In light of
> > this, the bloom filter map's API matches that of queue stack maps:
> > user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> > which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> > and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> > APIs to query or add an element to the bloom filter map. When the
> > bloom filter map is created, it must be created with a key_size of 0.
> >
> > For updates, the user will pass in the element to add to the map
> > as the value, with a NULL key. For lookups, the user will pass in the
> > element to query in the map as the value. In the verifier layer, this
> > requires us to modify the argument type of a bloom filter's
> > BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
> > the syscall layer, we need to copy over the user value so that in
> > bpf_map_peek_elem, we know which specific value to query.
> >
> > The maximum number of entries in the bloom filter is not enforced; if
> > the user wishes to insert more entries into the bloom filter than they
> > specified as the max entries size of the bloom filter, that is permitted
> > but the performance of their bloom filter will have a higher false
> > positive rate.
>
> hmm I'm wondering if this means the memory footprint can grow without
> bounds? Typically maps have an upper bound on memory established at
> alloc time.

The wording is a bit unfortunate, but no, the amount of used memory is
fixed. Bloom filter is a probabilistic data structure in which each
"value" has few designated bits, determined by hash functions on that
value. The number of bits is fixed, though. If all designated bits are
set to 1, then we declare "value" to be present in the Bloom filter.
If at least one is 0, then we definitely didn't see "value" yet
(that's what guarantees no false negatives; this also answers Alexei's
worry about possible false negative due to unsynchronized update and
lookup, it can't happen by the nature of the data structure design,
regardless of synchronization). We can, of course, have all such bits
set to 1 even if the actual value was never "added" into the Bloom
filter, just by the nature of hash collisions with other elements'
hash functions (that's where the false positive comes from). It might
be useful to just leave a link to Wikipedia for description of Bloom
filter data structure ([0]).

  [0] https://en.wikipedia.org/wiki/Bloom_filter
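The no-false-negative argument above can be made concrete with a tiny bit-set demo: bits only ever flip 0 -> 1, so once all of a value's designated bits are set, no later (even partially applied) concurrent push can unset them. The bit indices below are hypothetical stand-ins for hash outputs:

```c
#include <stdint.h>

static uint64_t bits[4];

static void set_bit_(uint32_t b)  { bits[b / 64] |= 1ull << (b & 63); }
static int  test_bit_(uint32_t b) { return !!(bits[b / 64] & (1ull << (b & 63))); }

/* Returns 1 if value A is still reported present after an interrupted
 * push of value B, which is what the data structure guarantees. */
static int demo_no_false_negative(void)
{
	uint32_t a_bits[3] = {3, 70, 129};	/* "hashes" of value A */
	uint32_t b_bits[3] = {3, 200, 9};	/* "hashes" of value B */
	int i, present = 1;

	for (i = 0; i < 3; i++)		/* A fully pushed */
		set_bit_(a_bits[i]);

	set_bit_(b_bits[0]);		/* push of B interrupted after 1 bit */

	for (i = 0; i < 3; i++)		/* lookup of A cannot miss */
		present &= test_bit_(a_bits[i]);
	return present;
}
```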

>
> In queue_stack_map_alloc() we have,
>
>  queue_size = sizeof(*qs) + size * attr->value_size);
>  bpf_map_area_alloc(queue_size, numa_node)
>
> In hashmap (not preallocated) we have alloc_htab_elem(), which will
> give us an upper bound.
>
> Is there a practical value in allowing these to grow endlessly? And
> should we be charging the value memory against something? In
> bpf_map_kmalloc_node() we call set_active_memcg(), for example.
>
> I'll review code as well, but think above is worth some thought.
>
> >
> > The number of hashes to use for the bloom filter is configurable from
> > userspace. The benchmarks later in this patchset can help compare the
> > performance of different numbers of hashes for different numbers of
> > entries. In general, using more hashes decreases the speed of a lookup,
> > but also decreases the false positive rate of the bloom filter.
> >
> > Signed-off-by: Joanne Koong <joannekoong@fb.com>


* Re: [PATCH bpf-next 2/5] libbpf: Allow the number of hashes in bloom filter maps to be configurable
  2021-08-31 22:50 ` [PATCH bpf-next 2/5] libbpf: Allow the number of hashes in bloom filter maps to be configurable Joanne Koong
@ 2021-09-02  3:30   ` Andrii Nakryiko
  0 siblings, 0 replies; 23+ messages in thread
From: Andrii Nakryiko @ 2021-09-02  3:30 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf

On Tue, Aug 31, 2021 at 3:51 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> This patch adds the libbpf infrastructure that will allow the user to
> specify the number of hash functions to use for the bloom filter map.
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> ---
>  tools/lib/bpf/bpf.c             |  2 ++
>  tools/lib/bpf/bpf.h             |  1 +
>  tools/lib/bpf/libbpf.c          | 32 +++++++++++++++++++++++++++++++-
>  tools/lib/bpf/libbpf.h          |  3 +++
>  tools/lib/bpf/libbpf.map        |  2 ++
>  tools/lib/bpf/libbpf_internal.h |  4 +++-
>  6 files changed, 42 insertions(+), 2 deletions(-)
>

[...]

>  __u32 bpf_map__key_size(const struct bpf_map *map)
>  {
>         return map->def.key_size;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index f177d897c5f7..497b84772be8 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -538,6 +538,9 @@ LIBBPF_API __u32 bpf_map__btf_value_type_id(const struct bpf_map *map);
>  /* get/set map if_index */
>  LIBBPF_API __u32 bpf_map__ifindex(const struct bpf_map *map);
>  LIBBPF_API int bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
> +/* get/set nr_hashes. used for bloom filter maps */
> +LIBBPF_API __u32 bpf_map__nr_hashes(const struct bpf_map *map);
> +LIBBPF_API int bpf_map__set_nr_hashes(struct bpf_map *map, __u32 nr_hashes);
>
>  typedef void (*bpf_map_clear_priv_t)(struct bpf_map *, void *);
>  LIBBPF_API int bpf_map__set_priv(struct bpf_map *map, void *priv,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index bbc53bb25f68..372c2478274f 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -385,4 +385,6 @@ LIBBPF_0.5.0 {
>                 btf__load_vmlinux_btf;
>                 btf_dump__dump_type_data;
>                 libbpf_set_strict_mode;
> +               bpf_map__nr_hashes;
> +               bpf_map__set_nr_hashes;

I'd really like to avoid this very niche and specific set of APIs. As
I mentioned on first patch, I think nr_hash can be easily passed
through map_flags property.

>  } LIBBPF_0.4.0;
> diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
> index 533b0211f40a..501ae042980d 100644
> --- a/tools/lib/bpf/libbpf_internal.h
> +++ b/tools/lib/bpf/libbpf_internal.h
> @@ -171,8 +171,9 @@ enum map_def_parts {
>         MAP_DEF_NUMA_NODE       = 0x080,
>         MAP_DEF_PINNING         = 0x100,
>         MAP_DEF_INNER_MAP       = 0x200,
> +       MAP_DEF_NR_HASHES       = 0x400,
>
> -       MAP_DEF_ALL             = 0x3ff, /* combination of all above */
> +       MAP_DEF_ALL             = 0x7ff, /* combination of all above */
>  };
>
>  struct btf_map_def {
> @@ -186,6 +187,7 @@ struct btf_map_def {
>         __u32 map_flags;
>         __u32 numa_node;
>         __u32 pinning;
> +       __u32 nr_hashes; /* used for bloom filter maps */

we can't extend btf_map_def, it's fixed indefinitely

>  };
>
>  int parse_btf_map_def(const char *map_name, struct btf *btf,
> --
> 2.30.2
>


* Re: [PATCH bpf-next 4/5] bpf/benchs: Add benchmark test for bloom filter maps
  2021-08-31 22:50 ` [PATCH bpf-next 4/5] bpf/benchs: Add benchmark test for bloom filter maps Joanne Koong
@ 2021-09-02  3:35   ` Andrii Nakryiko
  0 siblings, 0 replies; 23+ messages in thread
From: Andrii Nakryiko @ 2021-09-02  3:35 UTC (permalink / raw)
  To: Joanne Koong; +Cc: bpf

On Tue, Aug 31, 2021 at 3:52 PM Joanne Koong <joannekoong@fb.com> wrote:
>
> This patch adds benchmark tests for the throughput and false positive
> rate of bloom filter map lookups for a given number of entries and a
> given number of hash functions.
>
> These benchmarks show that as the number of hash functions increases,
> the throughput and the false positive rate of the bloom filter map
> decrease. From the benchmark data, the approximate average
> false-positive rates are roughly as follows:
>
> 1 hash function = ~30%
> 2 hash functions = ~15%
> 3 hash functions = ~5%
> 4 hash functions = ~2.5%
> 5 hash functions = ~1%
> 6 hash functions = ~0.5%
> 7 hash functions = ~0.35%
> 8 hash functions = ~0.15%
> 9 hash functions = ~0.1%
> 10 hash functions = ~0%
>
> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> ---
>  tools/testing/selftests/bpf/Makefile          |   4 +-
>  tools/testing/selftests/bpf/bench.c           |  35 ++
>  tools/testing/selftests/bpf/bench.h           |   3 +
>  .../bpf/benchs/bench_bloom_filter_map.c       | 344 ++++++++++++++++++
>  .../bpf/benchs/run_bench_bloom_filter_map.sh  |  28 ++
>  .../bpf/benchs/run_bench_ringbufs.sh          |  30 +-
>  .../selftests/bpf/benchs/run_common.sh        |  48 +++
>  .../selftests/bpf/progs/bloom_filter_map.c    |  74 ++++
>  8 files changed, 537 insertions(+), 29 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/benchs/bench_bloom_filter_map.c
>  create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bloom_filter_map.sh
>  create mode 100644 tools/testing/selftests/bpf/benchs/run_common.sh
>

[...]

> diff --git a/tools/testing/selftests/bpf/progs/bloom_filter_map.c b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
> index 2d9c43a30246..1b139689219e 100644
> --- a/tools/testing/selftests/bpf/progs/bloom_filter_map.c
> +++ b/tools/testing/selftests/bpf/progs/bloom_filter_map.c
> @@ -1,7 +1,9 @@
>  // SPDX-License-Identifier: GPL-3.0
>  /* Copyright (c) 2021 Facebook */
>
> +#include <errno.h>
>  #include <linux/bpf.h>
> +#include <stdbool.h>
>  #include <bpf/bpf_helpers.h>
>
>  char _license[] SEC("license") = "GPL";
> @@ -23,8 +25,38 @@ struct {
>         __uint(nr_hashes, 2);
>  } map_bloom_filter SEC(".maps");
>
> +/* Tracks the number of hits, drops, and false hits */
> +struct {
> +       __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
> +       __uint(max_entries, 3);
> +       __type(key, __u32);
> +       __type(value, __u64);
> +} percpu_array SEC(".maps");
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_HASH);
> +       __uint(max_entries, 1000);
> +       __type(key, __u64);
> +       __type(value, __u64);
> +} hashmap SEC(".maps");
> +
> +const __u32 hit_key  = 0;
> +const __u32 drop_key  = 1;
> +const __u32 false_hit_key = 2;
> +
> +bool hashmap_use_bloom_filter = true;
> +
>  int error = 0;
>
> +static __always_inline void log_result(__u32 key)
> +{
> +       __u64 *count;
> +
> +       count = bpf_map_lookup_elem(&percpu_array, &key);
> +       if (count)
> +               *count += 1;

It will actually be more performant to have a global array with some
fixed number of elements (e.g., 256, to support up to 256 CPUs), one
for each CPU, instead of BPF_MAP_TYPE_PERCPU_ARRAY. I don't know how
much impact that has on the benchmark, but doing one extra per-cpu map
lookup for each Bloom filter lookup might be a significant portion of
the CPU spent.

> +}
> +
>  static __u64
>  check_elem(struct bpf_map *map, __u32 *key, __u64 *val,
>            void *data)
> @@ -37,6 +69,8 @@ check_elem(struct bpf_map *map, __u32 *key, __u64 *val,
>                 return 1; /* stop the iteration */
>         }
>
> +       log_result(hit_key);
> +
>         return 0;
>  }
>
> @@ -47,3 +81,43 @@ int prog_bloom_filter(void *ctx)
>
>         return 0;
>  }
> +
> +SEC("fentry/__x64_sys_getpgid")
> +int prog_bloom_filter_hashmap_lookup(void *ctx)
> +{
> +       __u64 *result;
> +       int i, err;
> +
> +       union {
> +               __u64 data64;
> +               __u32 data32[2];
> +       } val;
> +
> +       for (i = 0; i < 512; i++) {
> +               val.data32[0] = bpf_get_prandom_u32();
> +               val.data32[1] = bpf_get_prandom_u32();
> +
> +               if (hashmap_use_bloom_filter) {
> +                       err = bpf_map_peek_elem(&map_bloom_filter, &val);
> +                       if (err) {
> +                               if (err != -ENOENT) {
> +                                       error |= 2;
> +                                       return 0;
> +                               }
> +                               log_result(drop_key);
> +                               continue;
> +                       }
> +               }
> +
> +               result = bpf_map_lookup_elem(&hashmap, &val);
> +               if (result) {
> +                       log_result(hit_key);
> +               } else {
> +                       if (hashmap_use_bloom_filter)
> +                               log_result(false_hit_key);
> +                       log_result(drop_key);
> +               }
> +       }
> +
> +       return 0;
> +}
> --
> 2.30.2
>


* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-02  3:28     ` Andrii Nakryiko
@ 2021-09-02  4:40       ` John Fastabend
  0 siblings, 0 replies; 23+ messages in thread
From: John Fastabend @ 2021-09-02  4:40 UTC (permalink / raw)
  To: Andrii Nakryiko, John Fastabend; +Cc: Joanne Koong, bpf

Andrii Nakryiko wrote:
> On Wed, Sep 1, 2021 at 8:18 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Joanne Koong wrote:
> > > Bloom filters are a space-efficient probabilistic data structure
> > > used to quickly test whether an element exists in a set.
> > > In a bloom filter, false positives are possible whereas false
> > > negatives are not.
> > >
> > > This patch adds a bloom filter map for bpf programs.
> > > The bloom filter map supports peek (determining whether an element
> > > is present in the map) and push (adding an element to the map)
> > > operations. These operations are exposed to userspace applications
> > > through the already existing syscalls in the following way:
> > >
> > > BPF_MAP_LOOKUP_ELEM -> peek
> > > BPF_MAP_UPDATE_ELEM -> push
> > >
> > > The bloom filter map does not have keys, only values. In light of
> > > this, the bloom filter map's API matches that of queue stack maps:
> > > user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> > > which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> > > and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> > > APIs to query or add an element to the bloom filter map. When the
> > > bloom filter map is created, it must be created with a key_size of 0.
> > >
> > > For updates, the user will pass in the element to add to the map
> > > as the value, with a NULL key. For lookups, the user will pass in the
> > > element to query in the map as the value. In the verifier layer, this
> > > requires us to modify the argument type of a bloom filter's
> > > BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
> > > the syscall layer, we need to copy over the user value so that in
> > > bpf_map_peek_elem, we know which specific value to query.
> > >
> > > The maximum number of entries in the bloom filter is not enforced; if
> > > the user wishes to insert more entries into the bloom filter than they
> > > specified as the max entries size of the bloom filter, that is
> > > permitted, but their bloom filter will have a higher false positive
> > > rate.
> >
> > hmm I'm wondering if this means the memory footprint can grow without
> > bounds? Typically maps have an upper bound on memory established at
> > alloc time.
> 
> The wording is a bit unfortunate, but no, the amount of used memory is
> fixed. Bloom filter is a probabilistic data structure in which each
> "value" has few designated bits, determined by hash functions on that
> value. The number of bits is fixed, though. If all designated bits are
> set to 1, then we declare "value" to be present in the Bloom filter.

Thanks, actually reading the code helped as well. Looks like a v2
will likely happen; perhaps add a small note here that the maximum
number of entries in the bloom filter is only used to estimate the
number of bits used.

I guess if a BPF user did want to enforce the max number of entries,
a simple BPF counter wouldn't be much for users to add. I usually
add these to maps for debug/statistics reasons anyways.

> If at least one is 0, then we definitely didn't see "value" yet
> (that's what guarantees no false negatives; this also answers Alexei's
> worry about possible false negative due to unsynchronized update and
> lookup, it can't happen by the nature of the data structure design,
> regardless of synchronization). We can, of course, have all such bits
> set to 1 even if the actual value was never "added" into the Bloom
> filter, just by the nature of hash collisions with other elements'
> hash functions (that's where the false positive comes from). It might
> be useful to just leave a link to Wikipedia for description of Bloom
> filter data structure ([0]).
> 
>   [0] https://en.wikipedia.org/wiki/Bloom_filter

Thanks. Yep needed a refresher for sure.


* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-02  1:44   ` Andrii Nakryiko
@ 2021-09-02  5:11     ` John Fastabend
  2021-09-02  6:16       ` Martin KaFai Lau
  2021-09-02 22:07       ` Joanne Koong
  0 siblings, 2 replies; 23+ messages in thread
From: John Fastabend @ 2021-09-02  5:11 UTC (permalink / raw)
  To: Andrii Nakryiko, Joanne Koong; +Cc: bpf

Andrii Nakryiko wrote:
> On Tue, Aug 31, 2021 at 3:51 PM Joanne Koong <joannekoong@fb.com> wrote:
> >
> > Bloom filters are a space-efficient probabilistic data structure
> > used to quickly test whether an element exists in a set.
> > In a bloom filter, false positives are possible whereas false
> > negatives are not.
> >
> > This patch adds a bloom filter map for bpf programs.
> > The bloom filter map supports peek (determining whether an element
> > is present in the map) and push (adding an element to the map)
> > operations. These operations are exposed to userspace applications
> > through the already existing syscalls in the following way:
> >
> > BPF_MAP_LOOKUP_ELEM -> peek
> > BPF_MAP_UPDATE_ELEM -> push
> >
> > The bloom filter map does not have keys, only values. In light of
> > this, the bloom filter map's API matches that of queue stack maps:
> > user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> > which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> > and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> > APIs to query or add an element to the bloom filter map. When the
> > bloom filter map is created, it must be created with a key_size of 0.
> >
> > For updates, the user will pass in the element to add to the map
> > as the value, with a NULL key. For lookups, the user will pass in the
> > element to query in the map as the value. In the verifier layer, this
> > requires us to modify the argument type of a bloom filter's
> > BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
> > the syscall layer, we need to copy over the user value so that in
> > bpf_map_peek_elem, we know which specific value to query.
> >
> > The maximum number of entries in the bloom filter is not enforced; if
> > the user wishes to insert more entries into the bloom filter than they
> > specified as the max entries size of the bloom filter, that is
> > permitted, but their bloom filter will have a higher false positive
> > rate.
> >
> > The number of hashes to use for the bloom filter is configurable from
> > userspace. The benchmarks later in this patchset can help compare the
> > performances of different number of hashes on different entry
> > sizes. In general, using more hashes decreases the speed of a lookup,
> > but also lowers the false positive rate of an element being falsely
> > detected in the bloom filter.
> >
> > Signed-off-by: Joanne Koong <joannekoong@fb.com>
> > ---
> 
> This looks nice and simple. I left a few comments below.
> 
> But one high-level point I wanted to discuss was that bloom filter
> logic is actually simple enough to be implementable by pure BPF
> program logic. The only problematic part is generic hashing of a piece
> of memory. Regardless of implementing bloom filter as kernel-provided
> BPF map or implementing it with custom BPF program logic, having BPF
> helper for hashing a piece of memory seems extremely useful and very
> generic. I can't recall if we ever discussed adding such helpers, but
> maybe we should?

Aha started typing the same thing :)

Adding generic hash helper has been on my todo list and close to the top
now. The use case is hashing data from skb payloads and such from kprobe
and sockmap side. I'm happy to work on it as soon as possible if no one
else picks it up.

> 
> It would be a really interesting experiment to implement the same
> logic in pure BPF logic and run it as another benchmark, along the
> Bloom filter map. BPF has both spinlock and atomic operation, so we
> can compare and contrast. We only miss hashing BPF helper.

The one issue I've found writing hash logic is that it's a bit tricky
to get the verifier to consume it, especially when the hash is nested
inside a for loop, and sometimes a couple of for loops, so you end up
with things like:

 for (i = 0; i < someTlvs; i++) {
  for (j = 0; j < someKeys; j++) {
    ...
    bpf_hash(someValue)
    ...
  }
 }

I've found that small, seemingly unrelated changes can cause the
complexity limit to explode. Usually we can work around it with code
changes to get pruning points and such, but it's a bit ugly. Perhaps
this means we need to dive into the details of why the complexity
explodes, but I've not got to it yet. The todo list is long.

> 
> Being able to do this in pure BPF code has a bunch of advantages.
> Depending on specific application, users can decide to:
>   - speed up the operation by ditching spinlock or atomic operation,
> if the logic allows to lose some bit updates;
>   - decide on optimal size, which might not be a power of 2, depending
> on memory vs CPU trade of in any particular case;
>   - it's also possible to implement a more general Counting Bloom
> filter, all without modifying the kernel.

Also it means no helper call, and if you build it on top of an array
map of size 1 it's just a load. Could this be a performance win, for
example for a Bloom filter in XDP for DDoS mitigation? Maybe. If the
program is complex enough, a call might be in the noise. I don't know.

> 
> We could go further, and start implementing other simple data
> structures relying on hashing, like HyperLogLog. And all with no
> kernel modifications. Map-in-map is no issue as well, because there is
> a choice of using either fixed global data arrays for maximum
> performance, or using BPF_MAP_TYPE_ARRAY maps that can go inside
> map-in-map.

We've been doing most of our array maps as single entry arrays
at this point and just calculating offsets directly in BPF. Same
for some simple hashing algorithms.

> 
> Basically, regardless of having this map in the kernel or not, let's
> have a "universal" hashing function as a BPF helper as well.

Yes please!

> 
> Thoughts?

I like it, but not the bloom filter expert here.

> 
> >  include/linux/bpf.h            |   3 +-
> >  include/linux/bpf_types.h      |   1 +
> >  include/uapi/linux/bpf.h       |   3 +
> >  kernel/bpf/Makefile            |   2 +-
> >  kernel/bpf/bloom_filter.c      | 171 +++++++++++++++++++++++++++++++++
> >  kernel/bpf/syscall.c           |  20 +++-
> >  kernel/bpf/verifier.c          |  19 +++-
> >  tools/include/uapi/linux/bpf.h |   3 +
> >  8 files changed, 214 insertions(+), 8 deletions(-)
> >  create mode 100644 kernel/bpf/bloom_filter.c
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f4c16f19f83e..2abaa1052096 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -181,7 +181,8 @@ struct bpf_map {
> >         u32 btf_vmlinux_value_type_id;
> >         bool bypass_spec_v1;
> >         bool frozen; /* write-once; write-protected by freeze_mutex */
> > -       /* 22 bytes hole */
> > +       u32 nr_hashes; /* used for bloom filter maps */
> > +       /* 18 bytes hole */
> >
> >         /* The 3rd and 4th cacheline with misc members to avoid false sharing
> >          * particularly with refcounting.
> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > index 9c81724e4b98..c4424ac2fa02 100644
> > --- a/include/linux/bpf_types.h
> > +++ b/include/linux/bpf_types.h
> > @@ -125,6 +125,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
> >  #endif
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
> > +BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
> >
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> >  BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 791f31dd0abe..c2acb0a510fe 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -906,6 +906,7 @@ enum bpf_map_type {
> >         BPF_MAP_TYPE_RINGBUF,
> >         BPF_MAP_TYPE_INODE_STORAGE,
> >         BPF_MAP_TYPE_TASK_STORAGE,
> > +       BPF_MAP_TYPE_BLOOM_FILTER,
> >  };
> >
> >  /* Note that tracing related programs such as
> > @@ -1274,6 +1275,7 @@ union bpf_attr {
> >                                                    * struct stored as the
> >                                                    * map value
> >                                                    */
> > +               __u32   nr_hashes;      /* used for configuring bloom filter maps */
> 
> This feels like too one-off a property, one that won't ever be reused
> by any other type of map. Also consider that we should probably limit
> nr_hashes to some pretty small sane value (<16? <64?) to prevent easy
> DOS from inside BPF programs (e.g., set nr_hash to 2bln, each
> operation is now extremely slow and CPU intensive). So with that,
> maybe let's provide number of hashes as part of map_flags? And as
> Alexei proposed, zero would mean some recommended value (2 or 3,
> right?). This would also mean that libbpf won't need to know about
> one-off map property in parsing BTF map definitions.
> 
> >         };
> >
> >         struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
> > @@ -5594,6 +5596,7 @@ struct bpf_map_info {
> >         __u32 btf_id;
> >         __u32 btf_key_type_id;
> >         __u32 btf_value_type_id;
> > +       __u32 nr_hashes; /* used for bloom filter maps */
> >  } __attribute__((aligned(8)));
> >
> >  struct bpf_btf_info {
> > diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> > index 7f33098ca63f..cf6ca339f3cd 100644
> > --- a/kernel/bpf/Makefile
> > +++ b/kernel/bpf/Makefile
> > @@ -7,7 +7,7 @@ endif
> >  CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
> >
> >  obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
> > -obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
> > +obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
> >  obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
> >  obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
> >  obj-${CONFIG_BPF_LSM}    += bpf_inode_storage.o
> > diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
> > new file mode 100644
> > index 000000000000..3ae799ab3747
> > --- /dev/null
> > +++ b/kernel/bpf/bloom_filter.c
> > @@ -0,0 +1,171 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2021 Facebook */
> > +
> > +#include <linux/bitmap.h>
> > +#include <linux/bpf.h>
> > +#include <linux/err.h>
> > +#include <linux/jhash.h>
> > +#include <linux/random.h>
> > +#include <linux/spinlock.h>
> > +
> > +#define BLOOM_FILTER_CREATE_FLAG_MASK \
> > +       (BPF_F_NUMA_NODE | BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK)
> > +
> > +struct bpf_bloom_filter {
> > +       struct bpf_map map;
> > +       u32 bit_array_mask;
> > +       u32 hash_seed;
> > +       /* Used for synchronizing parallel writes to the bit array */
> > +       spinlock_t spinlock;
> > +       unsigned long bit_array[];
> > +};
> > +
> > +static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
> > +{
> > +       struct bpf_bloom_filter *bloom_filter =
> > +               container_of(map, struct bpf_bloom_filter, map);
> > +       u32 i, hash;
> > +
> > +       for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
> > +               hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
> > +                       bloom_filter->bit_array_mask;
> > +               if (!test_bit(hash, bloom_filter->bit_array))
> > +                       return -ENOENT;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
> > +{
> > +       int numa_node = bpf_map_attr_numa_node(attr);
> > +       u32 nr_bits, bit_array_bytes, bit_array_mask;
> > +       struct bpf_bloom_filter *bloom_filter;
> > +
> > +       if (!bpf_capable())
> > +               return ERR_PTR(-EPERM);
> > +
> > +       if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
> > +           attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
> > +           !bpf_map_flags_access_ok(attr->map_flags))
> > +               return ERR_PTR(-EINVAL);
> > +
> > +       /* For the bloom filter, the optimal bit array size that minimizes the
> > +        * false positive probability is n * k / ln(2) where n is the number of
> > +        * expected entries in the bloom filter and k is the number of hash
> > +        * functions. We use 7 / 5 to approximate 1 / ln(2).
> > +        *
> > +        * We round this up to the nearest power of two to enable more efficient
> > +        * hashing using bitmasks. The bitmask will be the bit array size - 1.
> > +        *
> > +        * If this overflows a u32, the bit array size will have 2^32 (4
> > +        * GB) bits.

Would it be better to return E2BIG or EINVAL here? Speculating a bit,
but if I were a user I might want to know that the number of bits I
pushed down is not the actual number.

Another thought: would it be simpler to let the user do this
calculation and just let max_entries be the number of bits they want?
Then we could have examples with the above comment. Just a thought...

> > +        */
> > +       if (unlikely(check_mul_overflow(attr->max_entries, attr->nr_hashes, &nr_bits)) ||
> > +           unlikely(check_mul_overflow(nr_bits / 5, (u32)7, &nr_bits)) ||
> > +           unlikely(nr_bits > (1UL << 31))) {
> 
> nit: map_alloc is not performance-critical (because it's infrequent),
> so unlikely() are probably unnecessary?
> 
> > +               /* The bit array size is 2^32 bits but to avoid overflowing the
> > +                * u32, we use BITS_TO_BYTES(U32_MAX), which will round up to the
> > +                * equivalent number of bytes
> > +                */
> > +               bit_array_bytes = BITS_TO_BYTES(U32_MAX);
> > +               bit_array_mask = U32_MAX;
> > +       } else {
> > +               if (nr_bits <= BITS_PER_LONG)
> > +                       nr_bits = BITS_PER_LONG;
> > +               else
> > +                       nr_bits = roundup_pow_of_two(nr_bits);
> > +               bit_array_bytes = BITS_TO_BYTES(nr_bits);
> > +               bit_array_mask = nr_bits - 1;
> > +       }
> > +
> > +       bit_array_bytes = roundup(bit_array_bytes, sizeof(unsigned long));
> > +       bloom_filter = bpf_map_area_alloc(sizeof(*bloom_filter) + bit_array_bytes,
> > +                                         numa_node);
> > +
> > +       if (!bloom_filter)
> > +               return ERR_PTR(-ENOMEM);
> > +
> > +       bpf_map_init_from_attr(&bloom_filter->map, attr);
> > +       bloom_filter->map.nr_hashes = attr->nr_hashes;
> > +
> > +       bloom_filter->bit_array_mask = bit_array_mask;
> > +       spin_lock_init(&bloom_filter->spinlock);
> > +
> > +       if (!(attr->map_flags & BPF_F_ZERO_SEED))
> > +               bloom_filter->hash_seed = get_random_int();
> > +
> > +       return &bloom_filter->map;
> > +}
> > +
> > +static void bloom_filter_map_free(struct bpf_map *map)
> > +{
> > +       struct bpf_bloom_filter *bloom_filter =
> > +               container_of(map, struct bpf_bloom_filter, map);
> > +
> > +       bpf_map_area_free(bloom_filter);
> > +}
> > +
> > +static int bloom_filter_map_push_elem(struct bpf_map *map, void *value,
> > +                                     u64 flags)
> > +{
> > +       struct bpf_bloom_filter *bloom_filter =
> > +               container_of(map, struct bpf_bloom_filter, map);
> > +       unsigned long spinlock_flags;
> > +       u32 i, hash;
> > +
> > +       if (flags != BPF_ANY)
> > +               return -EINVAL;
> > +
> > +       spin_lock_irqsave(&bloom_filter->spinlock, spinlock_flags);
> > +
> 
> If value_size is pretty big, hashing might take a noticeable amount of
> CPU, during which we'll be keeping spinlock. With what I said above
> about sane number of hashes, if we bound it to small reasonable number
> (e.g., 16), we can have a local 16-element array with hashes
> calculated before we take lock. That way spinlock will be held only
> for few bit flips.

+1. Anyways, we are inside an RCU section here and the map shouldn't
disappear without a grace period, so we can READ_ONCE the seed, right?
Or are we thinking about sleepable programs here?

> Also, I wonder if ditching spinlock in favor of atomic bit set
> operation would improve performance in typical scenarios. Seems like
> set_bit() is an atomic operation, so it should be easy to test. Do you
> mind running benchmarks with spinlock and with set_bit()?

With the jhash pulled out of the lock, I think it might be noticeable.
Curious to see.

>
> > +       for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
> > +               hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
> > +                       bloom_filter->bit_array_mask;
> > +               bitmap_set(bloom_filter->bit_array, hash, 1);
> > +       }
> > +
> > +       spin_unlock_irqrestore(&bloom_filter->spinlock, spinlock_flags);
> > +
> > +       return 0;
> > +}
> > +
> 
> [...]




* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-02  5:11     ` John Fastabend
@ 2021-09-02  6:16       ` Martin KaFai Lau
  2021-09-02 22:07       ` Joanne Koong
  1 sibling, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2021-09-02  6:16 UTC (permalink / raw)
  To: John Fastabend; +Cc: Andrii Nakryiko, Joanne Koong, bpf

On Wed, Sep 01, 2021 at 10:11:20PM -0700, John Fastabend wrote:
[ ... ]

> > > +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
> > > +{
> > > +       int numa_node = bpf_map_attr_numa_node(attr);
> > > +       u32 nr_bits, bit_array_bytes, bit_array_mask;
> > > +       struct bpf_bloom_filter *bloom_filter;
> > > +
> > > +       if (!bpf_capable())
> > > +               return ERR_PTR(-EPERM);
> > > +
> > > +       if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
> > > +           attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
> > > +           !bpf_map_flags_access_ok(attr->map_flags))
> > > +               return ERR_PTR(-EINVAL);
> > > +
> > > +       /* For the bloom filter, the optimal bit array size that minimizes the
> > > +        * false positive probability is n * k / ln(2) where n is the number of
> > > +        * expected entries in the bloom filter and k is the number of hash
> > > +        * functions. We use 7 / 5 to approximate 1 / ln(2).
> > > +        *
> > > +        * We round this up to the nearest power of two to enable more efficient
> > > +        * hashing using bitmasks. The bitmask will be the bit array size - 1.
> > > +        *
> > > +        * If this overflows a u32, the bit array size will have 2^32 (4
> > > +        * GB) bits.
> 
> Would it be better to return E2BIG or EINVAL here? Speculating a bit,
> but if I were a user I might want to know that the number of bits I
> pushed down is not the actual number.
> 
> Another thought: would it be simpler to let the user do this
> calculation and just let max_entries be the number of bits they want?
> Then we could have examples with the above comment. Just a thought...
Instead of having users second-guess what max_entries means for a
particular map, I think it is better to keep the meaning of max_entries
as consistent as possible and let the kernel figure out the correct
nr_bits to use.

[ ... ]

> > > +static int bloom_filter_map_push_elem(struct bpf_map *map, void *value,
> > > +                                     u64 flags)
> > > +{
> > > +       struct bpf_bloom_filter *bloom_filter =
> > > +               container_of(map, struct bpf_bloom_filter, map);
> > > +       unsigned long spinlock_flags;
> > > +       u32 i, hash;
> > > +
> > > +       if (flags != BPF_ANY)
> > > +               return -EINVAL;
> > > +
> > > +       spin_lock_irqsave(&bloom_filter->spinlock, spinlock_flags);
> > > +
> > 
> > If value_size is pretty big, hashing might take a noticeable amount of
> > CPU, during which we'll be keeping spinlock. With what I said above
Good catch on big value_size.

> > about sane number of hashes, if we bound it to small reasonable number
> > (e.g., 16), we can have a local 16-element array with hashes
> > calculated before we take lock. That way spinlock will be held only
> > for few bit flips.
> 
> +1. Anyways we are inside a RCU section here and the map shouldn't
> disapper without a grace period so we can READ_ONCE the seed right?
> Or are we thinking about sleepable programs here?
> 
> > Also, I wonder if ditching spinlock in favor of atomic bit set
> > operation would improve performance in typical scenarios. Seems like
> > set_bit() is an atomic operation, so it should be easy to test. Do you
> > mind running benchmarks with spinlock and with set_bit()?
The atomic set_bit() is a good idea. Then there is no need for a
16-element array, which keeps things simple.
It is in general useful to optimize the update/push path (e.g. I would
like to have the bloom filter bench populate millions of entries
faster). Our current use case is to have userspace populate the map
(e.g. with a lot of suspicious IPs that we have already learned) at
the very beginning and then do very sparse updates after that. The bpf
prog will mostly only lookup/peek, which I think is the better
optimization and benchmark target.

> 
> With the jhash pulled out of the lock, I think it might be noticeable.
> Curious to see.
> 
> >
> > > +       for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
> > > +               hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
> > > +                       bloom_filter->bit_array_mask;
> > > +               bitmap_set(bloom_filter->bit_array, hash, 1);
> > > +       }
> > > +
> > > +       spin_unlock_irqrestore(&bloom_filter->spinlock, spinlock_flags);
> > > +
> > > +       return 0;
> > > +}
> > > +
> > 
> > [...]
> 
> 


* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-01  2:55   ` Alexei Starovoitov
@ 2021-09-02 19:11     ` Joanne Koong
  0 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2021-09-02 19:11 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: bpf

On 8/31/21 7:55 PM, Alexei Starovoitov wrote:

> On Tue, Aug 31, 2021 at 03:50:01PM -0700, Joanne Koong wrote:
>> +static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
>> +{
>> +	struct bpf_bloom_filter *bloom_filter =
>> +		container_of(map, struct bpf_bloom_filter, map);
>> +	u32 i, hash;
>> +
>> +	for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
>> +		hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
>> +			bloom_filter->bit_array_mask;
>> +		if (!test_bit(hash, bloom_filter->bit_array))
>> +			return -ENOENT;
>> +	}
> I'm curious what bloom filter theory says about n-hashes > 1
> concurrent access with updates in terms of false negative?
> Two concurrent updates race is protected by spin_lock,
> but what about peek and update?
> The update might set one bit, but not the other.
> That shouldn't trigger false negative lookup, right?

For cases where there is a concurrent peek and update, the user is
responsible for synchronizing these operations if they want to ensure
that the peek will always return true while the update is occurring.
I will add this to the commit message.

> Is bloom filter supported as inner map?
> Hash and lru maps are often used as inner maps.
> The lookups from them would be pre-filtered by bloom filter
> map that would have to be (in some cases) inner map.
> I suspect one bloom filter for all inner maps might be
> reasonable workaround in some cases too.
> The delete is not supported in bloom filter, of course.
> Would be good to mention it in the commit log.
> Since there is no delete the users would likely need
> to replace the whole bloom filter. So map-in-map would
> become necessary.
> Do you think 'clear-all' operation might be useful for bloom filter?
> It feels that if map-in-map is supported then clear-all is probably
> not that useful, since atomic replacement and delete of the map
> would work better. 'clear-all' will have issues with
> lookup, since it cannot be done in parallel.
> Would be good to document all these ideas and restrictions.

The bloom filter is supported as an inner map. I will include a test for
this and add it to v2 (and document this in the commit message in v2).

> Could you collect 'perf annotate' data for the above performance
> critical loop?
> I wonder whether using jhash2 and forcing u32 value size could speed it up.
> Probably not, but would be good to check, since restricting value_size
> later would be problematic due to backward compatibility.
>
> The recommended nr_hashes=3 was computed with value_size=8, right?
> I wonder whether nr_hashes would be different for value_size=16 and =4
> which are ipv6/ipv4 addresses and value_size = 40
> an approximation of networking n-tuple.
Great suggestions! I will do all of these you mentioned after I
incorporate the edits for v2, and report back with the results.
>> +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
>> +{
>> +	int numa_node = bpf_map_attr_numa_node(attr);
>> +	u32 nr_bits, bit_array_bytes, bit_array_mask;
>> +	struct bpf_bloom_filter *bloom_filter;
>> +
>> +	if (!bpf_capable())
>> +		return ERR_PTR(-EPERM);
>> +
>> +	if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
>> +	    attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
> Would it make sense to default to nr_hashes=3 if zero is passed?
> This way the libbpf changes for nr_hashes will become 'optional'.
> Most users wouldn't have to specify it explicitly.

I like this idea - it'll make the API more friendly to use as well for
people who might not be acquainted with bloom filters :)

>
> Overall looks great!
> Performance numbers are impressive.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-02  5:11     ` John Fastabend
  2021-09-02  6:16       ` Martin KaFai Lau
@ 2021-09-02 22:07       ` Joanne Koong
  2021-09-03  0:56         ` Martin KaFai Lau
  2021-09-03 17:22         ` John Fastabend
  1 sibling, 2 replies; 23+ messages in thread
From: Joanne Koong @ 2021-09-02 22:07 UTC (permalink / raw)
  To: John Fastabend, Andrii Nakryiko; +Cc: bpf

On 9/1/21 10:11 PM, John Fastabend wrote:

> Andrii Nakryiko wrote:
>> On Tue, Aug 31, 2021 at 3:51 PM Joanne Koong <joannekoong@fb.com> wrote:
>>> Bloom filters are a space-efficient probabilistic data structure
>>> used to quickly test whether an element exists in a set.
>>> In a bloom filter, false positives are possible whereas false
>>> negatives are not.
>>>
>>> This patch adds a bloom filter map for bpf programs.
>>> The bloom filter map supports peek (determining whether an element
>>> is present in the map) and push (adding an element to the map)
>>> operations. These operations are exposed to userspace applications
>>> through the already existing syscalls in the following way:
>>>
>>> BPF_MAP_LOOKUP_ELEM -> peek
>>> BPF_MAP_UPDATE_ELEM -> push
>>>
>>> The bloom filter map does not have keys, only values. In light of
>>> this, the bloom filter map's API matches that of queue stack maps:
>>> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
>>> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
>>> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
>>> APIs to query or add an element to the bloom filter map. When the
>>> bloom filter map is created, it must be created with a key_size of 0.
>>>
>>> For updates, the user will pass in the element to add to the map
>>> as the value, with a NULL key. For lookups, the user will pass in the
>>> element to query in the map as the value. In the verifier layer, this
>>> requires us to modify the argument type of a bloom filter's
>>> BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
>>> the syscall layer, we need to copy over the user value so that in
>>> bpf_map_peek_elem, we know which specific value to query.
>>>
>>> The maximum number of entries in the bloom filter is not enforced; if
>>> the user wishes to insert more entries into the bloom filter than they
>>> specified as the max entries size of the bloom filter, that is permitted,
>>> but their bloom filter will have a higher false positive rate.
>>>
>>> The number of hashes to use for the bloom filter is configurable from
>>> userspace. The benchmarks later in this patchset can help compare the
>>> performances of different number of hashes on different entry
>>> sizes. In general, using more hashes decreases the speed of a lookup,
>>> but also decreases the false positive rate of an element being falsely
>>> detected in the bloom filter (since the bit array is sized proportionally
>>> to the number of hashes).
>>>
>>> Signed-off-by: Joanne Koong <joannekoong@fb.com>
>>> ---
>> This looks nice and simple. I left a few comments below.
>>
>> But one high-level point I wanted to discuss was that bloom filter
>> logic is actually simple enough to be implementable by pure BPF
>> program logic. The only problematic part is generic hashing of a piece
>> of memory. Regardless of implementing bloom filter as kernel-provided
>> BPF map or implementing it with custom BPF program logic, having BPF
>> helper for hashing a piece of memory seems extremely useful and very
>> generic. I can't recall if we ever discussed adding such helpers, but
>> maybe we should?
> Aha started typing the same thing :)
>
> Adding generic hash helper has been on my todo list and close to the top
> now. The use case is hashing data from skb payloads and such from kprobe
> and sockmap side. I'm happy to work on it as soon as possible if no one
> else picks it up.
>
>> It would be a really interesting experiment to implement the same
>> logic in pure BPF logic and run it as another benchmark, along the
>> Bloom filter map. BPF has both spinlock and atomic operation, so we
>> can compare and contrast. We only miss hashing BPF helper.
> The one issue I've found writing hash logic is it's a bit tricky
> to get the verifier to consume it, especially when the hash is nested
> inside a for loop and sometimes a couple of for loops, so you end up with
> things like,
>
>   for (i = 0; i < someTlvs; i++) {
>    for (j = 0; j < someKeys; j++) {
>      ...
>      bpf_hash(someValue)
>      ...
>   }
>
> I've found small seemingly unrelated changes cause the complexity limit
> to explode. Usually we can work around it with code to get pruning
> points and such, but it's a bit ugly. Perhaps this means we need
> to dive into details of why the complexity explodes, but I've not
> got to it yet. The todo list is long.
>
>> Being able to do this in pure BPF code has a bunch of advantages.
>> Depending on specific application, users can decide to:
>>    - speed up the operation by ditching spinlock or atomic operation,
>> if the logic allows to lose some bit updates;
>>    - decide on optimal size, which might not be a power of 2, depending
>> on memory vs CPU trade-off in any particular case;
>>    - it's also possible to implement a more general Counting Bloom
>> filter, all without modifying the kernel.
> Also it means no call and if you build it on top of an array
> map of size 1 its just a load. Could this be a performance win for
> example a Bloom filter in XDP for DDOS? Maybe. Not sure if the program
> would be complex enough a call might be in the noise. I don't know.
>
>> We could go further, and start implementing other simple data
>> structures relying on hashing, like HyperLogLog. And all with no
>> kernel modifications. Map-in-map is no issue as well, because there is
>> a choice of using either fixed global data arrays for maximum
>> performance, or using BPF_MAP_TYPE_ARRAY maps that can go inside
>> map-in-map.
> We've been doing most of our array maps as single entry arrays
> at this point and just calculating offsets directly in BPF. Same
> for some simple hashing algorithms.
>
>> Basically, regardless of having this map in the kernel or not, let's
>> have a "universal" hashing function as a BPF helper as well.
> Yes please!
I completely agree!
>> Thoughts?
> I like it, but not the bloom filter expert here.


Ooh, I like your idea of comparing the performance of the bloom filter with
a kernel-provided BPF map vs. custom BPF program logic using a
hash helper, especially if a BPF hash helper is something useful that
we want to add to the codebase in and of itself!

>>>   include/linux/bpf.h            |   3 +-
>>>   include/linux/bpf_types.h      |   1 +
>>>   include/uapi/linux/bpf.h       |   3 +
>>>   kernel/bpf/Makefile            |   2 +-
>>>   kernel/bpf/bloom_filter.c      | 171 +++++++++++++++++++++++++++++++++
>>>   kernel/bpf/syscall.c           |  20 +++-
>>>   kernel/bpf/verifier.c          |  19 +++-
>>>   tools/include/uapi/linux/bpf.h |   3 +
>>>   8 files changed, 214 insertions(+), 8 deletions(-)
>>>   create mode 100644 kernel/bpf/bloom_filter.c
>>>
>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>> index f4c16f19f83e..2abaa1052096 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -181,7 +181,8 @@ struct bpf_map {
>>>          u32 btf_vmlinux_value_type_id;
>>>          bool bypass_spec_v1;
>>>          bool frozen; /* write-once; write-protected by freeze_mutex */
>>> -       /* 22 bytes hole */
>>> +       u32 nr_hashes; /* used for bloom filter maps */
>>> +       /* 18 bytes hole */
>>>
>>>          /* The 3rd and 4th cacheline with misc members to avoid false sharing
>>>           * particularly with refcounting.
>>> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
>>> index 9c81724e4b98..c4424ac2fa02 100644
>>> --- a/include/linux/bpf_types.h
>>> +++ b/include/linux/bpf_types.h
>>> @@ -125,6 +125,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
>>>   BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
>>>   #endif
>>>   BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops)
>>> +BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops)
>>>
>>>   BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
>>>   BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 791f31dd0abe..c2acb0a510fe 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -906,6 +906,7 @@ enum bpf_map_type {
>>>          BPF_MAP_TYPE_RINGBUF,
>>>          BPF_MAP_TYPE_INODE_STORAGE,
>>>          BPF_MAP_TYPE_TASK_STORAGE,
>>> +       BPF_MAP_TYPE_BLOOM_FILTER,
>>>   };
>>>
>>>   /* Note that tracing related programs such as
>>> @@ -1274,6 +1275,7 @@ union bpf_attr {
>>>                                                     * struct stored as the
>>>                                                     * map value
>>>                                                     */
>>> +               __u32   nr_hashes;      /* used for configuring bloom filter maps */
>> This feels like a bit too one-off property that won't be ever reused
>> by any other type of map. Also consider that we should probably limit
>> nr_hashes to some pretty small sane value (<16? <64?) to prevent easy
>> DOS from inside BPF programs (e.g., set nr_hash to 2bln, each
>> operation is now extremely slow and CPU intensive). So with that,
>> maybe let's provide number of hashes as part of map_flags? And as
>> Alexei proposed, zero would mean some recommended value (2 or 3,
>> right?). This would also mean that libbpf won't need to know about
>> one-off map property in parsing BTF map definitions.
I think we can limit nr_hashes to 10, since 10 hashes have a false
positive rate of roughly ~0%.
>>>          };
>>>
>>>          struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
>>> @@ -5594,6 +5596,7 @@ struct bpf_map_info {
>>>          __u32 btf_id;
>>>          __u32 btf_key_type_id;
>>>          __u32 btf_value_type_id;
>>> +       __u32 nr_hashes; /* used for bloom filter maps */
>>>   } __attribute__((aligned(8)));
>>>
>>>   struct bpf_btf_info {
>>> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
>>> index 7f33098ca63f..cf6ca339f3cd 100644
>>> --- a/kernel/bpf/Makefile
>>> +++ b/kernel/bpf/Makefile
>>> @@ -7,7 +7,7 @@ endif
>>>   CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
>>>
>>>   obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
>>> -obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
>>> +obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
>>>   obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
>>>   obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
>>>   obj-${CONFIG_BPF_LSM}    += bpf_inode_storage.o
>>> diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
>>> new file mode 100644
>>> index 000000000000..3ae799ab3747
>>> --- /dev/null
>>> +++ b/kernel/bpf/bloom_filter.c
>>> @@ -0,0 +1,171 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/* Copyright (c) 2021 Facebook */
>>> +
>>> +#include <linux/bitmap.h>
>>> +#include <linux/bpf.h>
>>> +#include <linux/err.h>
>>> +#include <linux/jhash.h>
>>> +#include <linux/random.h>
>>> +#include <linux/spinlock.h>
>>> +
>>> +#define BLOOM_FILTER_CREATE_FLAG_MASK \
>>> +       (BPF_F_NUMA_NODE | BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK)
>>> +
>>> +struct bpf_bloom_filter {
>>> +       struct bpf_map map;
>>> +       u32 bit_array_mask;
>>> +       u32 hash_seed;
>>> +       /* Used for synchronizing parallel writes to the bit array */
>>> +       spinlock_t spinlock;
>>> +       unsigned long bit_array[];
>>> +};
>>> +
>>> +static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
>>> +{
>>> +       struct bpf_bloom_filter *bloom_filter =
>>> +               container_of(map, struct bpf_bloom_filter, map);
>>> +       u32 i, hash;
>>> +
>>> +       for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
>>> +               hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
>>> +                       bloom_filter->bit_array_mask;
>>> +               if (!test_bit(hash, bloom_filter->bit_array))
>>> +                       return -ENOENT;
>>> +       }
>>> +
>>> +       return 0;
>>> +}
>>> +
>>> +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
>>> +{
>>> +       int numa_node = bpf_map_attr_numa_node(attr);
>>> +       u32 nr_bits, bit_array_bytes, bit_array_mask;
>>> +       struct bpf_bloom_filter *bloom_filter;
>>> +
>>> +       if (!bpf_capable())
>>> +               return ERR_PTR(-EPERM);
>>> +
>>> +       if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
>>> +           attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
>>> +           !bpf_map_flags_access_ok(attr->map_flags))
>>> +               return ERR_PTR(-EINVAL);
>>> +
>>> +       /* For the bloom filter, the optimal bit array size that minimizes the
>>> +        * false positive probability is n * k / ln(2) where n is the number of
>>> +        * expected entries in the bloom filter and k is the number of hash
>>> +        * functions. We use 7 / 5 to approximate 1 / ln(2).
>>> +        *
>>> +        * We round this up to the nearest power of two to enable more efficient
>>> +        * hashing using bitmasks. The bitmask will be the bit array size - 1.
>>> +        *
>>> +        * If this overflows a u32, the bit array size will have 2^32 (4
>>> +        * GB) bits.
> Would it be better to return E2BIG or EINVAL here? Speculating a bit, but if I was
> a user I might want to know that the number of bits I pushed down is not the actual
> number?

I think if we return E2BIG or EINVAL here, this will fail to create the
bloom filter map if the max_entries exceeds some limit (~3 billion,
according to my math), whereas automatically setting the bit array size
to 2^32 if the max_entries is extraordinarily large will still allow the
user to create and use a bloom filter (albeit one with a higher false
positive rate).

> Another thought, would it be simpler to let user do this calculation and just let
> max_elements be number of bits they want? Then we could have examples with the
> above comment. Just a thought...

I like Martin's idea of keeping the max_entries meaning consistent 
across all map types.
I think that makes the interface clearer for users.

>>> +        */
>>> +       if (unlikely(check_mul_overflow(attr->max_entries, attr->nr_hashes, &nr_bits)) ||
>>> +           unlikely(check_mul_overflow(nr_bits / 5, (u32)7, &nr_bits)) ||
>>> +           unlikely(nr_bits > (1UL << 31))) {
>> nit: map_alloc is not performance-critical (because it's infrequent),
>> so unlikely() are probably unnecessary?
>>
>>> +               /* The bit array size is 2^32 bits but to avoid overflowing the
>>> +                * u32, we use BITS_TO_BYTES(U32_MAX), which will round up to the
>>> +                * equivalent number of bytes
>>> +                */
>>> +               bit_array_bytes = BITS_TO_BYTES(U32_MAX);
>>> +               bit_array_mask = U32_MAX;
>>> +       } else {
>>> +               if (nr_bits <= BITS_PER_LONG)
>>> +                       nr_bits = BITS_PER_LONG;
>>> +               else
>>> +                       nr_bits = roundup_pow_of_two(nr_bits);
>>> +               bit_array_bytes = BITS_TO_BYTES(nr_bits);
>>> +               bit_array_mask = nr_bits - 1;
>>> +       }
>>> +
>>> +       bit_array_bytes = roundup(bit_array_bytes, sizeof(unsigned long));
>>> +       bloom_filter = bpf_map_area_alloc(sizeof(*bloom_filter) + bit_array_bytes,
>>> +                                         numa_node);
>>> +
>>> +       if (!bloom_filter)
>>> +               return ERR_PTR(-ENOMEM);
>>> +
>>> +       bpf_map_init_from_attr(&bloom_filter->map, attr);
>>> +       bloom_filter->map.nr_hashes = attr->nr_hashes;
>>> +
>>> +       bloom_filter->bit_array_mask = bit_array_mask;
>>> +       spin_lock_init(&bloom_filter->spinlock);
>>> +
>>> +       if (!(attr->map_flags & BPF_F_ZERO_SEED))
>>> +               bloom_filter->hash_seed = get_random_int();
>>> +
>>> +       return &bloom_filter->map;
>>> +}
>>> +
>>> +static void bloom_filter_map_free(struct bpf_map *map)
>>> +{
>>> +       struct bpf_bloom_filter *bloom_filter =
>>> +               container_of(map, struct bpf_bloom_filter, map);
>>> +
>>> +       bpf_map_area_free(bloom_filter);
>>> +}
>>> +
>>> +static int bloom_filter_map_push_elem(struct bpf_map *map, void *value,
>>> +                                     u64 flags)
>>> +{
>>> +       struct bpf_bloom_filter *bloom_filter =
>>> +               container_of(map, struct bpf_bloom_filter, map);
>>> +       unsigned long spinlock_flags;
>>> +       u32 i, hash;
>>> +
>>> +       if (flags != BPF_ANY)
>>> +               return -EINVAL;
>>> +
>>> +       spin_lock_irqsave(&bloom_filter->spinlock, spinlock_flags);
>>> +
>> If value_size is pretty big, hashing might take a noticeable amount of
>> CPU, during which we'll be keeping spinlock. With what I said above
>> about sane number of hashes, if we bound it to small reasonable number
>> (e.g., 16), we can have a local 16-element array with hashes
>> calculated before we take lock. That way spinlock will be held only
>> for few bit flips.
> +1. Anyways we are inside a RCU section here and the map shouldn't
> disappear without a grace period so we can READ_ONCE the seed right?
> Or are we thinking about sleepable programs here?
>
>> Also, I wonder if ditching spinlock in favor of atomic bit set
>> operation would improve performance in typical scenarios. Seems like
>> set_bit() is an atomic operation, so it should be easy to test. Do you
>> mind running benchmarks with spinlock and with set_bit()?
> With the jhash pulled out of lock, I think it might be noticable. Curious
> to see.
Awesome, I will test this out and report back!
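For discussion, a sketch of the restructuring being suggested
(hypothetical v2 code, not the actual patch, and assuming nr_hashes is
capped at a small value like 16 as proposed above): hashes are computed
outside the critical section so the spinlock is held only for the bit
flips.

```
static int bloom_filter_map_push_elem(struct bpf_map *map, void *value,
				      u64 flags)
{
	struct bpf_bloom_filter *bloom_filter =
		container_of(map, struct bpf_bloom_filter, map);
	unsigned long spinlock_flags;
	u32 hashes[16];	/* assumes nr_hashes is bounded to <= 16 */
	u32 i;

	if (flags != BPF_ANY)
		return -EINVAL;

	/* hashing is the expensive part; do it before taking the lock */
	for (i = 0; i < bloom_filter->map.nr_hashes; i++)
		hashes[i] = jhash(value, map->value_size,
				  bloom_filter->hash_seed + i) &
			    bloom_filter->bit_array_mask;

	spin_lock_irqsave(&bloom_filter->spinlock, spinlock_flags);
	for (i = 0; i < bloom_filter->map.nr_hashes; i++)
		bitmap_set(bloom_filter->bit_array, hashes[i], 1);
	spin_unlock_irqrestore(&bloom_filter->spinlock, spinlock_flags);

	return 0;
}
```

The atomic set_bit() variant would drop the spinlock entirely and set
each hash's bit as it is computed.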
>>> +       for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
>>> +               hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
>>> +                       bloom_filter->bit_array_mask;
>>> +               bitmap_set(bloom_filter->bit_array, hash, 1);
>>> +       }
>>> +
>>> +       spin_unlock_irqrestore(&bloom_filter->spinlock, spinlock_flags);
>>> +
>>> +       return 0;
>>> +}
>>> +
>> [...]
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-02 22:07       ` Joanne Koong
@ 2021-09-03  0:56         ` Martin KaFai Lau
  2021-09-03  7:13           ` Joanne Koong
  2021-09-03 17:22         ` John Fastabend
  1 sibling, 1 reply; 23+ messages in thread
From: Martin KaFai Lau @ 2021-09-03  0:56 UTC (permalink / raw)
  To: Joanne Koong; +Cc: John Fastabend, Andrii Nakryiko, bpf

On Thu, Sep 02, 2021 at 03:07:56PM -0700, Joanne Koong wrote:
[ ... ]
> > > But one high-level point I wanted to discuss was that bloom filter
> > > logic is actually simple enough to be implementable by pure BPF
> > > program logic. The only problematic part is generic hashing of a piece
> > > of memory. Regardless of implementing bloom filter as kernel-provided
> > > BPF map or implementing it with custom BPF program logic, having BPF
> > > helper for hashing a piece of memory seems extremely useful and very
> > > generic. I can't recall if we ever discussed adding such helpers, but
> > > maybe we should?
> > Aha started typing the same thing :)
> > 
> > Adding generic hash helper has been on my todo list and close to the top
> > now. The use case is hashing data from skb payloads and such from kprobe
> > and sockmap side. I'm happy to work on it as soon as possible if no one
> > else picks it up.
> > 
> > > It would be a really interesting experiment to implement the same
> > > logic in pure BPF logic and run it as another benchmark, along the
> > > Bloom filter map. BPF has both spinlock and atomic operation, so we
> > > can compare and contrast. We only miss hashing BPF helper.
> > 
> > I've found small seemingly unrelated changes cause the complexity limit
> > to explode. Usually we can work around it with code to get pruning
> > points and such, but it's a bit ugly. Perhaps this means we need
> > to dive into details of why the complexity explodes, but I've not
> > got to it yet. The todo list is long.
> > 
> > > Being able to do this in pure BPF code has a bunch of advantages.
> > > Depending on specific application, users can decide to:
> > >    - speed up the operation by ditching spinlock or atomic operation,
> > > if the logic allows to lose some bit updates;
> > >    - decide on optimal size, which might not be a power of 2, depending
> > > on memory vs CPU trade-off in any particular case;
> > >    - it's also possible to implement a more general Counting Bloom
> > > filter, all without modifying the kernel.
> > Also it means no call and if you build it on top of an array
> > map of size 1 its just a load. Could this be a performance win for
> > example a Bloom filter in XDP for DDOS? Maybe. Not sure if the program
> > would be complex enough a call might be in the noise. I don't know.
> > 
> > > We could go further, and start implementing other simple data
> > > structures relying on hashing, like HyperLogLog. And all with no
> > > kernel modifications. Map-in-map is no issue as well, because there is
> > > a choice of using either fixed global data arrays for maximum
> > > performance, or using BPF_MAP_TYPE_ARRAY maps that can go inside
> > > map-in-map.
> > We've been doing most of our array maps as single entry arrays
> > at this point and just calculating offsets directly in BPF. Same
> > for some simple hashing algorithms.
> > 
> > > Basically, regardless of having this map in the kernel or not, let's
> > > have a "universal" hashing function as a BPF helper as well.
> > > Thoughts?
> > I like it, but not the bloom filter expert here.
> 
> Ooh, I like your idea of comparing the performance of the bloom filter with
> a kernel-provided BPF map vs. custom BPF program logic using a
> hash helper, especially if a BPF hash helper is something useful that
> we want to add to the codebase in and of itself!
I think a hash helper will be useful in general but could it be a
separate experiment to try using it to implement some bpf maps (probably
a mix of an easy one and a little harder one) ?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-03  0:56         ` Martin KaFai Lau
@ 2021-09-03  7:13           ` Joanne Koong
  2021-09-03 17:19             ` Andrii Nakryiko
  0 siblings, 1 reply; 23+ messages in thread
From: Joanne Koong @ 2021-09-03  7:13 UTC (permalink / raw)
  To: Martin KaFai Lau; +Cc: John Fastabend, Andrii Nakryiko, bpf

On 9/2/21 5:56 PM, Martin KaFai Lau wrote:

> On Thu, Sep 02, 2021 at 03:07:56PM -0700, Joanne Koong wrote:
> [ ... ]
>>>> But one high-level point I wanted to discuss was that bloom filter
>>>> logic is actually simple enough to be implementable by pure BPF
>>>> program logic. The only problematic part is generic hashing of a piece
>>>> of memory. Regardless of implementing bloom filter as kernel-provided
>>>> BPF map or implementing it with custom BPF program logic, having BPF
>>>> helper for hashing a piece of memory seems extremely useful and very
>>>> generic. I can't recall if we ever discussed adding such helpers, but
>>>> maybe we should?
>>> Aha started typing the same thing :)
>>>
>>> Adding generic hash helper has been on my todo list and close to the top
>>> now. The use case is hashing data from skb payloads and such from kprobe
>>> and sockmap side. I'm happy to work on it as soon as possible if no one
>>> else picks it up.
>>>
After thinking through this some more, I'm curious to hear your thoughts,
Andrii and John, on how the bitmap would be allocated. From what I
understand, we do not currently support dynamic memory allocations
in bpf programs. Assuming the optimal number of bits the user wants
to use for their bitmap follows something like
num_entries * num_hash_functions / ln(2), I think the bitmap would
have to be dynamically allocated in the bpf program since it'd be too
large to store on the stack, unless there's some other way I'm not seeing?
>>>> It would be a really interesting experiment to implement the same
>>>> logic in pure BPF logic and run it as another benchmark, along the
>>>> Bloom filter map. BPF has both spinlock and atomic operation, so we
>>>> can compare and contrast. We only miss hashing BPF helper.
>>> The one issue I've found writing hash logic is it's a bit tricky
>>> to get the verifier to consume it, especially when the hash is nested
>>> inside a for loop and sometimes a couple of for loops, so you end up with
>>> things like,
>>>
>>>   for (i = 0; i < someTlvs; i++) {
>>>    for (j = 0; j < someKeys; j++) {
>>>      ...
>>>      bpf_hash(someValue)
>>>      ...
>>>   }
>>>
>>> I've found small seemingly unrelated changes cause the complexity limit
>>> to explode. Usually we can work around it with code to get pruning
>>> points and such, but it's a bit ugly. Perhaps this means we need
>>> to dive into details of why the complexity explodes, but I've not
>>> got to it yet. The todo list is long.
Out of curiosity, why would this helper have trouble in the verifier?
From a quick glance, it seems like the implementation for it would
be pretty similar to how bpf_get_prandom_u32() is implemented
(except where the arguments for the hash helper would take in a
void* data (ARG_PTR_TO_MEM), the size of the data buffer, and
the seed)? I'm a bit new to bpf, so there's a good chance I might be
completely overlooking something here :)
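For what it's worth, a minimal sketch of what such a helper's kernel-side
definition might look like (entirely hypothetical, not from any patch:
it mirrors how existing helpers are declared, with jhash as a stand-in
for whatever hash is ultimately chosen):

```
/* Hypothetical helper sketch, modeled on kernel/bpf/helpers.c. */
BPF_CALL_3(bpf_hash, const void *, data, u32, size, u32, seed)
{
	return jhash(data, size, seed);
}

static const struct bpf_func_proto bpf_hash_proto = {
	.func		= bpf_hash,
	.gpl_only	= false,
	.ret_type	= RET_INTEGER,
	.arg1_type	= ARG_PTR_TO_MEM,	/* data to hash */
	.arg2_type	= ARG_CONST_SIZE,	/* verifier-checked length */
	.arg3_type	= ARG_ANYTHING,		/* seed */
};
```

The verifier-side cost Andrii and John discuss would come from the
callers' loop structure, not from the helper definition itself.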

>>>> Being able to do this in pure BPF code has a bunch of advantages.
>>>> Depending on specific application, users can decide to:
>>>>     - speed up the operation by ditching spinlock or atomic operation,
>>>> if the logic allows to lose some bit updates;
>>>>     - decide on optimal size, which might not be a power of 2, depending
>>>> on memory vs CPU trade-off in any particular case;
>>>>     - it's also possible to implement a more general Counting Bloom
>>>> filter, all without modifying the kernel.
>>> Also it means no call and if you build it on top of an array
>>> map of size 1 its just a load. Could this be a performance win for
>>> example a Bloom filter in XDP for DDOS? Maybe. Not sure if the program
>>> would be complex enough a call might be in the noise. I don't know.
>>>
>>>> We could go further, and start implementing other simple data
>>>> structures relying on hashing, like HyperLogLog. And all with no
>>>> kernel modifications. Map-in-map is no issue as well, because there is
>>>> a choice of using either fixed global data arrays for maximum
>>>> performance, or using BPF_MAP_TYPE_ARRAY maps that can go inside
>>>> map-in-map.
>>> We've been doing most of our array maps as single entry arrays
>>> at this point and just calculating offsets directly in BPF. Same
>>> for some simple hashing algorithms.
>>>
>>>> Basically, regardless of having this map in the kernel or not, let's
>>>> have a "universal" hashing function as a BPF helper as well.
>>>> Thoughts?
>>> I like it, but not the bloom filter expert here.
>> Ooh, I like your idea of comparing the performance of the bloom filter with
>> a kernel-provided BPF map vs. custom BPF program logic using a
>> hash helper, especially if a BPF hash helper is something useful that
>> we want to add to the codebase in and of itself!
> I think a hash helper will be useful in general but could it be a
> separate experiment to try using it to implement some bpf maps (probably
> a mix of an easy one and a little harder one) ?

I agree, I think the hash helper implementation should be its own separate
patchset orthogonal to this one.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-03  7:13           ` Joanne Koong
@ 2021-09-03 17:19             ` Andrii Nakryiko
  0 siblings, 0 replies; 23+ messages in thread
From: Andrii Nakryiko @ 2021-09-03 17:19 UTC (permalink / raw)
  To: Joanne Koong; +Cc: Martin KaFai Lau, John Fastabend, bpf

On Fri, Sep 3, 2021 at 12:13 AM Joanne Koong <joannekoong@fb.com> wrote:
>
> On 9/2/21 5:56 PM, Martin KaFai Lau wrote:
>
> > On Thu, Sep 02, 2021 at 03:07:56PM -0700, Joanne Koong wrote:
> > [ ... ]
> >>>> But one high-level point I wanted to discuss was that bloom filter
> >>>> logic is actually simple enough to be implementable by pure BPF
> >>>> program logic. The only problematic part is generic hashing of a piece
> >>>> of memory. Regardless of implementing bloom filter as kernel-provided
> >>>> BPF map or implementing it with custom BPF program logic, having BPF
> >>>> helper for hashing a piece of memory seems extremely useful and very
> >>>> generic. I can't recall if we ever discussed adding such helpers, but
> >>>> maybe we should?
> >>> Aha started typing the same thing :)
> >>>
> >>> Adding generic hash helper has been on my todo list and close to the top
> >>> now. The use case is hashing data from skb payloads and such from kprobe
> >>> and sockmap side. I'm happy to work on it as soon as possible if no one
> >>> else picks it up.
> >>>
> After thinking through this some more, I'm curious to hear your thoughts,
> Andrii and John, on how the bitmap would be allocated. From what I
> understand, we do not currently support dynamic memory allocations
> in bpf programs. Assuming the optimal number of bits the user wants
> to use for their bitmap follows something like
> num_entries * num_hash_functions / ln(2), I think the bitmap would
> have to be dynamically allocated in the bpf program since it'd be too
> large to store on the stack, unless there's some other way I'm not seeing?

You can either use a BPF_MAP_TYPE_ARRAY and size it at runtime, or use a
compile-time fixed-size array in the BPF program:


u64 bits[HOWEVER_MANY_U64S_WE_NEED];

/* then in BPF program itself */

h = hash(...);
bits[h / 64] |= (1ULL << (h % 64));

As an example. The latter case avoids map lookups completely, except
you'd need to prove to the verifier that you are not going out of
bounds for bits, which is simple to do if HOWEVER_MANY_U64S_WE_NEED is
a power of two. Then you can do:

h = hash(...);
bits[(h / 64) & (HOWEVER_MANY_U64S_WE_NEED - 1)] |= (1ULL << (h % 64));
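Putting those pieces together, here is a self-contained userspace sketch of
the fixed-array approach (hedged: FNV-1a stands in for a hashing helper that
does not exist yet, and plain C stands in for verifier-checked BPF):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_U64S   1024	/* power of two, so indexing reduces to a mask */
#define NR_HASHES 3

static uint64_t bits[NR_U64S];	/* zero-initialized bit array */

/* FNV-1a over a byte buffer, seeded per hash function; a stand-in for
 * the kernel's jhash, not what the patchset uses. */
static uint64_t hash_mem(const void *data, size_t sz, uint64_t seed)
{
	const uint8_t *p = data;
	uint64_t h = 14695981039346656037ULL ^ seed;
	size_t i;

	for (i = 0; i < sz; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	return h;
}

static void bloom_push(const void *val, size_t sz)
{
	uint64_t i, h;

	for (i = 0; i < NR_HASHES; i++) {
		h = hash_mem(val, sz, i) & (NR_U64S * 64 - 1);
		bits[h / 64] |= 1ULL << (h % 64);
	}
}

static int bloom_peek(const void *val, size_t sz)
{
	uint64_t i, h;

	for (i = 0; i < NR_HASHES; i++) {
		h = hash_mem(val, sz, i) & (NR_U64S * 64 - 1);
		if (!(bits[h / 64] & (1ULL << (h % 64))))
			return 0;	/* definitely absent */
	}
	return 1;			/* possibly present */
}
```

Pushing an element sets NR_HASHES bits; peeking returns 0 only if some bit is
clear, so an inserted element can never be reported absent.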

> >>>> It would be a really interesting experiment to implement the same
> >>>> logic in pure BPF logic and run it as another benchmark, along the
> >>>> Bloom filter map. BPF has both spinlock and atomic operation, so we
> >>>> can compare and contrast. We only miss hashing BPF helper.
> >>> The one issue I've found writing hash logic is that it's a bit tricky
> >>> to get the verifier to consume it. Especially when the hash is nested
> >>> inside a for loop and sometimes a couple for loops so you end up with
> >>> things like,
> >>>
> >>>   for (i = 0; i < someTlvs; i++) {
> >>>    for (j = 0; j < someKeys; j++) {
> >>>      ...
> >>>      bpf_hash(someValue)
> >>>      ...
> >>>    }
> >>>   }
> >>>
> >>> I've found that small, seemingly unrelated changes cause the complexity
> >>> limit to explode. Usually we can work around it with code to get pruning

btw, global BPF functions (sub-programs) should limit this complexity
explosion, even if you implement your own hashing function purely in
BPF.

> >>> points and such, but its a bit ugly. Perhaps this means we need
> >>> to dive into details of why the complexity explodes, but I've not
> >>> got to it yet. The todo list is long.
> Out of curiosity, why would this helper have trouble in the verifier?
> From a quick glance, it seems like the implementation for it would
> be pretty similar to how bpf_get_prandom_u32() is implemented
> (except where the arguments for the hash helper would take in a
> void* data (ARG_PTR_TO_MEM), the size of the data buffer, and
> the seed)? I'm a bit new to bpf, so there's a good chance I might be
> completely overlooking something here :)

Curious as well. I imagine we'd define new helper with this signature:

u64 bpf_hash_mem(void *data, u64 sz, enum bpf_hash_func hash_fn, u64 flags);

Where enum bpf_hash_func { JHASH, MURMUR, CRC32, etc }, whatever is
available in the kernel (or will be added later).

John, would this still cause problems for the verifier?
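For concreteness, the proposed signature could be mocked up in userspace like
this (the enum values and both hash bodies are placeholders; the real helper
would dispatch to jhash, murmur, crc32, etc. inside the kernel):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder enum mirroring the proposed bpf_hash_func argument. */
enum bpf_hash_func { BPF_HASH_FNV1A, BPF_HASH_DJB2 };

/* Mock of the proposed helper: hash sz bytes at data with hash_fn. */
static uint64_t bpf_hash_mem(const void *data, uint64_t sz,
			     enum bpf_hash_func hash_fn, uint64_t flags)
{
	const uint8_t *p = data;
	uint64_t h, i;

	(void)flags;	/* reserved in the proposal */
	switch (hash_fn) {
	case BPF_HASH_FNV1A:
		h = 14695981039346656037ULL;
		for (i = 0; i < sz; i++) {
			h ^= p[i];
			h *= 1099511628211ULL;
		}
		return h;
	case BPF_HASH_DJB2:
		h = 5381;
		for (i = 0; i < sz; i++)
			h = h * 33 + p[i];
		return h;
	}
	return 0;
}
```

From the verifier's point of view this looks like any ARG_PTR_TO_MEM +
ARG_CONST_SIZE helper, which is why the bpf_get_prandom_u32() comparison above
seems apt.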

>
> >>>> Being able to do this in pure BPF code has a bunch of advantages.
> >>>> Depending on specific application, users can decide to:
> >>>>     - speed up the operation by ditching the spinlock or atomic operation,
> >>>> if the logic can tolerate losing some bit updates;
> >>>>     - decide on an optimal size, which might not be a power of 2, depending
> >>>> on the memory vs. CPU trade-off in any particular case;
> >>>>     - it's also possible to implement a more general Counting Bloom
> >>>> filter, all without modifying the kernel.
> >>> Also it means no call and if you build it on top of an array
> >>> map of size 1 it's just a load. Could this be a performance win for
> >>> example a Bloom filter in XDP for DDOS? Maybe. Not sure if the program
> >>> would be complex enough a call might be in the noise. I don't know.
> >>>
> >>>> We could go further, and start implementing other simple data
> >>>> structures relying on hashing, like HyperLogLog. And all with no
> >>>> kernel modifications. Map-in-map is no issue as well, because there is
> >>>> a choice of using either fixed global data arrays for maximum
> >>>> performance, or using BPF_MAP_TYPE_ARRAY maps that can go inside
> >>>> map-in-map.
> >>> We've been doing most of our array maps as single entry arrays
> >>> at this point and just calculating offsets directly in BPF. Same
> >>> for some simple hashing algorithms.
> >>>
> >>>> Basically, regardless of having this map in the kernel or not, let's
> >>>> have a "universal" hashing function as a BPF helper as well.
> >>>> Thoughts?
> >>> I like it, but not the bloom filter expert here.
> >> Ooh, I like your idea of comparing the performance of the bloom filter with
> >> a kernel-provided BPF map vs. custom BPF program logic using a
> >> hash helper, especially if a BPF hash helper is something useful that
> >> we want to add to the codebase in and of itself!
> > I think a hash helper will be useful in general but could it be a
> > separate experiment to try using it to implement some bpf maps (probably
> > a mix of an easy one and a little harder one) ?
>
> I agree, I think the hash helper implementation should be its own separate
> patchset orthogonal to this one.
>

Sure, I don't feel strongly against having Bloom filter as BPF map.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-02 22:07       ` Joanne Koong
  2021-09-03  0:56         ` Martin KaFai Lau
@ 2021-09-03 17:22         ` John Fastabend
  2021-09-08 19:10           ` Joanne Koong
  1 sibling, 1 reply; 23+ messages in thread
From: John Fastabend @ 2021-09-03 17:22 UTC (permalink / raw)
  To: Joanne Koong, John Fastabend, Andrii Nakryiko; +Cc: bpf

Joanne Koong wrote:
> On 9/1/21 10:11 PM, John Fastabend wrote:
> 
> > Andrii Nakryiko wrote:
> >> On Tue, Aug 31, 2021 at 3:51 PM Joanne Koong <joannekoong@fb.com> wrote:
> >>> Bloom filters are a space-efficient probabilistic data structure
> >>> used to quickly test whether an element exists in a set.
> >>> In a bloom filter, false positives are possible whereas false
> >>> negatives are not.
> >>>
> >>> This patch adds a bloom filter map for bpf programs.
> >>> The bloom filter map supports peek (determining whether an element
> >>> is present in the map) and push (adding an element to the map)
> >>> operations. These operations are exposed to userspace applications
> >>> through the already existing syscalls in the following way:
> >>>
> >>> BPF_MAP_LOOKUP_ELEM -> peek
> >>> BPF_MAP_UPDATE_ELEM -> push
> >>>
> >>> The bloom filter map does not have keys, only values. In light of
> >>> this, the bloom filter map's API matches that of queue stack maps:
> >>> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> >>> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> >>> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> >>> APIs to query or add an element to the bloom filter map. When the
> >>> bloom filter map is created, it must be created with a key_size of 0.
> >>>
> >>> For updates, the user will pass in the element to add to the map
> >>> as the value, with a NULL key. For lookups, the user will pass in the
> >>> element to query in the map as the value. In the verifier layer, this
> >>> requires us to modify the argument type of a bloom filter's
> >>> BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
> >>> the syscall layer, we need to copy over the user value so that in
> >>> bpf_map_peek_elem, we know which specific value to query.
> >>>
> >>> The maximum number of entries in the bloom filter is not enforced; if
> >>> the user wishes to insert more entries into the bloom filter than they
> >>> specified as the max entries size of the bloom filter, that is permitted
> >>> but the performance of their bloom filter will have a higher false
> >>> positive rate.
> >>>
> >>> The number of hashes to use for the bloom filter is configurable from
> >>> userspace. The benchmarks later in this patchset can help compare the
> >>> performance of different numbers of hashes on different entry
> >>> sizes. In general, using more hashes decreases the speed of a lookup,
> >>> but lowers the false positive rate of an absent element being detected
> >>> in the bloom filter.
> >>>
> >>> Signed-off-by: Joanne Koong <joannekoong@fb.com>

[...]

> >>> +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
> >>> +{
> >>> +       int numa_node = bpf_map_attr_numa_node(attr);
> >>> +       u32 nr_bits, bit_array_bytes, bit_array_mask;
> >>> +       struct bpf_bloom_filter *bloom_filter;
> >>> +
> >>> +       if (!bpf_capable())
> >>> +               return ERR_PTR(-EPERM);
> >>> +
> >>> +       if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
> >>> +           attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
> >>> +           !bpf_map_flags_access_ok(attr->map_flags))
> >>> +               return ERR_PTR(-EINVAL);
> >>> +
> >>> +       /* For the bloom filter, the optimal bit array size that minimizes the
> >>> +        * false positive probability is n * k / ln(2) where n is the number of
> >>> +        * expected entries in the bloom filter and k is the number of hash
> >>> +        * functions. We use 7 / 5 to approximate 1 / ln(2).
> >>> +        *
> >>> +        * We round this up to the nearest power of two to enable more efficient
> >>> +        * hashing using bitmasks. The bitmask will be the bit array size - 1.
> >>> +        *
> >>> +        * If this overflows a u32, the bit array size will have 2^32 (4
> >>> +        * GB) bits.
> > Would it be better to return E2BIG or EINVAL here? Speculating a bit, but if I was
> > a user I might want to know that the number of bits I pushed down is not the actual
> > number?
> 
> I think if we return E2BIG or EINVAL here, this will fail to create the
> bloom filter map if the max_entries exceeds some limit (~3 billion,
> according to my math), whereas automatically setting the bit array size
> to 2^32 if the max_entries is extraordinarily large will still allow the
> user to create and use a bloom filter (albeit one with a higher false
> positive rate).

It doesn't matter much to me, but I think if a user requests 3+ billion max
entries it's ok to return E2BIG, and then they can use a lower limit and know
the false positive rate is going to go up.

> 
> > Another thought, would it be simpler to let user do this calculation and just let
> > max_elements be number of bits they want? Then we could have examples with the
> > above comment. Just a thought...
> 
> I like Martin's idea of keeping the max_entries meaning consistent 
> across all map types.
> I think that makes the interface clearer for users.

I'm convinced as well, let's keep it consistent. Thanks.
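For reference, the sizing rule in the quoted comment works out as follows (a
sketch of the arithmetic only, not the kernel code itself; the 7/5 factor
approximates 1/ln(2) ~= 1.44):

```c
#include <assert.h>
#include <stdint.h>

/* Compute the bit array size for max_entries expected elements and
 * nr_hashes hash functions: n * k / ln(2), rounded up to a power of
 * two, capped at 2^32 bits as in the quoted overflow comment. */
static uint64_t bloom_nr_bits(uint32_t max_entries, uint32_t nr_hashes)
{
	uint64_t nr_bits = (uint64_t)max_entries * nr_hashes * 7 / 5;
	uint64_t pow2 = 1;

	while (pow2 < nr_bits)
		pow2 <<= 1;
	if (pow2 > (1ULL << 32))	/* cap: 2^32 bits = 512 MB */
		pow2 = 1ULL << 32;
	return pow2;
}
```

With nr_hashes = 1 the cap is hit once max_entries exceeds roughly
2^32 * 5 / 7, i.e. ~3 billion, matching the figure quoted above.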

[...]

> >> Also, I wonder if ditching spinlock in favor of atomic bit set
> >> operation would improve performance in typical scenarios. Seems like
> >> set_bit() is an atomic operation, so it should be easy to test. Do you
> >> mind running benchmarks with spinlock and with set_bit()?
> > With the jhash pulled out of the lock, I think it might be noticeable. Curious
> > to see.
> Awesome, I will test this out and report back!

It looks like the benchmark tests were done with a value size of __u64; should
we do a larger entry? I guess (you tell me?) if this is used from XDP for
DDOS you would use a flow tuple, and with IPv6 this could be
{dstIp, srcIp, sport, dport, proto} at roughly 44B.
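A possible shape for such a tuple (field names and layout are illustrative,
not taken from the patchset):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical IPv6 flow key used as a bloom filter value in an XDP
 * DDOS filter. 16 + 16 + 2 + 2 + 1 = 37 bytes of payload, which the
 * compiler pads to 38 (2-byte alignment from the uint16_t members). */
struct flow_key_v6 {
	uint8_t  dst_ip[16];
	uint8_t  src_ip[16];
	uint16_t sport;
	uint16_t dport;
	uint8_t  proto;
};
```

That lands in the ~40-byte ballpark, so benchmarking a value size around 40B
should be representative of this use case.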

> >>> +       for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
> >>> +               hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
> >>> +                       bloom_filter->bit_array_mask;
> >>> +               bitmap_set(bloom_filter->bit_array, hash, 1);
> >>> +       }
> >>> +
> >>> +       spin_unlock_irqrestore(&bloom_filter->spinlock, spinlock_flags);
> >>> +
> >>> +       return 0;
> >>> +}
> >>> +
> >> [...]
> >

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
  2021-09-03 17:22         ` John Fastabend
@ 2021-09-08 19:10           ` Joanne Koong
  0 siblings, 0 replies; 23+ messages in thread
From: Joanne Koong @ 2021-09-08 19:10 UTC (permalink / raw)
  To: John Fastabend, Andrii Nakryiko; +Cc: bpf

On 9/3/21 10:22 AM, John Fastabend wrote:

> Joanne Koong wrote:
>> On 9/1/21 10:11 PM, John Fastabend wrote:
>>
>>> Andrii Nakryiko wrote:
>>>> On Tue, Aug 31, 2021 at 3:51 PM Joanne Koong <joannekoong@fb.com> wrote:
>>>>> Bloom filters are a space-efficient probabilistic data structure
>>>>> used to quickly test whether an element exists in a set.
>>>>> In a bloom filter, false positives are possible whereas false
>>>>> negatives are not.
>>>>>
>>>>> This patch adds a bloom filter map for bpf programs.
>>>>> The bloom filter map supports peek (determining whether an element
>>>>> is present in the map) and push (adding an element to the map)
>>>>> operations. These operations are exposed to userspace applications
>>>>> through the already existing syscalls in the following way:
>>>>>
>>>>> BPF_MAP_LOOKUP_ELEM -> peek
>>>>> BPF_MAP_UPDATE_ELEM -> push
>>>>>
>>>>> The bloom filter map does not have keys, only values. In light of
>>>>> this, the bloom filter map's API matches that of queue stack maps:
>>>>> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
>>>>> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
>>>>> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
>>>>> APIs to query or add an element to the bloom filter map. When the
>>>>> bloom filter map is created, it must be created with a key_size of 0.
>>>>>
>>>>> For updates, the user will pass in the element to add to the map
>>>>> as the value, with a NULL key. For lookups, the user will pass in the
>>>>> element to query in the map as the value. In the verifier layer, this
>>>>> requires us to modify the argument type of a bloom filter's
>>>>> BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
>>>>> the syscall layer, we need to copy over the user value so that in
>>>>> bpf_map_peek_elem, we know which specific value to query.
>>>>>
>>>>> The maximum number of entries in the bloom filter is not enforced; if
>>>>> the user wishes to insert more entries into the bloom filter than they
>>>>> specified as the max entries size of the bloom filter, that is permitted
>>>>> but the performance of their bloom filter will have a higher false
>>>>> positive rate.
>>>>>
>>>>> The number of hashes to use for the bloom filter is configurable from
>>>>> userspace. The benchmarks later in this patchset can help compare the
>>>>> performance of different numbers of hashes on different entry
>>>>> sizes. In general, using more hashes decreases the speed of a lookup,
>>>>> but lowers the false positive rate of an absent element being detected
>>>>> in the bloom filter.
>>>>>
>>>>> Signed-off-by: Joanne Koong <joannekoong@fb.com>
> [...]
>
>>>>> +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
>>>>> +{
>>>>> +       int numa_node = bpf_map_attr_numa_node(attr);
>>>>> +       u32 nr_bits, bit_array_bytes, bit_array_mask;
>>>>> +       struct bpf_bloom_filter *bloom_filter;
>>>>> +
>>>>> +       if (!bpf_capable())
>>>>> +               return ERR_PTR(-EPERM);
>>>>> +
>>>>> +       if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
>>>>> +           attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
>>>>> +           !bpf_map_flags_access_ok(attr->map_flags))
>>>>> +               return ERR_PTR(-EINVAL);
>>>>> +
>>>>> +       /* For the bloom filter, the optimal bit array size that minimizes the
>>>>> +        * false positive probability is n * k / ln(2) where n is the number of
>>>>> +        * expected entries in the bloom filter and k is the number of hash
>>>>> +        * functions. We use 7 / 5 to approximate 1 / ln(2).
>>>>> +        *
>>>>> +        * We round this up to the nearest power of two to enable more efficient
>>>>> +        * hashing using bitmasks. The bitmask will be the bit array size - 1.
>>>>> +        *
>>>>> +        * If this overflows a u32, the bit array size will have 2^32 (4
>>>>> +        * GB) bits.
>>> Would it be better to return E2BIG or EINVAL here? Speculating a bit, but if I was
>>> a user I might want to know that the number of bits I pushed down is not the actual
>>> number?
>> I think if we return E2BIG or EINVAL here, this will fail to create the
>> bloom filter map if the max_entries exceeds some limit (~3 billion,
>> according to my math), whereas automatically setting the bit array size
>> to 2^32 if the max_entries is extraordinarily large will still allow the
>> user to create and use a bloom filter (albeit one with a higher false
>> positive rate).
> It doesn't matter much to me, but I think if a user requests 3+ billion max
> entries it's ok to return E2BIG, and then they can use a lower limit and
> know the false positive rate is going to go up.
>
>>> Another thought, would it be simpler to let user do this calculation and just let
>>> max_elements be number of bits they want? Then we could have examples with the
>>> above comment. Just a thought...
>> I like Martin's idea of keeping the max_entries meaning consistent
>> across all map types.
>> I think that makes the interface clearer for users.
> I'm convinced as well, let's keep it consistent. Thanks.
>
> [...]
>
>>>> Also, I wonder if ditching spinlock in favor of atomic bit set
>>>> operation would improve performance in typical scenarios. Seems like
>>>> set_bit() is an atomic operation, so it should be easy to test. Do you
>>>> mind running benchmarks with spinlock and with set_bit()?
>>> With the jhash pulled out of the lock, I think it might be noticeable. Curious
>>> to see.
>> Awesome, I will test this out and report back!
> It looks like the benchmark tests were done with a value size of __u64; should
> we do a larger entry? I guess (you tell me?) if this is used from XDP for
> DDOS you would use a flow tuple, and with IPv6 this could be
> {dstIp, srcIp, sport, dport, proto} at roughly 44B.

Great suggestion. Alexei mentioned this as well in his earlier reply. I am
planning to run benchmarks on the v2 version using value sizes of 4, 8, 16,
and 40 bytes.

>>>>> +       for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
>>>>> +               hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
>>>>> +                       bloom_filter->bit_array_mask;
>>>>> +               bitmap_set(bloom_filter->bit_array, hash, 1);
>>>>> +       }
>>>>> +
>>>>> +       spin_unlock_irqrestore(&bloom_filter->spinlock, spinlock_flags);
>>>>> +
>>>>> +       return 0;
>>>>> +}
>>>>> +
>>>> [...]

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-09-08 19:10 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-31 22:50 [PATCH bpf-next 0/5] Implement bloom filter map Joanne Koong
2021-08-31 22:50 ` [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation Joanne Koong
2021-09-01  2:55   ` Alexei Starovoitov
2021-09-02 19:11     ` Joanne Koong
2021-09-02  1:44   ` Andrii Nakryiko
2021-09-02  5:11     ` John Fastabend
2021-09-02  6:16       ` Martin KaFai Lau
2021-09-02 22:07       ` Joanne Koong
2021-09-03  0:56         ` Martin KaFai Lau
2021-09-03  7:13           ` Joanne Koong
2021-09-03 17:19             ` Andrii Nakryiko
2021-09-03 17:22         ` John Fastabend
2021-09-08 19:10           ` Joanne Koong
2021-09-02  3:16   ` John Fastabend
2021-09-02  3:28     ` Andrii Nakryiko
2021-09-02  4:40       ` John Fastabend
2021-08-31 22:50 ` [PATCH bpf-next 2/5] libbpf: Allow the number of hashes in bloom filter maps to be configurable Joanne Koong
2021-09-02  3:30   ` Andrii Nakryiko
2021-08-31 22:50 ` [PATCH bpf-next 3/5] selftests/bpf: Add bloom filter map test cases Joanne Koong
2021-09-01  2:55   ` Alexei Starovoitov
2021-08-31 22:50 ` [PATCH bpf-next 4/5] bpf/benchs: Add benchmark test for bloom filter maps Joanne Koong
2021-09-02  3:35   ` Andrii Nakryiko
2021-08-31 22:50 ` [PATCH bpf-next 5/5] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter Joanne Koong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.