Re: [PATCH v3 bpf-next 1/5] bpf: Add bloom filter map implementation

From: Joanne Koong <joannekoong@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH v3 bpf-next 1/5] bpf: Add bloom filter map implementation
Date: Wed, 22 Sep 2021 12:06:02 -0700	[thread overview]
Message-ID: <517a137d-66aa-8aa8-a064-fad8ae0c7fa8@fb.com> (raw)
In-Reply-To: <CAEf4BzZfeGGv+gBbfBJq5W8eQESgdqeNaByk-agOgMaB8BjQhA@mail.gmail.com>


On 9/21/21 4:44 PM, Andrii Nakryiko wrote:
> On Tue, Sep 21, 2021 at 2:30 PM Joanne Koong <joannekoong@fb.com> wrote:
>> Bloom filters are a space-efficient probabilistic data structure
>> used to quickly test whether an element exists in a set.
>> In a bloom filter, false positives are possible whereas false
>> negatives should never be.
>>
>> This patch adds a bloom filter map for bpf programs.
>> The bloom filter map supports peek (determining whether an element
>> is present in the map) and push (adding an element to the map)
>> operations.These operations are exposed to userspace applications
>> through the already existing syscalls in the following way:
>>
>> BPF_MAP_LOOKUP_ELEM -> peek
>> BPF_MAP_UPDATE_ELEM -> push
>>
>> The bloom filter map does not have keys, only values. In light of
>> this, the bloom filter map's API matches that of queue stack maps:
>> user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
>> which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
>> and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
>> APIs to query or add an element to the bloom filter map. When the
>> bloom filter map is created, it must be created with a key_size of 0.
>>
>> For updates, the user will pass in the element to add to the map
>> as the value, with a NULL key. For lookups, the user will pass in the
>> element to query in the map as the value. In the verifier layer, this
>> requires us to modify the argument type of a bloom filter's
>> BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
>> the syscall layer, we need to copy over the user value so that in
>> bpf_map_peek_elem, we know which specific value to query.
>>
>> A few things to please take note of:
>>   * If there are any concurrent lookups + updates, the user is
>> responsible for synchronizing this to ensure no false negative lookups
>> occur.
>>   * The number of hashes to use for the bloom filter is configurable from
>> userspace. If no number is specified, the default used will be 5 hash
>> functions. The benchmarks later in this patchset can help compare the
>> performance of using different number of hashes on different entry
>> sizes. In general, using more hashes decreases the speed of a lookup,
>> but increases the false positive rate of an element being detected in the
>> bloom filter.
>>   * Deleting an element in the bloom filter map is not supported.
>>   * The bloom filter map may be used as an inner map.
>>   * The "max_entries" size that is specified at map creation time is used to
>> approximate a reasonable bitmap size for the bloom filter, and is not
>> otherwise strictly enforced. If the user wishes to insert more entries into
>> the bloom filter than "max_entries", they may do so but they should be
>> aware that this may lead to a higher false positive rate.
>>
>> Signed-off-by: Joanne Koong <joannekoong@fb.com>
>> ---
>>   include/linux/bpf_types.h      |   1 +
>>   include/uapi/linux/bpf.h       |   1 +
>>   kernel/bpf/Makefile            |   2 +-
>>   kernel/bpf/bloom_filter.c      | 185 +++++++++++++++++++++++++++++++++
>>   kernel/bpf/syscall.c           |  14 ++-
>>   kernel/bpf/verifier.c          |  19 +++-
>>   tools/include/uapi/linux/bpf.h |   1 +
>>   7 files changed, 217 insertions(+), 6 deletions(-)
>>   create mode 100644 kernel/bpf/bloom_filter.c
>>
> See some stylistic nitpicking below (and not a nitpicking about BTF).
>
> But I just wanted to say that I'm a bit amazed by how much special
> casing this BLOOM_FILTER map requires in syscall.c and verifier.c. I
> still believe that starting with a BPF helper for hashing would be a
> better approach, but oh well.
>
> [...]
I liked your comment on v1 regarding using a BPF helper and I agree with 
the benefits
you outlined. I'm curious to see what the performance differences 
between that approach
and this one end up being, if any. I plan to test out the BPF helper 
approach in a few weeks,
and if the performance is comparable or better, I am definitely open to 
reverting this code
and just going with the BPF helper approach :)
>> +
>> +static inline u32 hash(struct bpf_bloom_filter *bloom_filter, void *value,
>> +                      u64 value_size, u32 index)
>> +{
>> +       if (bloom_filter->aligned_u32_count)
>> +               return jhash2(value, bloom_filter->aligned_u32_count,
>> +                             bloom_filter->hash_seed + index) &
>> +                       bloom_filter->bit_array_mask;
>> +
>> +       return jhash(value, value_size, bloom_filter->hash_seed + index) &
>> +               bloom_filter->bit_array_mask;
> stylistic nit, but this feels way to dense text-wise, this seems
> easier to follow
>
> u32 h;
>
> if (bloom_filter->aligned_u32_count)
>      h = jhash2(...);
> else
>      h = jhash(...);
> return h & bloom_filter->bit_array_mask;
>
> WDYT?
I think this sounds great! I will make these changes for v4.
>> +}
>> +
>> +static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
>> +{
>> +       struct bpf_bloom_filter *bloom_filter =
>> +               container_of(map, struct bpf_bloom_filter, map);
>> +       u32 i;
>> +
>> +       for (i = 0; i < bloom_filter->nr_hash_funcs; i++) {
>> +               if (!test_bit(hash(bloom_filter, value, map->value_size, i),
>> +                             bloom_filter->bit_array))
>> +                       return -ENOENT;
> same here, I think the hash calculation deserves a separate statement
> and a local variable
>
>> +       }
>> +
>> +       return 0;
>> +}
>> +
> [...]
>
>> +static void bloom_filter_map_free(struct bpf_map *map)
>> +{
>> +       struct bpf_bloom_filter *bloom_filter =
>> +               container_of(map, struct bpf_bloom_filter, map);
>> +
>> +       bpf_map_area_free(bloom_filter);
>> +}
>> +
>> +static int bloom_filter_map_push_elem(struct bpf_map *map, void *value,
>> +                                     u64 flags)
>> +{
>> +       struct bpf_bloom_filter *bloom_filter =
>> +               container_of(map, struct bpf_bloom_filter, map);
>> +       u32 i;
>> +
>> +       if (flags != BPF_ANY)
>> +               return -EINVAL;
>> +
>> +       for (i = 0; i < bloom_filter->nr_hash_funcs; i++)
>> +               set_bit(hash(bloom_filter, value, map->value_size, i),
>> +                       bloom_filter->bit_array);
> same as above about hash() call on separate line
>
>> +
>> +       return 0;
>> +}
>> +
>> +static void *bloom_filter_map_lookup_elem(struct bpf_map *map, void *key)
>> +{
>> +       /* The eBPF program should use map_peek_elem instead */
>> +       return ERR_PTR(-EINVAL);
>> +}
>> +
>> +static int bloom_filter_map_update_elem(struct bpf_map *map, void *key,
>> +                                       void *value, u64 flags)
>> +{
>> +       /* The eBPF program should use map_push_elem instead */
>> +       return -EINVAL;
>> +}
>> +
>> +static int bloom_filter_map_delete_elem(struct bpf_map *map, void *key)
>> +{
>> +       return -EOPNOTSUPP;
>> +}
>> +
>> +static int bloom_filter_map_get_next_key(struct bpf_map *map, void *key,
>> +                                        void *next_key)
>> +{
>> +       return -EOPNOTSUPP;
>> +}
>> +
>> +static int bloom_filter_map_btf_id;
>> +const struct bpf_map_ops bloom_filter_map_ops = {
>> +       .map_meta_equal = bpf_map_meta_equal,
>> +       .map_alloc = bloom_filter_map_alloc,
>> +       .map_free = bloom_filter_map_free,
>> +       .map_push_elem = bloom_filter_map_push_elem,
>> +       .map_peek_elem = bloom_filter_map_peek_elem,
>> +       .map_lookup_elem = bloom_filter_map_lookup_elem,
>> +       .map_update_elem = bloom_filter_map_update_elem,
>> +       .map_delete_elem = bloom_filter_map_delete_elem,
>> +       .map_get_next_key = bloom_filter_map_get_next_key,
>> +       .map_check_btf = map_check_no_btf,
> can you please implement basically a no-op callback here to allow
> specifying btf_value_id, there is no good reason to restrict this new
> map to not allow BTF type being specified for its value
Sounds great, will add this in v4.
>> +       .map_btf_name = "bpf_bloom_filter",
>> +       .map_btf_id = &bloom_filter_map_btf_id,
>> +};
> [...]