Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation

From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: Joanne Koong <joannekoong@fb.com>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
Date: Wed, 1 Sep 2021 20:28:10 -0700	[thread overview]
Message-ID: <CAEf4BzZFExb-EQcmvPV2KCc-ey8k6S-9YziY2e2MEE+NOQ9DAw@mail.gmail.com> (raw)
In-Reply-To: <61304218227e8_1aed208dd@john-XPS-13-9370.notmuch>

On Wed, Sep 1, 2021 at 8:18 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Joanne Koong wrote:
> > Bloom filters are a space-efficient probabilistic data structure
> > used to quickly test whether an element exists in a set.
> > In a bloom filter, false positives are possible whereas false
> > negatives are not.
> >
> > This patch adds a bloom filter map for bpf programs.
> > The bloom filter map supports peek (determining whether an element
> > is present in the map) and push (adding an element to the map)
> > operations.These operations are exposed to userspace applications
> > through the already existing syscalls in the following way:
> >
> > BPF_MAP_LOOKUP_ELEM -> peek
> > BPF_MAP_UPDATE_ELEM -> push
> >
> > The bloom filter map does not have keys, only values. In light of
> > this, the bloom filter map's API matches that of queue stack maps:
> > user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> > which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> > and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> > APIs to query or add an element to the bloom filter map. When the
> > bloom filter map is created, it must be created with a key_size of 0.
> >
> > For updates, the user will pass in the element to add to the map
> > as the value, wih a NULL key. For lookups, the user will pass in the
> > element to query in the map as the value. In the verifier layer, this
> > requires us to modify the argument type of a bloom filter's
> > BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
> > the syscall layer, we need to copy over the user value so that in
> > bpf_map_peek_elem, we know which specific value to query.
> >
> > The maximum number of entries in the bloom filter is not enforced; if
> > the user wishes to insert more entries into the bloom filter than they
> > specified as the max entries size of the bloom filter, that is permitted
> > but the performance of their bloom filter will have a higher false
> > positive rate.
>
> hmm I'm wondering if this means the memory footprint can grow without
> bounds? Typically maps have an upper bound on memory established at
> alloc time.

It's a bit unfortunate wording, but no, the amount of used memory is
fixed. Bloom filter is a probabilistic data structure in which each
"value" has few designated bits, determined by hash functions on that
value. The number of bits is fixed, though. If all designated bits are
set to 1, then we declare "value" to be present in the Bloom filter.
If at least one is 0, then we definitely didn't see "value" yet
(that's what guarantees no false negatives; this also answers Alexei's
worry about possible false negative due to unsynchronized update and
lookup, it can't happen by the nature of the data structure design,
regardless of synchronization). We can, of course, have all such bits
set to 1 even if the actual value was never "added" into the Bloom
filter, just by the nature of hash collisions with other elements'
hash functions (that's where the false positive comes from). It might
be useful to just leave a link to Wikipedia for description of Bloom
filter data structure ([0]).

  [0] https://en.wikipedia.org/wiki/Bloom_filter

>
> In queue_stack_map_alloc() we have,
>
>  queue_size = sizeof(*qs) + size * attr->value_size);
>  bpf_map_area_alloc(queue_size, numa_node)
>
> In hashmap (not preallocated)  we have, alloc_htab_elem() that will
> give us an upper bound.
>
> Is there a practical value in allowing these to grow endlessly? And
> should we be charging the value memory against something? In
> bpf_map_kmalloc_node we set_active_memcg() for example.
>
> I'll review code as well, but think above is worth some thought.
>
> >
> > The number of hashes to use for the bloom filter is configurable from
> > userspace. The benchmarks later in this patchset can help compare the
> > performances of different number of hashes on different entry
> > sizes. In general, using more hashes decreases the speed of a lookup,
> > but increases the false positive rate of an element being detected in the
> > bloom filter.
> >
> > Signed-off-by: Joanne Koong <joannekoong@fb.com>