bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	John Fastabend <john.fastabend@gmail.com>
Cc: Joanne Koong <joannekoong@fb.com>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
Date: Wed, 01 Sep 2021 21:40:33 -0700	[thread overview]
Message-ID: <613055c16831d_439b20825@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <CAEf4BzZFExb-EQcmvPV2KCc-ey8k6S-9YziY2e2MEE+NOQ9DAw@mail.gmail.com>

Andrii Nakryiko wrote:
> On Wed, Sep 1, 2021 at 8:18 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Joanne Koong wrote:
> > > Bloom filters are a space-efficient probabilistic data structure
> > > used to quickly test whether an element exists in a set.
> > > In a bloom filter, false positives are possible whereas false
> > > negatives are not.
> > >
> > > This patch adds a bloom filter map for bpf programs.
> > > The bloom filter map supports peek (determining whether an element
> > > is present in the map) and push (adding an element to the map)
> > > operations.These operations are exposed to userspace applications
> > > through the already existing syscalls in the following way:
> > >
> > > BPF_MAP_LOOKUP_ELEM -> peek
> > > BPF_MAP_UPDATE_ELEM -> push
> > >
> > > The bloom filter map does not have keys, only values. In light of
> > > this, the bloom filter map's API matches that of queue stack maps:
> > > user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
> > > which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
> > > and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
> > > APIs to query or add an element to the bloom filter map. When the
> > > bloom filter map is created, it must be created with a key_size of 0.
> > >
> > > For updates, the user will pass in the element to add to the map
> > > as the value, wih a NULL key. For lookups, the user will pass in the
> > > element to query in the map as the value. In the verifier layer, this
> > > requires us to modify the argument type of a bloom filter's
> > > BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in
> > > the syscall layer, we need to copy over the user value so that in
> > > bpf_map_peek_elem, we know which specific value to query.
> > >
> > > The maximum number of entries in the bloom filter is not enforced; if
> > > the user wishes to insert more entries into the bloom filter than they
> > > specified as the max entries size of the bloom filter, that is permitted
> > > but the performance of their bloom filter will have a higher false
> > > positive rate.
> >
> > hmm I'm wondering if this means the memory footprint can grow without
> > bounds? Typically maps have an upper bound on memory established at
> > alloc time.
> 
> It's a bit unfortunate wording, but no, the amount of used memory is
> fixed. Bloom filter is a probabilistic data structure in which each
> "value" has few designated bits, determined by hash functions on that
> value. The number of bits is fixed, though. If all designated bits are
> set to 1, then we declare "value" to be present in the Bloom filter.

Thanks actually reading the code helped as well. Looks like a v2
will likely happen perhaps a small note here for maxiumum number of
entries in the bloom filter is only used to estimate the number of
bits used.

I guess if a BPF user did want to enforce the max number of entries
a simple BPF counter wouldn't be much for users to add. I usually
add these to maps for debug/statistic reasons anyways.

> If at least one is 0, then we definitely didn't see "value" yet
> (that's what guarantees no false negatives; this also answers Alexei's
> worry about possible false negative due to unsynchronized update and
> lookup, it can't happen by the nature of the data structure design,
> regardless of synchronization). We can, of course, have all such bits
> set to 1 even if the actual value was never "added" into the Bloom
> filter, just by the nature of hash collisions with other elements'
> hash functions (that's where the false positive comes from). It might
> be useful to just leave a link to Wikipedia for description of Bloom
> filter data structure ([0]).
> 
>   [0] https://en.wikipedia.org/wiki/Bloom_filter

Thanks. Yep needed a refresher for sure.

  reply	other threads:[~2021-09-02  4:40 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-31 22:50 [PATCH bpf-next 0/5] Implement bloom filter map Joanne Koong
2021-08-31 22:50 ` [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation Joanne Koong
2021-09-01  2:55   ` Alexei Starovoitov
2021-09-02 19:11     ` Joanne Koong
2021-09-02  1:44   ` Andrii Nakryiko
2021-09-02  5:11     ` John Fastabend
2021-09-02  6:16       ` Martin KaFai Lau
2021-09-02 22:07       ` Joanne Koong
2021-09-03  0:56         ` Martin KaFai Lau
2021-09-03  7:13           ` Joanne Koong
2021-09-03 17:19             ` Andrii Nakryiko
2021-09-03 17:22         ` John Fastabend
2021-09-08 19:10           ` Joanne Koong
2021-09-02  3:16   ` John Fastabend
2021-09-02  3:28     ` Andrii Nakryiko
2021-09-02  4:40       ` John Fastabend [this message]
2021-08-31 22:50 ` [PATCH bpf-next 2/5] libbpf: Allow the number of hashes in bloom filter maps to be configurable Joanne Koong
2021-09-02  3:30   ` Andrii Nakryiko
2021-08-31 22:50 ` [PATCH bpf-next 3/5] selftests/bpf: Add bloom filter map test cases Joanne Koong
2021-09-01  2:55   ` Alexei Starovoitov
2021-08-31 22:50 ` [PATCH bpf-next 4/5] bpf/benchs: Add benchmark test for bloom filter maps Joanne Koong
2021-09-02  3:35   ` Andrii Nakryiko
2021-08-31 22:50 ` [PATCH bpf-next 5/5] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter Joanne Koong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=613055c16831d_439b20825@john-XPS-13-9370.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=joannekoong@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).