Re: [PATCH v3 bpf-next 1/5] bpf: Add bloom filter map implementation

From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Martin KaFai Lau <kafai@fb.com>
Cc: Joanne Koong <joannekoong@fb.com>, bpf <bpf@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH v3 bpf-next 1/5] bpf: Add bloom filter map implementation
Date: Wed, 22 Sep 2021 16:07:52 -0700	[thread overview]
Message-ID: <CAEf4BzaQ42NTx9tcP43N-+SkXbFin9U+jSVy6HAmO8e+Cci5Dw@mail.gmail.com> (raw)
In-Reply-To: <20210922220844.ihzoapwytaz2o7nn@kafai-mbp.dhcp.thefacebook.com>

On Wed, Sep 22, 2021 at 3:08 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Wed, Sep 22, 2021 at 01:52:12PM -0700, Andrii Nakryiko wrote:
> > > Agree that a generic hash helper is in general useful.  It may be
> > > useful in hashing the skb also.  The bpf prog only implementation could
> > > have more flexibility in configuring roundup to pow2 or not, how to hash,
> > > how many hashes, nr of bits ...etc.  In the mean time, the bpf prog and
> >
> > Exactly. If I know better how many bits I need, I'll have to reverse
> > engineer kernel's heuristic to provide such max_entries values to
> > arrive at the desired amount of memory that Bloom filter will be
> > using.
> Good point. I don't think it needs to guess.  The formula is stable
> and publicly known also.  The formula comment from kernel/bpf/bloom_filter.c
> should be moved to the include/uapi/linux/bpf.h.

By guessing I mean you'd need to invert the formula to calculate the
necessary max_entries you'd need just to size bloom filter to the
desired size. At which point max_entries won't really mean much.

Also the whole "let's choose 5 as number of hash functions" approach
seems pretty arbitrary and not explained. How was this chosen? Based
on one very particular benchmark? How can we tell it's good for most
use cases? How did we arrive at 5 and not 3 or 4?

Looking at [0], there are many ways to go about sizing Bloom filter. 5
hash functions, assuming the wikipedia formula we are using (N * K *
ln(2)), and according to "This means that for a given false positive
probability ε, the length of a Bloom filter m is proportionate to the
number of elements being filtered n and the required number of hash
functions only depends on the target false positive probability ε."
(from [1]) and corresponding formula, gives a false positive
probability or around 3%, if my math is right. Did we state those 3%
anywhere and how we arrived at them?

But there are multiple parameters involved in this decision, they are
interconnected, and different subsets of them might be driving user's
choice:
  - number of expected unique elements;
  - number of bits allocated;
  - acceptable false positive rate;
  - number of hash functions.

What if I want a false positive probability of 1%, or maybe 10%, or
0.1%, but not 3%? What if I'm concerned about memory usage and rather
save some memory at the expense of more false positives? Or calling
too many hash functions is prohibitive, but I can allocate more memory
to reduce the chance of hash collisions. There are many ways to slice
this cat. Kernel implementation makes *some* assumptions with little
justification and explanation. It's opinionated in one place (M * N *
ln(2)), but leaves it up to the user to make sense of what number of
functions it needs (K = -log2(eps)). Feels quite unusual and
underspecified for the kernel data structure. It would be probably
good to provide a bit of guidance in map description, I suppose.

Anyways, I've spent way too much time on this. It was fun, but I'll
shut up and go do something else.

  [0] https://en.wikipedia.org/wiki/Bloom_filter
  [1] https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions

>
> > > user space need to co-ordinate more and worry about more things,
> > > e.g. how to reuse a bloom filter with different nr_hashes,
> > > nr_bits, handle synchronization...etc.
> >
> > Please see my RFC ([0]). I don't think there is much to coordinate. It
> > could be purely BPF-side code, or BPF + user-space initialization
> > code, depending on the need. It's a simple and beautiful algorithm,
> > which BPF is powerful enough to implement customly and easily.
> >
> >   [0] https://lore.kernel.org/bpf/20210922203224.912809-1-andrii@kernel.org/T/#t
> In practice, the bloom filter will be populated only once by the userspace.
>
> The future update will be done by map-in-map to replace the whole bloom filter.
> May be with more max_entries with more nr_hashes.  May be fewer
> max_entries with fewer nr_hashes.
>
> Currently, the continuous running bpf prog using this bloom filter does
> not need to worry about any change in the newer bloom filter
> configure/setup.
>
> I wonder how that may look like in the custom bpf bloom filter in the
> bench prog for the map-in-map usage.

You'd have to use BPF_MAP_TYPE_ARRAY for the map-in-map use case.

>
> >
> > >
> > > It is useful to have a default implementation in the kernel
> > > for some useful maps like this one that works for most
> > > common cases and the bpf user can just use it as get-and-go
> > > like all other common bpf maps do.
> >
> > I disagree with the premise that Bloom filter is a common and
> > generally useful data structure, tbh. It has its nice niche
> > applications, but its semantics isn't applicable generally, which is
> > why I hesitate to claim that this should live in kernel.
> I don't agree the application is nice niche.  I have encountered this
> many times when bumping into networking usecase discussion and not
> necessary limited to security usage also.  Yes, it is not a link-list
> like data structure but its usage is very common.