All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joanne Koong <joannekoong@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
Date: Thu, 2 Sep 2021 12:11:11 -0700	[thread overview]
Message-ID: <fa1f83f8-2b25-feea-a352-f138c4276e0c@fb.com> (raw)
In-Reply-To: <20210901025507.3hx4wpx3kmtjipad@ast-mbp.dhcp.thefacebook.com>

On 8/31/21 7:55 PM, Alexei Starovoitov wrote:

> On Tue, Aug 31, 2021 at 03:50:01PM -0700, Joanne Koong wrote:
>> +static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
>> +{
>> +	struct bpf_bloom_filter *bloom_filter =
>> +		container_of(map, struct bpf_bloom_filter, map);
>> +	u32 i, hash;
>> +
>> +	for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
>> +		hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
>> +			bloom_filter->bit_array_mask;
>> +		if (!test_bit(hash, bloom_filter->bit_array))
>> +			return -ENOENT;
>> +	}
> I'm curious what bloom filter theory says about n-hashes > 1
> concurrent access with updates in terms of false negative?
> Two concurrent updates race is protected by spin_lock,
> but what about peek and update?
> The update might set one bit, but not the other.
> That shouldn't trigger false negative lookup, right?

For cases where there is a concurrent peek and update, the user is 
responsible for
synchronizing these operations if they want to ensure that the peek will 
always return
true while the update is occurring.
I will add this to the commit message.

> Is bloom filter supported as inner map?
> Hash and lru maps are often used as inner maps.
> The lookups from them would be pre-filtered by bloom filter
> map that would have to be (in some cases) inner map.
> I suspect one bloom filter for all inner maps might be
> reasonable workaround in some cases too.
> The delete is not supported in bloom filter, of course.
> Would be good to mention it in the commit log.
> Since there is no delete the users would likely need
> to replace the whole bloom filter. So map-in-map would
> become necessary.
> Do you think 'clear-all' operation might be useful for bloom filter?
> It feels that if map-in-map is supported then clear-all is probably
> not that useful, since atomic replacement and delete of the map
> would work better. 'clear-all' will have issues with
> lookup, since it cannot be done in parallel.
> Would be good to document all these ideas and restrictions.

The bloom filter is supported as an inner map. I will include a test for 
this and add it to v2
(and document this in the commit message in v2)

> Could you collect 'perf annotate' data for the above performance
> critical loop?
> I wonder whether using jhash2 and forcing u32 value size could speed it up.
> Probably not, but would be good to check, since restricting value_size
> later would be problematic due to backward compatibility.
>
> The recommended nr_hashes=3 was computed with value_size=8, right?
> I wonder whether nr_hashes would be different for value_size=16 and =4
> which are ipv6/ipv4 addresses and value_size = 40
> an approximation of networking n-tuple.
Great suggestions! I will do all of these you mentioned, after I 
incorporate the edits for v2,
and report back with the results.
>> +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
>> +{
>> +	int numa_node = bpf_map_attr_numa_node(attr);
>> +	u32 nr_bits, bit_array_bytes, bit_array_mask;
>> +	struct bpf_bloom_filter *bloom_filter;
>> +
>> +	if (!bpf_capable())
>> +		return ERR_PTR(-EPERM);
>> +
>> +	if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
>> +	    attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||
> Would it make sense to default to nr_hashes=3 if zero is passed?
> This way the libbpf changes for nr_hashes will become 'optional'.
> Most users wouldn't have to specify it explicitly.

I like this idea - it'll make the API more friendly to use as well for 
people who
might not be acquainted with bloom filters :)

>
> Overall looks great!
> Performance numbers are impressive.

  reply	other threads:[~2021-09-02 19:11 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-31 22:50 [PATCH bpf-next 0/5] Implement bloom filter map Joanne Koong
2021-08-31 22:50 ` [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation Joanne Koong
2021-09-01  2:55   ` Alexei Starovoitov
2021-09-02 19:11     ` Joanne Koong [this message]
2021-09-02  1:44   ` Andrii Nakryiko
2021-09-02  5:11     ` John Fastabend
2021-09-02  6:16       ` Martin KaFai Lau
2021-09-02 22:07       ` Joanne Koong
2021-09-03  0:56         ` Martin KaFai Lau
2021-09-03  7:13           ` Joanne Koong
2021-09-03 17:19             ` Andrii Nakryiko
2021-09-03 17:22         ` John Fastabend
2021-09-08 19:10           ` Joanne Koong
2021-09-02  3:16   ` John Fastabend
2021-09-02  3:28     ` Andrii Nakryiko
2021-09-02  4:40       ` John Fastabend
2021-08-31 22:50 ` [PATCH bpf-next 2/5] libbpf: Allow the number of hashes in bloom filter maps to be configurable Joanne Koong
2021-09-02  3:30   ` Andrii Nakryiko
2021-08-31 22:50 ` [PATCH bpf-next 3/5] selftests/bpf: Add bloom filter map test cases Joanne Koong
2021-09-01  2:55   ` Alexei Starovoitov
2021-08-31 22:50 ` [PATCH bpf-next 4/5] bpf/benchs: Add benchmark test for bloom filter maps Joanne Koong
2021-09-02  3:35   ` Andrii Nakryiko
2021-08-31 22:50 ` [PATCH bpf-next 5/5] bpf/benchs: Add benchmarks for comparing hashmap lookups with vs. without bloom filter Joanne Koong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fa1f83f8-2b25-feea-a352-f138c4276e0c@fb.com \
    --to=joannekoong@fb.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.