Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Joanne Koong <joannekoong@fb.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next 1/5] bpf: Add bloom filter map implementation
Date: Tue, 31 Aug 2021 19:55:07 -0700	[thread overview]
Message-ID: <20210901025507.3hx4wpx3kmtjipad@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20210831225005.2762202-2-joannekoong@fb.com>

On Tue, Aug 31, 2021 at 03:50:01PM -0700, Joanne Koong wrote:
> +static int bloom_filter_map_peek_elem(struct bpf_map *map, void *value)
> +{
> +	struct bpf_bloom_filter *bloom_filter =
> +		container_of(map, struct bpf_bloom_filter, map);
> +	u32 i, hash;
> +
> +	for (i = 0; i < bloom_filter->map.nr_hashes; i++) {
> +		hash = jhash(value, map->value_size, bloom_filter->hash_seed + i) &
> +			bloom_filter->bit_array_mask;
> +		if (!test_bit(hash, bloom_filter->bit_array))
> +			return -ENOENT;
> +	}

I'm curious what bloom filter theory says about n-hashes > 1
concurrent access with updates in terms of false negative?
Two concurrent updates race is protected by spin_lock,
but what about peek and update?
The update might set one bit, but not the other.
That shouldn't trigger false negative lookup, right?

Is bloom filter supported as inner map?
Hash and lru maps are often used as inner maps.
The lookups from them would be pre-filtered by bloom filter
map that would have to be (in some cases) inner map.
I suspect one bloom filter for all inner maps might be
reasonable workaround in some cases too.
The delete is not supported in bloom filter, of course.
Would be good to mention it in the commit log.
Since there is no delete the users would likely need
to replace the whole bloom filter. So map-in-map would
become necessary.
Do you think 'clear-all' operation might be useful for bloom filter?
It feels that if map-in-map is supported then clear-all is probably
not that useful, since atomic replacement and delete of the map
would work better. 'clear-all' will have issues with
lookup, since it cannot be done in parallel.
Would be good to document all these ideas and restrictions.

Could you collect 'perf annotate' data for the above performance
critical loop?
I wonder whether using jhash2 and forcing u32 value size could speed it up.
Probably not, but would be good to check, since restricting value_size
later would be problematic due to backward compatibility.

The recommended nr_hashes=3 was computed with value_size=8, right?
I wonder whether nr_hashes would be different for value_size=16 and =4
which are ipv6/ipv4 addresses and value_size = 40
an approximation of networking n-tuple.

> +static struct bpf_map *bloom_filter_map_alloc(union bpf_attr *attr)
> +{
> +	int numa_node = bpf_map_attr_numa_node(attr);
> +	u32 nr_bits, bit_array_bytes, bit_array_mask;
> +	struct bpf_bloom_filter *bloom_filter;
> +
> +	if (!bpf_capable())
> +		return ERR_PTR(-EPERM);
> +
> +	if (attr->key_size != 0 || attr->value_size == 0 || attr->max_entries == 0 ||
> +	    attr->nr_hashes == 0 || attr->map_flags & ~BLOOM_FILTER_CREATE_FLAG_MASK ||

Would it make sense to default to nr_hashes=3 if zero is passed?
This way the libbpf changes for nr_hashes will become 'optional'.
Most users wouldn't have to specify it explicitly.

Overall looks great!
Performance numbers are impressive.