Re: [PATCH v3 bpf-next 1/5] bpf: Add bloom filter map implementation

From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Joanne Koong <joannekoong@fb.com>,
	Martin KaFai Lau <kafai@fb.com>, bpf <bpf@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH v3 bpf-next 1/5] bpf: Add bloom filter map implementation
Date: Mon, 27 Sep 2021 14:14:09 -0700	[thread overview]
Message-ID: <CAEf4Bzb7bASQ3jXJ3JiKBinnb4ct9Y5pAhn-eVsdCY7rRus8Fg@mail.gmail.com> (raw)
In-Reply-To: <20210927164110.gg33uypguty45huk@ast-mbp.dhcp.thefacebook.com>

On Mon, Sep 27, 2021 at 9:41 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Sep 24, 2021 at 04:12:11PM -0700, Andrii Nakryiko wrote:
> >
> > That's not what I proposed. So let's say somewhere in the kernel we
> > have this variable:
> >
> > static int bpf_bloom_exists = 1;
> >
> > Now, for bpf_map_lookup_elem() helper we pass data as key pointer. If
> > all its hashed bits are set in Bloom filter (it "exists"), we return
> > &bpf_bloom_exists. So it's not a NULL pointer.
>
> imo that's too much of a hack.

too bad, because this feels pretty natural in BPF code:

int my_key = 1234;

if (bpf_map_lookup_elem(&my_bloom_filter, &my_key)) {
    /* handle "maybe exists" */
} else {
    /* handle "definitely doesn't exist" */
}

With map->value_size == 0 verifier should be able to ensure that
whatever non-NULL pointer is returned isn't readable/writable. We
might need to fix some BPF verifier internals if that's not supported
yet, I haven't checked thoroughly.

>
> > Now, despite returning a valid pointer, it would be good to prevent
> > reading/writing from/to it in a valid BPF program. I'm hoping it is as
> > easy as just seetting map->value_size = 0 during map creation. But
> > worst case, we can let BPF program just overwrite that 1 with whatever
> > they want. It doesn't matter because the contract is that
> > bpf_map_lookup_elem() returns non-NULL for "exists" and NULL
> > otherwise.
> >
> > Now, for MAP_LOOKUP_ELEM command on user-space side. I'd say the
> > contract should be 0 return code on "exists" (and nothing is written
> > to value pointer, perhaps even disallow to specify the non-NULL value
> > pointer altogether); -ENOENT, otherwise. Again, worst-case we can
> > specify that "value_size" is 1 or 4 bytes and write 1 to it.
> >
> > Does it make sense?
> >
> >
> > BTW, for STACK/QUEUE maps, I'm not clear why we had to add
> > map_peek_elem/map_pop_elem/map_push_elem operations, to be honest.
> > They could have been pretty reasonably mapped to map_lookup_elem (peek
> > is non-destructive lookup), map_lookup_elem + some flag (pop is
> > lookup, but destructive, so flag allows "destruction"), and
> > map_update_elem (for push). map_delete_elem would be pop for when
> > value can be ignored.
> >
> > That's to say that I don't consider STACK/QUEUE maps as good examples,
> > it's rather a counter-example of maps that barely anyone is using, yet
> > it just adds clutter to BPF internals.
>
> Repurposing lookup/update ops for stack/queue and forcing bpf progs
> to pass dummy key would have looked quite ugly.

The idea was to pass non-NULL *key* pointer. For bpf_map_update_elem()
you'd pass non-NULL key and NULL value. But it's fine, we have
peek/pop/push now, might as well use them. From user-space side it's
still mapped to LOOKUP/UPDATE/DELETE commands, right? BTW, seems like
we have LOOKUP_AND_DELETE command as well which is implemented for
STACK/QUEUE maps, but not BLOOM_FILTER. I guess for consistency we'd
have to implement that one for user-space as well?

> peek/pop/push have one pointer. That pointer points to value.
> For bitset we have single pointer as well.
> So it makes perfect sense to reuse push/pop/peek and keep bitset
> as a value from the verifier side.

Ok.

> bpf_helper_defs.h could have static inline functions:
> bool bpf_bitset_clear(map, key);
> bool bpf_bitset_set(map, key);
> bool bpf_bitset_test(map, key);
>
> But they will map to FUNC_map_pop_elem/push/peek as func ids
> and will be seen as values from the verifier pov.
>
> The bpf progs might think of them as keys, but they will be value_size.
> The bitset creation could use key_size as an input, but internally
> set it as value_size.
> Not sure whether such internal vs uapi remaping will not lead
> to confusion and bugs though.

I think it will and I wouldn't do it.

> I agree that key as an input to bitset_clear/set/test make more sense,
> but the verifier needs value_size to plug into peek/pop infra.

Which is why I initially proposed to not use peek/pop infra and
instead do lookup/update/delete, but if a NULL value pointer passed
into bpf_map_update_elem() disqualifies this, then it's fine. But I
didn't ask to repurpose peek/pop for working with key pointers.

>
> I don't think it's worth to extend ops with yet another 3 callbacks
> and have clean ARG_PTR_TO_UNINIT_MAP_KEY there.
> That is probably too much clutter.
>
> I think
> bool bpf_bitset_clear(map, value);
> bool bpf_bitset_set(map, value);
> bool bpf_bitset_test(map, value);
> doesn't look too bad either.
> At least this way the bit set map_create op will use value_size
> and keep it as value_size. The bpf prog won't be confusing.
> That's my preference, so far.