Re: [PATCH v4 bpf-next] bpf: add lookup_and_delete_elem support to hashtab

From: Yonghong Song <yhs@fb.com>
To: Denis Salopek <denis.salopek@sartura.hr>, <bpf@vger.kernel.org>
Cc: <juraj.vijtiuk@sartura.hr>, <luka.oreskovic@sartura.hr>,
	<luka.perkov@sartura.hr>, <daniel@iogearbox.net>,
	<andrii.nakryiko@gmail.com>
Subject: Re: [PATCH v4 bpf-next] bpf: add lookup_and_delete_elem support to hashtab
Date: Sun, 4 Apr 2021 22:42:49 -0700	[thread overview]
Message-ID: <9325297a-66b8-eb26-22cb-aa4a740e8df6@fb.com> (raw)
In-Reply-To: <ce69af50-3667-c52d-1f1a-b924bbe0fc58@fb.com>

On 4/4/21 9:47 PM, Yonghong Song wrote:
> 
> 
> On 3/29/21 5:57 AM, Denis Salopek wrote:
>> Extend the existing bpf_map_lookup_and_delete_elem() functionality to
>> hashtab maps, in addition to stacks and queues.
>> Add bpf_map_lookup_and_delete_elem_flags() libbpf API in order to use
>> the BPF_F_LOCK flag.
>> Create a new hashtab bpf_map_ops function that does lookup and deletion
>> of the element under the same bucket lock and add the created map_ops to
>> bpf.h.
>> Add the appropriate test cases to 'maps' and 'lru_map' selftests
>> accompanied with new test_progs.
>>
>> Cc: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>
>> Cc: Luka Oreskovic <luka.oreskovic@sartura.hr>
>> Cc: Luka Perkov <luka.perkov@sartura.hr>
>> Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
>> ---
>> v2: Add functionality for LRU/per-CPU, add test_progs tests.
>> v3: Add bpf_map_lookup_and_delete_elem_flags() and enable BPF_F_LOCK
>> flag, change CHECKs to ASSERT_OKs, initialize variables to 0.
>> v4: Fix the return value for unsupported map types.
>> ---
>>   include/linux/bpf.h                           |   2 +
>>   kernel/bpf/hashtab.c                          |  97 ++++++
>>   kernel/bpf/syscall.c                          |  27 +-
>>   tools/lib/bpf/bpf.c                           |  13 +
>>   tools/lib/bpf/bpf.h                           |   2 +
>>   tools/lib/bpf/libbpf.map                      |   1 +
>>   .../bpf/prog_tests/lookup_and_delete.c        | 279 ++++++++++++++++++
>>   .../bpf/progs/test_lookup_and_delete.c        |  26 ++
>>   tools/testing/selftests/bpf/test_lru_map.c    |   8 +
>>   tools/testing/selftests/bpf/test_maps.c       |  19 +-
>>   10 files changed, 469 insertions(+), 5 deletions(-)
>>   create mode 100644 
>> tools/testing/selftests/bpf/prog_tests/lookup_and_delete.c
>>   create mode 100644 
>> tools/testing/selftests/bpf/progs/test_lookup_and_delete.c
> 
> Since another revision is needed, could you break the patch to several 
> commits which will make it easy to review?
> Patch 1: kernel + uapi header:
>    include/linux/bpf.h
>    kernel/bpf/hashtab.c
>    kernel/bpf/syscall.c
>    include/uapi/linux/bpf.h and tools/include/uapi/linux/bpf.h (see below)
> Patch 2: libbpf change
>    tools/lib/bpf/bpf.{c,h}, libbpf.map
> Patch 3: selftests/bpf change
>    tools/testing/selftests/bpf/...
> 
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 9fdd839b418c..8af4bd7c7229 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -68,6 +68,8 @@ struct bpf_map_ops {
>>       void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
>>       int (*map_lookup_batch)(struct bpf_map *map, const union 
>> bpf_attr *attr,
>>                   union bpf_attr __user *uattr);
>> +    int (*map_lookup_and_delete_elem)(struct bpf_map *map, void *key,
>> +                      void *value, u64 flags);
>>       int (*map_lookup_and_delete_batch)(struct bpf_map *map,
>>                          const union bpf_attr *attr,
>>                          union bpf_attr __user *uattr);
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index d7ebb12ffffc..0d2085ce9a38 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -1401,6 +1401,99 @@ static void htab_map_seq_show_elem(struct 
>> bpf_map *map, void *key,
>>       rcu_read_unlock();
>>   }
>> +static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, 
>> void *key,
>> +                         void *value, bool is_lru_map,
>> +                         bool is_percpu, u64 flags)
>> +{
>> +    struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
>> +    u32 hash, key_size, value_size;
>> +    struct hlist_nulls_head *head;
>> +    int cpu, off = 0, ret;
>> +    struct htab_elem *l;
>> +    unsigned long bflags;
>> +    void __percpu *pptr;
>> +    struct bucket *b;
>> +
>> +    if ((flags & ~BPF_F_LOCK) ||
>> +        ((flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
>> +        return -EINVAL;
>> +
>> +    key_size = map->key_size;
>> +    value_size = round_up(map->value_size, 8);
> 
> This value_size is actually a round_up size, so maybe
> rename it as "roundup_size". Also, I see it is only

Sorry, I actually mean "roundup_value_size" here.

> used in is_percpu case. It would be good if it can be
> moved inside is_percpu branch.
> 
>> +
>> +    hash = htab_map_hash(key, key_size, htab->hashrnd);
>> +    b = __select_bucket(htab, hash);
>> +    head = &b->head;
>> +
>> +    ret = htab_lock_bucket(htab, b, hash, &bflags);
>> +    if (ret)
>> +        return ret;
>> +
>> +    l = lookup_elem_raw(head, hash, key, key_size);
>> +    if (l) {
>> +        if (is_percpu) {
>> +            pptr = htab_elem_get_ptr(l, key_size);
>> +            for_each_possible_cpu(cpu) {
>> +                bpf_long_memcpy(value + off,
>> +                        per_cpu_ptr(pptr, cpu),
>> +                        value_size);
>> +                off += value_size;
>> +            }
>> +        } else {
>> +            if (flags & BPF_F_LOCK)
>> +                copy_map_value_locked(map, value, l->key +
>> +                              round_up(key_size, 8),
>> +                              true);
>> +            else
>> +                copy_map_value(map, value, l->key +
>> +                           round_up(key_size, 8));
>> +            check_and_init_map_lock(map, value);
>> +        }
>> +
>> +        hlist_nulls_del_rcu(&l->hash_node);
>> +        if (!is_lru_map)
>> +            free_htab_elem(htab, l);
>> +    } else
>> +        ret = -ENOENT;
>> +
>> +    htab_unlock_bucket(htab, b, hash, bflags);
>> +
>> +    if (is_lru_map && l)
>> +        bpf_lru_push_free(&htab->lru, &l->lru_node);
>> +
>> +    return ret;
>> +}
>> +
[...]