Re: [PATCH v4 bpf-next] bpf: add lookup_and_delete_elem support to hashtab

From: Yonghong Song <yhs@fb.com>
To: Denis Salopek <denis.salopek@sartura.hr>
Cc: <bpf@vger.kernel.org>, <juraj.vijtiuk@sartura.hr>,
	<luka.oreskovic@sartura.hr>, <luka.perkov@sartura.hr>,
	<daniel@iogearbox.net>, <andrii.nakryiko@gmail.com>
Subject: Re: [PATCH v4 bpf-next] bpf: add lookup_and_delete_elem support to hashtab
Date: Mon, 12 Apr 2021 07:19:37 -0700	[thread overview]
Message-ID: <34014e95-57c3-dbc4-6899-8765f4145cfa@fb.com> (raw)
In-Reply-To: <YHPtBett3HLzhHNc@gmail.com>

On 4/11/21 11:47 PM, Denis Salopek wrote:
> On Sun, Apr 04, 2021 at 09:47:30PM -0700, Yonghong Song wrote:
>>
>>
>> On 3/29/21 5:57 AM, Denis Salopek wrote:
>>> Extend the existing bpf_map_lookup_and_delete_elem() functionality to
>>> hashtab maps, in addition to stacks and queues.
>>> Add bpf_map_lookup_and_delete_elem_flags() libbpf API in order to use
>>> the BPF_F_LOCK flag.
>>> Create a new hashtab bpf_map_ops function that does lookup and deletion
>>> of the element under the same bucket lock and add the created map_ops to
>>> bpf.h.
>>> Add the appropriate test cases to 'maps' and 'lru_map' selftests
>>> accompanied with new test_progs.
>>>
>>> Cc: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>
>>> Cc: Luka Oreskovic <luka.oreskovic@sartura.hr>
>>> Cc: Luka Perkov <luka.perkov@sartura.hr>
>>> Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
>>> ---
>>> v2: Add functionality for LRU/per-CPU, add test_progs tests.
>>> v3: Add bpf_map_lookup_and_delete_elem_flags() and enable BPF_F_LOCK
>>> flag, change CHECKs to ASSERT_OKs, initialize variables to 0.
>>> v4: Fix the return value for unsupported map types.
>>> ---
>>>    include/linux/bpf.h                           |   2 +
>>>    kernel/bpf/hashtab.c                          |  97 ++++++
>>>    kernel/bpf/syscall.c                          |  27 +-
>>>    tools/lib/bpf/bpf.c                           |  13 +
>>>    tools/lib/bpf/bpf.h                           |   2 +
>>>    tools/lib/bpf/libbpf.map                      |   1 +
>>>    .../bpf/prog_tests/lookup_and_delete.c        | 279 ++++++++++++++++++
>>>    .../bpf/progs/test_lookup_and_delete.c        |  26 ++
>>>    tools/testing/selftests/bpf/test_lru_map.c    |   8 +
>>>    tools/testing/selftests/bpf/test_maps.c       |  19 +-
>>>    10 files changed, 469 insertions(+), 5 deletions(-)
>>>    create mode 100644 tools/testing/selftests/bpf/prog_tests/lookup_and_delete.c
>>>    create mode 100644 tools/testing/selftests/bpf/progs/test_lookup_and_delete.c
>>
>> Since another revision is needed, could you break the patch to several
>> commits which will make it easy to review?
>> Patch 1: kernel + uapi header:
>>     include/linux/bpf.h
>>     kernel/bpf/hashtab.c
>>     kernel/bpf/syscall.c
>>     include/uapi/linux/bpf.h and tools/include/uapi/linux/bpf.h (see below)
>> Patch 2: libbpf change
>>     tools/lib/bpf/bpf.{c,h}, libbpf.map
>> Patch 3: selftests/bpf change
>>     tools/testing/selftests/bpf/...
>>
> 
> Sure, I'll do that.
> 
[...]
>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>> index 9603de81811a..e3851bafb603 100644
>>> --- a/kernel/bpf/syscall.c
>>> +++ b/kernel/bpf/syscall.c
>>> @@ -1468,7 +1468,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
>>>    	return err;
>>>    }
>>> -#define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD value
>>> +#define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD flags
>>>    static int map_lookup_and_delete_elem(union bpf_attr *attr)
>>>    {
>>> @@ -1484,6 +1484,9 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
>>>    	if (CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM))
>>>    		return -EINVAL;
>>> +	if (attr->flags & ~BPF_F_LOCK)
>>> +		return -EINVAL; > +
>>>    	f = fdget(ufd);
>>>    	map = __bpf_map_get(f);
>>>    	if (IS_ERR(map))
>>> @@ -1494,24 +1497,40 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
>>>    		goto err_put;
>>>    	}
>>
>> Previously, for QUEUE and STACK map, flags are not allowed, and libbpf
>> sets it as 0. Let us enforce attr->flags with 0 here for QUEUE and STACK.
>>
> 
> Ok, so just to make this clear: if map_type is QUEUE or STACK, and if
> attr->flags != 0, return -EINVAL?

Yes.

> 
>>> +	if ((attr->flags & BPF_F_LOCK) &&
>>> +	    !map_value_has_spin_lock(map)) {
>>> +		err = -EINVAL;
>>> +		goto err_put;
>>> +	}
>>> +
>>>    	key = __bpf_copy_key(ukey, map->key_size);
>>>    	if (IS_ERR(key)) {
>>>    		err = PTR_ERR(key);
>>>    		goto err_put;
>>>    	}
>>> -	value_size = map->value_size;
>>> +	value_size = bpf_map_value_size(map);
>>>    	err = -ENOMEM;
>>>    	value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
>>>    	if (!value)
>>>    		goto free_key;
>>> +	err = -ENOTSUPP;
>>>    	if (map->map_type == BPF_MAP_TYPE_QUEUE ||
>>>    	    map->map_type == BPF_MAP_TYPE_STACK) {
>>>    		err = map->ops->map_pop_elem(map, value);
>>> -	} else {
>>> -		err = -ENOTSUPP;
>>> +	} else if (map->map_type == BPF_MAP_TYPE_HASH ||
>>> +		   map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
>>> +		   map->map_type == BPF_MAP_TYPE_LRU_HASH ||
>>> +		   map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
>>
>> Since we added four new map support for lookup_and_delete, we should
>> update documentation in uapi bpf.h.
>>
>> Currently, it is:
>>
>>   * BPF_MAP_LOOKUP_AND_DELETE_ELEM
>>   *      Description
>>   *              Look up an element with the given *key* in the map referred
>> to
>>   *              by the file descriptor *fd*, and if found, delete the
>> element.
>>   *
>>   *              The **BPF_MAP_TYPE_QUEUE** and **BPF_MAP_TYPE_STACK** map
>> types
>>   *              implement this command as a "pop" operation, deleting the
>> top
>>   *              element rather than one corresponding to *key*.
>>   *              The *key* and *key_len* parameters should be zeroed when
>>   *              issuing this operation for these map types.
>>   *
>>   *              This command is only valid for the following map types:
>>   *              * **BPF_MAP_TYPE_QUEUE**
>>   *              * **BPF_MAP_TYPE_STACK**
>>   *
>>   *      Return
>>   *              Returns zero on success. On error, -1 is returned and
>> *errno*
>>   *              is set appropriately.
>>
>> Please remember to sync updated linux/include/uapi/linux/bpf.h to
>> linux/tools/include/uapi/linux/bpf.h.
>>
> 
> The description looks ok to me as it is, I should just extend the list
> of valid map types and add the description for the flags argument (needs
> to be 0 for QUEUE and STACK, can be BPF_F_LOCK for rest)?

Yes.

> 
>>> +		if (!bpf_map_is_dev_bound(map)) {
>>> +			bpf_disable_instrumentation();
>>> +			rcu_read_lock();
>>> +			err = map->ops->map_lookup_and_delete_elem(map, key, value, attr->flags);
>>> +			rcu_read_unlock();
>>> +			bpf_enable_instrumentation();
>>> +		}
>>>    	}
>>>    	if (err)
[...]