From: Yonghong Song <yhs@fb.com>
To: Denis Salopek <denis.salopek@sartura.hr>, <bpf@vger.kernel.org>
Cc: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>,
Luka Oreskovic <luka.oreskovic@sartura.hr>,
Luka Perkov <luka.perkov@sartura.hr>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Daniel Borkmann <daniel@iogearbox.net>
Subject: Re: [PATCH v5 bpf-next 1/3] bpf: add lookup_and_delete_elem support to hashtab
Date: Fri, 16 Apr 2021 09:50:08 -0700
Message-ID: <89e0177e-2ff4-f2a2-9e17-7f86fbc1b13c@fb.com>
In-Reply-To: <20210416095814.2771-1-denis.salopek@sartura.hr>

On 4/16/21 2:58 AM, Denis Salopek wrote:
> Extend the existing bpf_map_lookup_and_delete_elem() functionality to
> hashtab map types, in addition to stacks and queues.
> Create a new hashtab bpf_map_ops function that does lookup and deletion
> of the element under the same bucket lock and add the created map_ops to
> bpf.h.
>
> Cc: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>
> Cc: Luka Oreskovic <luka.oreskovic@sartura.hr>
> Cc: Luka Perkov <luka.perkov@sartura.hr>
> Cc: Yonghong Song <yhs@fb.com>
> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Denis Salopek <denis.salopek@sartura.hr>
> ---
> v2: Add functionality for LRU/per-CPU, add test_progs tests.
> v3: Add bpf_map_lookup_and_delete_elem_flags() and enable BPF_F_LOCK
> flag, change CHECKs to ASSERT_OKs, initialize variables to 0.
> v4: Fix the return value for unsupported map types.
> v5: Split patch to 3 patches. Extend BPF_MAP_LOOKUP_AND_DELETE_ELEM
> documentation with these changes.
> ---
> include/linux/bpf.h | 2 +
> include/uapi/linux/bpf.h | 13 +++++
> kernel/bpf/hashtab.c | 99 ++++++++++++++++++++++++++++++++++
> kernel/bpf/syscall.c | 33 ++++++++++--
> tools/include/uapi/linux/bpf.h | 13 +++++
> 5 files changed, 156 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index ff8cd68c01b3..d39fe682799e 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -68,6 +68,8 @@ struct bpf_map_ops {
> void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
> int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr,
> union bpf_attr __user *uattr);
> + int (*map_lookup_and_delete_elem)(struct bpf_map *map, void *key,
> + void *value, u64 flags);
> int (*map_lookup_and_delete_batch)(struct bpf_map *map,
> const union bpf_attr *attr,
> union bpf_attr __user *uattr);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index df164a44bb41..f30cabe02814 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -527,6 +527,15 @@ union bpf_iter_link_info {
> * Look up an element with the given *key* in the map referred to
> * by the file descriptor *fd*, and if found, delete the element.
> *
> + * For **BPF_MAP_TYPE_QUEUE** and **BPF_MAP_TYPE_STACK** map
> + * types, the *flags* argument needs to be set to 0, but for other
> + * map types, it may be specified as:
> + *
> + * **BPF_F_LOCK**
> + * Look up and delete the value of a spin-locked map
> + * without returning the lock. This must be specified if
> + * the elements contain a spinlock.
> + *
> * The **BPF_MAP_TYPE_QUEUE** and **BPF_MAP_TYPE_STACK** map types
> * implement this command as a "pop" operation, deleting the top
> * element rather than one corresponding to *key*.
> @@ -536,6 +545,10 @@ union bpf_iter_link_info {
> * This command is only valid for the following map types:
> * * **BPF_MAP_TYPE_QUEUE**
> * * **BPF_MAP_TYPE_STACK**
> + * * **BPF_MAP_TYPE_HASH**
> + * * **BPF_MAP_TYPE_PERCPU_HASH**
> + * * **BPF_MAP_TYPE_LRU_HASH**
> + * * **BPF_MAP_TYPE_LRU_PERCPU_HASH**
> *
> * Return
> * Returns zero on success. On error, -1 is returned and *errno*
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index d7ebb12ffffc..5e57503d4706 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -1401,6 +1401,101 @@ static void htab_map_seq_show_elem(struct bpf_map *map, void *key,
> rcu_read_unlock();
> }
>
> +static int __htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
> + void *value, bool is_lru_map,
> + bool is_percpu, u64 flags)
> +{
> + struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> + struct hlist_nulls_head *head;
> + unsigned long bflags;
> + struct htab_elem *l;
> + u32 hash, key_size;
> + struct bucket *b;
> + int ret;
> +
> + if ((flags & ~BPF_F_LOCK) ||
> + ((flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
> + return -EINVAL;
We don't need this check here. The same flag validation has already
been done in map_lookup_and_delete_elem().
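
For reference, the flag checks that the syscall.c hunk of this patch
performs before calling into the map ops can be modeled in user space
roughly as follows. This is only a sketch: struct sketch_map,
sketch_check_flags() and SKETCH_BPF_F_LOCK are made-up names for
illustration (the constant has the same value as BPF_F_LOCK in the UAPI
header), and the two booleans stand in for the map type test and
map_value_has_spin_lock().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as BPF_F_LOCK in the UAPI header; redefined here so the
 * sketch builds without kernel headers. */
#define SKETCH_BPF_F_LOCK 4ULL

/* Hypothetical stand-in for the map properties the checks look at. */
struct sketch_map {
	bool is_queue_or_stack;   /* BPF_MAP_TYPE_QUEUE or BPF_MAP_TYPE_STACK */
	bool value_has_spin_lock; /* what map_value_has_spin_lock() reports */
};

/* Follows the order of the checks in the quoted syscall.c hunk. */
static int sketch_check_flags(const struct sketch_map *map, uint64_t flags)
{
	/* BPF_F_LOCK is the only flag this command accepts. */
	if (flags & ~SKETCH_BPF_F_LOCK)
		return -EINVAL;

	/* Queue and stack maps must be called with flags == 0. */
	if (flags && map->is_queue_or_stack)
		return -EINVAL;

	/* BPF_F_LOCK requires a spin lock inside the map value. */
	if ((flags & SKETCH_BPF_F_LOCK) && !map->value_has_spin_lock)
		return -EINVAL;

	return 0;
}
```

Since every caller goes through these checks first, repeating the first
and third test inside the hashtab helper cannot reject anything new.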
> +
> + key_size = map->key_size;
> +
> + hash = htab_map_hash(key, key_size, htab->hashrnd);
> + b = __select_bucket(htab, hash);
> + head = &b->head;
> +
> + ret = htab_lock_bucket(htab, b, hash, &bflags);
> + if (ret)
> + return ret;
> +
> + l = lookup_elem_raw(head, hash, key, key_size);
> + if (l) {
> + if (is_percpu) {
> + u32 roundup_value_size = round_up(map->value_size, 8);
> + void __percpu *pptr;
> + int off = 0, cpu;
> +
> + pptr = htab_elem_get_ptr(l, key_size);
> + for_each_possible_cpu(cpu) {
> + bpf_long_memcpy(value + off,
> + per_cpu_ptr(pptr, cpu),
> + roundup_value_size);
> + off += roundup_value_size;
> + }
> + } else {
> + if (flags & BPF_F_LOCK)
> + copy_map_value_locked(map, value, l->key +
> + round_up(key_size, 8),
> + true);
> + else
> + copy_map_value(map, value, l->key +
> + round_up(key_size, 8));
You can add a common declaration like the one below at the beginning of
the block:
	u32 roundup_key_size = round_up(map->key_size, 8);
and then use roundup_key_size in both copy_map_value_locked() and
copy_map_value().
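
As a reminder of what the rounding does, here is a small user-space
sketch of the arithmetic. The helper names are hypothetical, and
sketch_round_up() is only a stand-in for the kernel's round_up(), valid
for power-of-two alignments. It shows the value offset used by the two
copy calls above and the buffer size implied by the per-CPU loop in the
quoted code, which copies one rounded-up value per possible CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of align.
 * align must be a power of two (8 in the hashtab code). */
static inline uint32_t sketch_round_up(uint32_t x, uint32_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Offset of the value relative to the start of the key bytes,
 * i.e. the "l->key + round_up(key_size, 8)" expression in the patch. */
static inline uint32_t sketch_value_offset(uint32_t key_size)
{
	return sketch_round_up(key_size, 8);
}

/* Bytes written by the per-CPU copy loop: one rounded-up value slot
 * for each of nr_cpus possible CPUs. */
static inline size_t sketch_percpu_buf_size(uint32_t value_size,
					    unsigned int nr_cpus)
{
	return (size_t)sketch_round_up(value_size, 8) * nr_cpus;
}
```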
> + check_and_init_map_lock(map, value);
> + }
> +
> + hlist_nulls_del_rcu(&l->hash_node);
> + if (!is_lru_map)
> + free_htab_elem(htab, l);
> + } else
> + ret = -ENOENT;
It is probably more readable if the above is written as:
	if (!l) {
		ret = -ENOENT;
	} else {
		...
	}
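
To illustrate the suggested shape in a complete function, below is a
user-space sketch of a lookup-and-delete that holds one bucket lock
across both steps. Everything in it is an assumption made for
illustration and not the kernel implementation: the sketch_* names, the
singly linked bucket list, the fixed u32 key and u64 value, and the toy
spin lock built on C11 atomic_flag in place of htab_lock_bucket().

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_elem {
	struct sketch_elem *next;
	uint32_t key;
	uint64_t value;
};

struct sketch_bucket {
	atomic_flag lock;         /* toy spin lock, one per bucket */
	struct sketch_elem *head;
};

static void sketch_lock(struct sketch_bucket *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock,
						 memory_order_acquire))
		; /* spin until the holder clears the flag */
}

static void sketch_unlock(struct sketch_bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

static int sketch_insert(struct sketch_bucket *b, uint32_t key,
			 uint64_t value)
{
	struct sketch_elem *e = malloc(sizeof(*e));

	if (!e)
		return -ENOMEM;
	e->key = key;
	e->value = value;

	sketch_lock(b);
	e->next = b->head;
	b->head = e;
	sketch_unlock(b);
	return 0;
}

/* Copy the value out and unlink the element under the same lock, so no
 * other thread can observe or modify the element between the steps. */
static int sketch_lookup_and_delete(struct sketch_bucket *b, uint32_t key,
				    uint64_t *value)
{
	struct sketch_elem **pp, *l;
	int ret = 0;

	sketch_lock(b);

	/* Walk the list; on a match, pp points at the link to l. */
	for (pp = &b->head; (l = *pp) != NULL; pp = &l->next)
		if (l->key == key)
			break;

	if (!l) {
		ret = -ENOENT;
	} else {
		*value = l->value;
		*pp = l->next;
	}

	sketch_unlock(b);

	/* Freed outside the lock in this sketch; free(NULL) is a no-op. */
	free(l);
	return ret;
}
```

With the error case first, the short branch is visible at a glance and
the longer copy-and-unlink path sits in the else block.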
> +
> + htab_unlock_bucket(htab, b, hash, bflags);
> +
> + if (is_lru_map && l)
> + bpf_lru_push_free(&htab->lru, &l->lru_node);
> +
> + return ret;
> +}
> +
> +static int htab_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
> + void *value, u64 flags)
> +{
> + return __htab_map_lookup_and_delete_elem(map, key, value, false, false,
> + flags);
> +}
> +
> +static int htab_percpu_map_lookup_and_delete_elem(struct bpf_map *map,
> + void *key, void *value,
> + u64 flags)
> +{
> + return __htab_map_lookup_and_delete_elem(map, key, value, false, true,
> + flags);
> +}
> +
> +static int htab_lru_map_lookup_and_delete_elem(struct bpf_map *map, void *key,
> + void *value, u64 flags)
> +{
> + return __htab_map_lookup_and_delete_elem(map, key, value, true, false,
> + flags);
> +}
> +
> +static int htab_lru_percpu_map_lookup_and_delete_elem(struct bpf_map *map,
> + void *key, void *value,
> + u64 flags)
> +{
> + return __htab_map_lookup_and_delete_elem(map, key, value, true, true,
> + flags);
> +}
> +
> static int
> __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> const union bpf_attr *attr,
> @@ -1934,6 +2029,7 @@ const struct bpf_map_ops htab_map_ops = {
> .map_free = htab_map_free,
> .map_get_next_key = htab_map_get_next_key,
> .map_lookup_elem = htab_map_lookup_elem,
> + .map_lookup_and_delete_elem = htab_map_lookup_and_delete_elem,
> .map_update_elem = htab_map_update_elem,
> .map_delete_elem = htab_map_delete_elem,
> .map_gen_lookup = htab_map_gen_lookup,
> @@ -1954,6 +2050,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
> .map_free = htab_map_free,
> .map_get_next_key = htab_map_get_next_key,
> .map_lookup_elem = htab_lru_map_lookup_elem,
> + .map_lookup_and_delete_elem = htab_lru_map_lookup_and_delete_elem,
> .map_lookup_elem_sys_only = htab_lru_map_lookup_elem_sys,
> .map_update_elem = htab_lru_map_update_elem,
> .map_delete_elem = htab_lru_map_delete_elem,
> @@ -2077,6 +2174,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
> .map_free = htab_map_free,
> .map_get_next_key = htab_map_get_next_key,
> .map_lookup_elem = htab_percpu_map_lookup_elem,
> + .map_lookup_and_delete_elem = htab_percpu_map_lookup_and_delete_elem,
> .map_update_elem = htab_percpu_map_update_elem,
> .map_delete_elem = htab_map_delete_elem,
> .map_seq_show_elem = htab_percpu_map_seq_show_elem,
> @@ -2096,6 +2194,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
> .map_free = htab_map_free,
> .map_get_next_key = htab_map_get_next_key,
> .map_lookup_elem = htab_lru_percpu_map_lookup_elem,
> + .map_lookup_and_delete_elem = htab_lru_percpu_map_lookup_and_delete_elem,
> .map_update_elem = htab_lru_percpu_map_update_elem,
> .map_delete_elem = htab_lru_map_delete_elem,
> .map_seq_show_elem = htab_percpu_map_seq_show_elem,
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index fd495190115e..78f6312d9bdb 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1468,7 +1468,7 @@ int generic_map_lookup_batch(struct bpf_map *map,
> return err;
> }
>
> -#define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD value
> +#define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD flags
>
> static int map_lookup_and_delete_elem(union bpf_attr *attr)
> {
> @@ -1484,6 +1484,9 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
> if (CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM))
> return -EINVAL;
>
> + if (attr->flags & ~BPF_F_LOCK)
> + return -EINVAL;
> +
> f = fdget(ufd);
> map = __bpf_map_get(f);
> if (IS_ERR(map))
> @@ -1494,24 +1497,46 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
> goto err_put;
> }
>
> + if (attr->flags && (map->map_type == BPF_MAP_TYPE_QUEUE ||
> + map->map_type == BPF_MAP_TYPE_STACK)) {
The continuation lines need better alignment:
	if (attr->flags &&
	    (map->map_type == BPF_MAP_TYPE_QUEUE ||
	     map->map_type == BPF_MAP_TYPE_STACK)) {
		...
	}
> + err = -EINVAL;
> + goto err_put;
> + }
> +
> + if ((attr->flags & BPF_F_LOCK) &&
> + !map_value_has_spin_lock(map)) {
> + err = -EINVAL;
> + goto err_put;
> + }
> +
> key = __bpf_copy_key(ukey, map->key_size);
> if (IS_ERR(key)) {
> err = PTR_ERR(key);
> goto err_put;
> }
>
> - value_size = map->value_size;
> + value_size = bpf_map_value_size(map);
>
> err = -ENOMEM;
> value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
> if (!value)
> goto free_key;
>
> + err = -ENOTSUPP;
> if (map->map_type == BPF_MAP_TYPE_QUEUE ||
> map->map_type == BPF_MAP_TYPE_STACK) {
> err = map->ops->map_pop_elem(map, value);
> - } else {
> - err = -ENOTSUPP;
> + } else if (map->map_type == BPF_MAP_TYPE_HASH ||
> + map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
> + map->map_type == BPF_MAP_TYPE_LRU_HASH ||
> + map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
> + if (!bpf_map_is_dev_bound(map)) {
> + bpf_disable_instrumentation();
> + rcu_read_lock();
> + err = map->ops->map_lookup_and_delete_elem(map, key, value, attr->flags);
> + rcu_read_unlock();
> + bpf_enable_instrumentation();
> + }
> }
>
> if (err)
[...]