linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem
@ 2019-11-19 19:30 Brian Vazquez
  2019-11-19 19:30 ` [PATCH v2 bpf-next 1/9] bpf: add bpf_map_{value_size,update_value,map_copy_value} functions Brian Vazquez
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Brian Vazquez @ 2019-11-19 19:30 UTC (permalink / raw)
  To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf, Brian Vazquez

This patch series introduces batch ops that can be added to bpf maps to
lookup/lookup_and_delete/update/delete more than 1 element at a time.
This is especially useful when syscall overhead is a problem, and in the
case of hashmaps it provides a reliable way of traversing them.

The implementation includes a generic approach that could potentially be
used by any bpf map and wires it up for arraymap; it also includes a
hashmap-specific implementation which traverses buckets instead of keys.

The bpf syscall subcommands introduced are:

  BPF_MAP_LOOKUP_BATCH
  BPF_MAP_LOOKUP_AND_DELETE_BATCH
  BPF_MAP_UPDATE_BATCH
  BPF_MAP_DELETE_BATCH

The UAPI attribute is:

  struct { /* struct used by BPF_MAP_*_BATCH commands */
         __aligned_u64   in_batch;       /* start batch,
                                          * NULL to start from beginning
                                          */
         __aligned_u64   out_batch;      /* output: next start batch */
         __aligned_u64   keys;
         __aligned_u64   values;
         __u32           count;          /* input/output:
                                          * input: # of key/value
                                          * elements
                                          * output: # of filled elements
                                          */
         __u32           map_fd;
         __u64           elem_flags;
         __u64           flags;
  } batch;


in_batch and out_batch are only used for lookup and lookup_and_delete since
those are the only two operations that attempt to traverse the map.

update/delete batch ops should provide the keys/values that the user
wants to modify.
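
For illustration, a minimal userspace sketch (not part of this series) of
a single BPF_MAP_UPDATE_BATCH call against a hypothetical int-keyed,
int-valued map; 'map_fd', the buffers and the missing error handling are
assumptions made for brevity:

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>   /* needs this series applied for *_BATCH cmds */

  /* Update 'count' key/value pairs with one syscall. */
  static int update_batch(int map_fd, int *keys, int *values, __u32 count)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.batch.map_fd = map_fd;
          attr.batch.keys = (__u64)(unsigned long)keys;
          attr.batch.values = (__u64)(unsigned long)values;
          attr.batch.count = count;  /* in: # provided, out: # processed */
          /* elem_flags left 0 (BPF_ANY); in_batch/out_batch are not used */

          return syscall(__NR_bpf, BPF_MAP_UPDATE_BATCH, &attr, sizeof(attr));
  }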

Here are the previous discussions on the batch processing:
 - https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
 - https://lore.kernel.org/bpf/20190829064502.2750303-1-yhs@fb.com/
 - https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/

Changelog since v1:
 - Fix SOB ordering and remove Co-authored-by tag (Alexei)

Changelog since RFC:
 - Change batch to in_batch and out_batch to support more flexible opaque
   values to iterate the bpf maps.
 - Remove update/delete specific batch ops for htab and use the generic
   implementations instead.

Brian Vazquez (5):
  bpf: add bpf_map_{value_size,update_value,map_copy_value} functions
  bpf: add generic support for lookup and lookup_and_delete batch ops
  bpf: add generic support for update and delete batch ops
  bpf: add lookup and update batch ops to arraymap
  selftests/bpf: add batch ops testing to array bpf map

Yonghong Song (4):
  bpf: add batch ops to all htab bpf map
  tools/bpf: sync uapi header bpf.h
  libbpf: add libbpf support to batch ops
  selftests/bpf: add batch ops testing for hmap and hmap_percpu

 include/linux/bpf.h                           |  21 +
 include/uapi/linux/bpf.h                      |  21 +
 kernel/bpf/arraymap.c                         |   2 +
 kernel/bpf/hashtab.c                          | 244 ++++++++
 kernel/bpf/syscall.c                          | 571 ++++++++++++++----
 tools/include/uapi/linux/bpf.h                |  21 +
 tools/lib/bpf/bpf.c                           |  61 ++
 tools/lib/bpf/bpf.h                           |  14 +
 tools/lib/bpf/libbpf.map                      |   4 +
 .../map_lookup_and_delete_batch_array.c       | 119 ++++
 .../map_lookup_and_delete_batch_htab.c        | 257 ++++++++
 11 files changed, 1215 insertions(+), 120 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
 create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c

-- 
2.24.0.432.g9d3f5f5b63-goog



* [PATCH v2 bpf-next 1/9] bpf: add bpf_map_{value_size,update_value,map_copy_value} functions
  2019-11-19 19:30 [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem Brian Vazquez
@ 2019-11-19 19:30 ` Brian Vazquez
  2019-11-22 16:36   ` John Fastabend
  2019-11-19 19:30 ` [PATCH v2 bpf-next 2/9] bpf: add generic support for lookup and lookup_and_delete batch ops Brian Vazquez
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Brian Vazquez @ 2019-11-19 19:30 UTC (permalink / raw)
  To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf, Brian Vazquez

This commit moves reusable code from map_lookup_elem and map_update_elem
into the new helpers bpf_map_value_size(), bpf_map_update_value() and
bpf_map_copy_value() to avoid code duplication in kernel/bpf/syscall.c.

Signed-off-by: Brian Vazquez <brianvv@google.com>
---
 kernel/bpf/syscall.c | 271 ++++++++++++++++++++++++-------------------
 1 file changed, 151 insertions(+), 120 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c88c815c2154d..cc714c9d5b4cc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -127,6 +127,153 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
 	return map;
 }
 
+static u32 bpf_map_value_size(struct bpf_map *map)
+{
+	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
+	    map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
+		return round_up(map->value_size, 8) * num_possible_cpus();
+	else if (IS_FD_MAP(map))
+		return sizeof(u32);
+	else
+		return  map->value_size;
+}
+
+static void maybe_wait_bpf_programs(struct bpf_map *map)
+{
+	/* Wait for any running BPF programs to complete so that
+	 * userspace, when we return to it, knows that all programs
+	 * that could be running use the new map value.
+	 */
+	if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
+	    map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
+		synchronize_rcu();
+}
+
+static int bpf_map_update_value(struct bpf_map *map, struct fd f, void *key,
+				void *value, __u64 flags)
+{
+	int err;
+	/* Need to create a kthread, thus must support schedule */
+	if (bpf_map_is_dev_bound(map)) {
+		return bpf_map_offload_update_elem(map, key, value, flags);
+	} else if (map->map_type == BPF_MAP_TYPE_CPUMAP ||
+		   map->map_type == BPF_MAP_TYPE_SOCKHASH ||
+		   map->map_type == BPF_MAP_TYPE_SOCKMAP) {
+		return map->ops->map_update_elem(map, key, value, flags);
+	}
+
+	/* must increment bpf_prog_active to avoid kprobe+bpf triggering from
+	 * inside bpf map update or delete otherwise deadlocks are possible
+	 */
+	preempt_disable();
+	__this_cpu_inc(bpf_prog_active);
+	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
+		err = bpf_percpu_hash_update(map, key, value, flags);
+	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
+		err = bpf_percpu_array_update(map, key, value, flags);
+	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
+		err = bpf_percpu_cgroup_storage_update(map, key, value,
+						       flags);
+	} else if (IS_FD_ARRAY(map)) {
+		rcu_read_lock();
+		err = bpf_fd_array_map_update_elem(map, f.file, key, value,
+						   flags);
+		rcu_read_unlock();
+	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
+		rcu_read_lock();
+		err = bpf_fd_htab_map_update_elem(map, f.file, key, value,
+						  flags);
+		rcu_read_unlock();
+	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
+		/* rcu_read_lock() is not needed */
+		err = bpf_fd_reuseport_array_update_elem(map, key, value,
+							 flags);
+	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
+		   map->map_type == BPF_MAP_TYPE_STACK) {
+		err = map->ops->map_push_elem(map, value, flags);
+	} else {
+		rcu_read_lock();
+		err = map->ops->map_update_elem(map, key, value, flags);
+		rcu_read_unlock();
+	}
+	__this_cpu_dec(bpf_prog_active);
+	preempt_enable();
+	maybe_wait_bpf_programs(map);
+
+	return err;
+}
+
+static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
+			      __u64 flags, bool do_delete)
+{
+	void *ptr;
+	int err;
+
+
+	if (bpf_map_is_dev_bound(map)) {
+		err =  bpf_map_offload_lookup_elem(map, key, value);
+
+		if (!err && do_delete)
+			err = bpf_map_offload_delete_elem(map, key);
+
+		return err;
+	}
+
+	preempt_disable();
+	this_cpu_inc(bpf_prog_active);
+	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
+		err = bpf_percpu_hash_copy(map, key, value);
+	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
+		err = bpf_percpu_array_copy(map, key, value);
+	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
+		err = bpf_percpu_cgroup_storage_copy(map, key, value);
+	} else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
+		err = bpf_stackmap_copy(map, key, value);
+	} else if (IS_FD_ARRAY(map)) {
+		err = bpf_fd_array_map_lookup_elem(map, key, value);
+	} else if (IS_FD_HASH(map)) {
+		err = bpf_fd_htab_map_lookup_elem(map, key, value);
+	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
+		err = bpf_fd_reuseport_array_lookup_elem(map, key, value);
+	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
+		   map->map_type == BPF_MAP_TYPE_STACK) {
+		err = map->ops->map_peek_elem(map, value);
+	} else {
+		rcu_read_lock();
+		if (map->ops->map_lookup_elem_sys_only)
+			ptr = map->ops->map_lookup_elem_sys_only(map, key);
+		else
+			ptr = map->ops->map_lookup_elem(map, key);
+		if (IS_ERR(ptr)) {
+			err = PTR_ERR(ptr);
+		} else if (!ptr) {
+			err = -ENOENT;
+		} else {
+			err = 0;
+			if (flags & BPF_F_LOCK)
+				/* lock 'ptr' and copy everything but lock */
+				copy_map_value_locked(map, value, ptr, true);
+			else
+				copy_map_value(map, value, ptr);
+			/* mask lock, since value wasn't zero inited */
+			check_and_init_map_lock(map, value);
+		}
+		rcu_read_unlock();
+	}
+	if (do_delete)
+		err = err ? err : map->ops->map_delete_elem(map, key);
+
+	this_cpu_dec(bpf_prog_active);
+	preempt_enable();
+	maybe_wait_bpf_programs(map);
+
+	return err;
+}
+
 void *bpf_map_area_alloc(size_t size, int numa_node)
 {
 	/* We really just want to fail instead of triggering OOM killer
@@ -740,7 +887,7 @@ static int map_lookup_elem(union bpf_attr *attr)
 	void __user *uvalue = u64_to_user_ptr(attr->value);
 	int ufd = attr->map_fd;
 	struct bpf_map *map;
-	void *key, *value, *ptr;
+	void *key, *value;
 	u32 value_size;
 	struct fd f;
 	int err;
@@ -772,72 +919,14 @@ static int map_lookup_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
-	    map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
-	    map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
-		value_size = round_up(map->value_size, 8) * num_possible_cpus();
-	else if (IS_FD_MAP(map))
-		value_size = sizeof(u32);
-	else
-		value_size = map->value_size;
+	value_size = bpf_map_value_size(map);
 
 	err = -ENOMEM;
 	value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
 	if (!value)
 		goto free_key;
 
-	if (bpf_map_is_dev_bound(map)) {
-		err = bpf_map_offload_lookup_elem(map, key, value);
-		goto done;
-	}
-
-	preempt_disable();
-	this_cpu_inc(bpf_prog_active);
-	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
-		err = bpf_percpu_hash_copy(map, key, value);
-	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
-		err = bpf_percpu_array_copy(map, key, value);
-	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
-		err = bpf_percpu_cgroup_storage_copy(map, key, value);
-	} else if (map->map_type == BPF_MAP_TYPE_STACK_TRACE) {
-		err = bpf_stackmap_copy(map, key, value);
-	} else if (IS_FD_ARRAY(map)) {
-		err = bpf_fd_array_map_lookup_elem(map, key, value);
-	} else if (IS_FD_HASH(map)) {
-		err = bpf_fd_htab_map_lookup_elem(map, key, value);
-	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
-		err = bpf_fd_reuseport_array_lookup_elem(map, key, value);
-	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
-		err = map->ops->map_peek_elem(map, value);
-	} else {
-		rcu_read_lock();
-		if (map->ops->map_lookup_elem_sys_only)
-			ptr = map->ops->map_lookup_elem_sys_only(map, key);
-		else
-			ptr = map->ops->map_lookup_elem(map, key);
-		if (IS_ERR(ptr)) {
-			err = PTR_ERR(ptr);
-		} else if (!ptr) {
-			err = -ENOENT;
-		} else {
-			err = 0;
-			if (attr->flags & BPF_F_LOCK)
-				/* lock 'ptr' and copy everything but lock */
-				copy_map_value_locked(map, value, ptr, true);
-			else
-				copy_map_value(map, value, ptr);
-			/* mask lock, since value wasn't zero inited */
-			check_and_init_map_lock(map, value);
-		}
-		rcu_read_unlock();
-	}
-	this_cpu_dec(bpf_prog_active);
-	preempt_enable();
-
-done:
+	err = bpf_map_copy_value(map, key, value, attr->flags, false);
 	if (err)
 		goto free_value;
 
@@ -856,16 +945,6 @@ static int map_lookup_elem(union bpf_attr *attr)
 	return err;
 }
 
-static void maybe_wait_bpf_programs(struct bpf_map *map)
-{
-	/* Wait for any running BPF programs to complete so that
-	 * userspace, when we return to it, knows that all programs
-	 * that could be running use the new map value.
-	 */
-	if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
-	    map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
-		synchronize_rcu();
-}
 
 #define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags
 
@@ -921,56 +1000,8 @@ static int map_update_elem(union bpf_attr *attr)
 	if (copy_from_user(value, uvalue, value_size) != 0)
 		goto free_value;
 
-	/* Need to create a kthread, thus must support schedule */
-	if (bpf_map_is_dev_bound(map)) {
-		err = bpf_map_offload_update_elem(map, key, value, attr->flags);
-		goto out;
-	} else if (map->map_type == BPF_MAP_TYPE_CPUMAP ||
-		   map->map_type == BPF_MAP_TYPE_SOCKHASH ||
-		   map->map_type == BPF_MAP_TYPE_SOCKMAP) {
-		err = map->ops->map_update_elem(map, key, value, attr->flags);
-		goto out;
-	}
+	err = bpf_map_update_value(map, f, key, value, attr->flags);
 
-	/* must increment bpf_prog_active to avoid kprobe+bpf triggering from
-	 * inside bpf map update or delete otherwise deadlocks are possible
-	 */
-	preempt_disable();
-	__this_cpu_inc(bpf_prog_active);
-	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
-		err = bpf_percpu_hash_update(map, key, value, attr->flags);
-	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
-		err = bpf_percpu_array_update(map, key, value, attr->flags);
-	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE) {
-		err = bpf_percpu_cgroup_storage_update(map, key, value,
-						       attr->flags);
-	} else if (IS_FD_ARRAY(map)) {
-		rcu_read_lock();
-		err = bpf_fd_array_map_update_elem(map, f.file, key, value,
-						   attr->flags);
-		rcu_read_unlock();
-	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
-		rcu_read_lock();
-		err = bpf_fd_htab_map_update_elem(map, f.file, key, value,
-						  attr->flags);
-		rcu_read_unlock();
-	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
-		/* rcu_read_lock() is not needed */
-		err = bpf_fd_reuseport_array_update_elem(map, key, value,
-							 attr->flags);
-	} else if (map->map_type == BPF_MAP_TYPE_QUEUE ||
-		   map->map_type == BPF_MAP_TYPE_STACK) {
-		err = map->ops->map_push_elem(map, value, attr->flags);
-	} else {
-		rcu_read_lock();
-		err = map->ops->map_update_elem(map, key, value, attr->flags);
-		rcu_read_unlock();
-	}
-	__this_cpu_dec(bpf_prog_active);
-	preempt_enable();
-	maybe_wait_bpf_programs(map);
-out:
 free_value:
 	kfree(value);
 free_key:
-- 
2.24.0.432.g9d3f5f5b63-goog



* [PATCH v2 bpf-next 2/9] bpf: add generic support for lookup and lookup_and_delete batch ops
  2019-11-19 19:30 [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem Brian Vazquez
  2019-11-19 19:30 ` [PATCH v2 bpf-next 1/9] bpf: add bpf_map_{value_size,update_value,map_copy_value} functions Brian Vazquez
@ 2019-11-19 19:30 ` Brian Vazquez
  2019-11-21 17:36   ` Yonghong Song
  2019-11-22 17:25   ` John Fastabend
  2019-11-19 19:30 ` [PATCH v2 bpf-next 3/9] bpf: add generic support for update and delete " Brian Vazquez
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Brian Vazquez @ 2019-11-19 19:30 UTC (permalink / raw)
  To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf, Brian Vazquez

This commit introduces generic support for the bpf_map_lookup_batch and
bpf_map_lookup_and_delete_batch ops. This implementation can be used by
almost all bpf maps since its core relies on the existing
map_get_next_key, map_lookup_elem and map_delete_elem functions. The
bpf syscall subcommands introduced are:

  BPF_MAP_LOOKUP_BATCH
  BPF_MAP_LOOKUP_AND_DELETE_BATCH

The UAPI attribute is:

  struct { /* struct used by BPF_MAP_*_BATCH commands */
         __aligned_u64   in_batch;       /* start batch,
                                          * NULL to start from beginning
                                          */
         __aligned_u64   out_batch;      /* output: next start batch */
         __aligned_u64   keys;
         __aligned_u64   values;
         __u32           count;          /* input/output:
                                          * input: # of key/value
                                          * elements
                                          * output: # of filled elements
                                          */
         __u32           map_fd;
         __u64           elem_flags;
         __u64           flags;
  } batch;

in_batch/out_batch are opaque values used to communicate between
user and kernel space; both must be key_size bytes long.

To start iterating from the beginning, in_batch must be NULL;
count is the # of key/value elements to retrieve. Note that the 'keys'
buffer must be key_size * count bytes and the 'values' buffer must be
value_size * count bytes, where value_size must be aligned to 8 bytes
by userspace if it's dealing with percpu maps. 'count' will contain the
number of keys/values successfully retrieved. Note that 'count' is an
input/output field and it can contain a lower value after a call.

If there are no more entries to retrieve, ENOENT will be returned; even
then, count might be > 0 because some values were copied before the map
ran out of entries.

In general, if the return code is an error other than -EFAULT,
count indicates the number of elements successfully processed.
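
For illustration only (not part of this patch), a userspace sketch of the
iteration contract described above, using the raw bpf(2) syscall against a
hypothetical int-keyed, int-valued map; it assumes a uapi header with this
series applied and trims error handling:

  #include <errno.h>
  #include <stdbool.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  /* Fetch up to 'max_count' elements, 'step' elements per syscall;
   * returns the number of elements actually copied into keys/values.
   */
  static __u32 lookup_in_batches(int map_fd, int *keys, int *values,
                                 __u32 max_count, __u32 step)
  {
          int next_key, err;       /* in_batch/out_batch, key_size bytes */
          bool first = true;
          __u32 total = 0;

          while (total < max_count) {
                  union bpf_attr attr;

                  memset(&attr, 0, sizeof(attr));
                  attr.batch.map_fd = map_fd;
                  /* NULL in_batch means "start from the beginning" */
                  attr.batch.in_batch = first ? 0 :
                                        (__u64)(unsigned long)&next_key;
                  attr.batch.out_batch = (__u64)(unsigned long)&next_key;
                  attr.batch.keys = (__u64)(unsigned long)(keys + total);
                  attr.batch.values = (__u64)(unsigned long)(values + total);
                  attr.batch.count = max_count - total < step ?
                                     max_count - total : step;

                  err = syscall(__NR_bpf, BPF_MAP_LOOKUP_BATCH, &attr,
                                sizeof(attr));
                  if (err && errno != ENOENT)
                          break;                     /* hard error */
                  total += attr.batch.count;         /* # copied this round */
                  if (err)
                          break;                     /* ENOENT: exhausted */
                  first = false;
          }
          return total;
  }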

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h      |  11 +++
 include/uapi/linux/bpf.h |  19 +++++
 kernel/bpf/syscall.c     | 176 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 206 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b81cde47314e..767a823dbac74 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -41,6 +41,11 @@ struct bpf_map_ops {
 	int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
 	void (*map_release_uref)(struct bpf_map *map);
 	void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
+	int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr,
+				union bpf_attr __user *uattr);
+	int (*map_lookup_and_delete_batch)(struct bpf_map *map,
+					   const union bpf_attr *attr,
+					   union bpf_attr __user *uattr);
 
 	/* funcs callable from userspace and from eBPF programs */
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
@@ -797,6 +802,12 @@ void bpf_map_charge_move(struct bpf_map_memory *dst,
 void *bpf_map_area_alloc(size_t size, int numa_node);
 void bpf_map_area_free(void *base);
 void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
+int  generic_map_lookup_batch(struct bpf_map *map,
+			      const union bpf_attr *attr,
+			      union bpf_attr __user *uattr);
+int  generic_map_lookup_and_delete_batch(struct bpf_map *map,
+					 const union bpf_attr *attr,
+					 union bpf_attr __user *uattr);
 
 extern int sysctl_unprivileged_bpf_disabled;
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4842a134b202a..e60b7b7cda61a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -107,6 +107,8 @@ enum bpf_cmd {
 	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 	BPF_MAP_FREEZE,
 	BPF_BTF_GET_NEXT_ID,
+	BPF_MAP_LOOKUP_BATCH,
+	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
 };
 
 enum bpf_map_type {
@@ -400,6 +402,23 @@ union bpf_attr {
 		__u64		flags;
 	};
 
+	struct { /* struct used by BPF_MAP_*_BATCH commands */
+		__aligned_u64	in_batch;	/* start batch,
+						 * NULL to start from beginning
+						 */
+		__aligned_u64	out_batch;	/* output: next start batch */
+		__aligned_u64	keys;
+		__aligned_u64	values;
+		__u32		count;		/* input/output:
+						 * input: # of key/value
+						 * elements
+						 * output: # of filled elements
+						 */
+		__u32		map_fd;
+		__u64		elem_flags;
+		__u64		flags;
+	} batch;
+
 	struct { /* anonymous struct used by BPF_PROG_LOAD command */
 		__u32		prog_type;	/* one of enum bpf_prog_type */
 		__u32		insn_cnt;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cc714c9d5b4cc..d0d3d0e0eaca4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1127,6 +1127,124 @@ static int map_get_next_key(union bpf_attr *attr)
 	return err;
 }
 
+static int __generic_map_lookup_batch(struct bpf_map *map,
+				      const union bpf_attr *attr,
+				      union bpf_attr __user *uattr,
+				      bool do_delete)
+{
+	void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
+	void __user *uobatch = u64_to_user_ptr(attr->batch.out_batch);
+	void __user *values = u64_to_user_ptr(attr->batch.values);
+	void __user *keys = u64_to_user_ptr(attr->batch.keys);
+	void *buf, *prev_key, *key, *value;
+	u32 value_size, cp, max_count;
+	bool first_key = false;
+	int err, retry = 3;
+
+	if (attr->batch.elem_flags & ~BPF_F_LOCK)
+		return -EINVAL;
+
+	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
+	    !map_value_has_spin_lock(map)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	if (map->map_type == BPF_MAP_TYPE_QUEUE ||
+	    map->map_type == BPF_MAP_TYPE_STACK) {
+		err = -ENOTSUPP;
+		goto err_put;
+	}
+
+	value_size = bpf_map_value_size(map);
+
+	max_count = attr->batch.count;
+	if (!max_count)
+		return 0;
+
+	err = -ENOMEM;
+	buf = kmalloc(map->key_size + value_size, GFP_USER | __GFP_NOWARN);
+	if (!buf)
+		goto err_put;
+
+	err = -EFAULT;
+	first_key = false;
+	if (ubatch && copy_from_user(buf, ubatch, map->key_size))
+		goto free_buf;
+	key = buf;
+	value = key + map->key_size;
+	if (!ubatch) {
+		prev_key = NULL;
+		first_key = true;
+	}
+
+
+	for (cp = 0; cp < max_count; cp++) {
+		if (cp || first_key) {
+			rcu_read_lock();
+			err = map->ops->map_get_next_key(map, prev_key, key);
+			rcu_read_unlock();
+			if (err)
+				break;
+		}
+		err = bpf_map_copy_value(map, key, value,
+					 attr->batch.elem_flags, do_delete);
+
+		if (err == -ENOENT) {
+			if (retry) {
+				retry--;
+				continue;
+			}
+			err = -EINTR;
+			break;
+		}
+
+		if (err)
+			goto free_buf;
+
+		if (copy_to_user(keys + cp * map->key_size, key,
+				 map->key_size)) {
+			err = -EFAULT;
+			goto free_buf;
+		}
+		if (copy_to_user(values + cp * value_size, value, value_size)) {
+			err = -EFAULT;
+			goto free_buf;
+		}
+
+		prev_key = key;
+		retry = 3;
+	}
+	if (!err) {
+		rcu_read_lock();
+		err = map->ops->map_get_next_key(map, prev_key, key);
+		rcu_read_unlock();
+	}
+
+	if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) ||
+		    (copy_to_user(uobatch, key, map->key_size))))
+		err = -EFAULT;
+
+free_buf:
+	kfree(buf);
+err_put:
+	return err;
+}
+
+int generic_map_lookup_batch(struct bpf_map *map,
+			     const union bpf_attr *attr,
+			     union bpf_attr __user *uattr)
+{
+	return __generic_map_lookup_batch(map, attr, uattr, false);
+}
+
+int generic_map_lookup_and_delete_batch(struct bpf_map *map,
+					const union bpf_attr *attr,
+					union bpf_attr __user *uattr)
+{
+	return __generic_map_lookup_batch(map, attr, uattr, true);
+}
+
 #define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD value
 
 static int map_lookup_and_delete_elem(union bpf_attr *attr)
@@ -2956,6 +3074,57 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
 	return err;
 }
 
+#define BPF_MAP_BATCH_LAST_FIELD batch.flags
+
+#define BPF_DO_BATCH(fn)			\
+	do {					\
+		if (!fn) {			\
+			err = -ENOTSUPP;	\
+			goto err_put;		\
+		}				\
+		err = fn(map, attr, uattr);	\
+	} while (0)
+
+static int bpf_map_do_batch(const union bpf_attr *attr,
+			    union bpf_attr __user *uattr,
+			    int cmd)
+{
+	struct bpf_map *map;
+	int err, ufd;
+	struct fd f;
+
+	if (CHECK_ATTR(BPF_MAP_BATCH))
+		return -EINVAL;
+
+	ufd = attr->batch.map_fd;
+	f = fdget(ufd);
+	map = __bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	if ((cmd == BPF_MAP_LOOKUP_BATCH ||
+	     cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH) &&
+	    !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
+	if (cmd != BPF_MAP_LOOKUP_BATCH &&
+	    !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
+	if (cmd == BPF_MAP_LOOKUP_BATCH)
+		BPF_DO_BATCH(map->ops->map_lookup_batch);
+	else
+		BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch);
+
+err_put:
+	fdput(f);
+	return err;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -3053,6 +3222,13 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
 		err = map_lookup_and_delete_elem(&attr);
 		break;
+	case BPF_MAP_LOOKUP_BATCH:
+		err = bpf_map_do_batch(&attr, uattr, BPF_MAP_LOOKUP_BATCH);
+		break;
+	case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
+		err = bpf_map_do_batch(&attr, uattr,
+				       BPF_MAP_LOOKUP_AND_DELETE_BATCH);
+		break;
 	default:
 		err = -EINVAL;
 		break;
-- 
2.24.0.432.g9d3f5f5b63-goog



* [PATCH v2 bpf-next 3/9] bpf: add generic support for update and delete batch ops
  2019-11-19 19:30 [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem Brian Vazquez
  2019-11-19 19:30 ` [PATCH v2 bpf-next 1/9] bpf: add bpf_map_{value_size,update_value,map_copy_value} functions Brian Vazquez
  2019-11-19 19:30 ` [PATCH v2 bpf-next 2/9] bpf: add generic support for lookup and lookup_and_delete batch ops Brian Vazquez
@ 2019-11-19 19:30 ` Brian Vazquez
  2019-11-21 18:00   ` Yonghong Song
  2019-11-19 19:30 ` [PATCH v2 bpf-next 4/9] bpf: add lookup and update batch ops to arraymap Brian Vazquez
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Brian Vazquez @ 2019-11-19 19:30 UTC (permalink / raw)
  To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf, Brian Vazquez

This commit adds generic support for update and delete batch ops that
can be used for almost all the bpf maps. These commands share the same
UAPI attr that lookup and lookup_and_delete batch ops use and the
syscall commands are:

  BPF_MAP_UPDATE_BATCH
  BPF_MAP_DELETE_BATCH

The main difference between the update/delete and lookup/lookup_and_delete
batch ops is that for update/delete the keys/values must be specified by
userspace, and because of that neither in_batch nor out_batch is used.
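
For illustration only (not part of this patch), a sketch of a
BPF_MAP_DELETE_BATCH call against a hypothetical int-keyed map: only keys
and count are filled in, in_batch/out_batch stay NULL and error handling
is omitted:

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  /* Delete 'count' known keys with one syscall. */
  static int delete_batch(int map_fd, int *keys, __u32 count)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.batch.map_fd = map_fd;
          attr.batch.keys = (__u64)(unsigned long)keys;
          attr.batch.count = count;  /* in: # of keys, out: # deleted */

          return syscall(__NR_bpf, BPF_MAP_DELETE_BATCH, &attr, sizeof(attr));
  }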

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h      |  10 ++++
 include/uapi/linux/bpf.h |   2 +
 kernel/bpf/syscall.c     | 126 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 137 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 767a823dbac74..96a19e1fd2b5b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -46,6 +46,10 @@ struct bpf_map_ops {
 	int (*map_lookup_and_delete_batch)(struct bpf_map *map,
 					   const union bpf_attr *attr,
 					   union bpf_attr __user *uattr);
+	int (*map_update_batch)(struct bpf_map *map, const union bpf_attr *attr,
+				union bpf_attr __user *uattr);
+	int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
+				union bpf_attr __user *uattr);
 
 	/* funcs callable from userspace and from eBPF programs */
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
@@ -808,6 +812,12 @@ int  generic_map_lookup_batch(struct bpf_map *map,
 int  generic_map_lookup_and_delete_batch(struct bpf_map *map,
 					 const union bpf_attr *attr,
 					 union bpf_attr __user *uattr);
+int  generic_map_update_batch(struct bpf_map *map,
+			      const union bpf_attr *attr,
+			      union bpf_attr __user *uattr);
+int  generic_map_delete_batch(struct bpf_map *map,
+			      const union bpf_attr *attr,
+			      union bpf_attr __user *uattr);
 
 extern int sysctl_unprivileged_bpf_disabled;
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e60b7b7cda61a..0f6ff0c4d79dd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -109,6 +109,8 @@ enum bpf_cmd {
 	BPF_BTF_GET_NEXT_ID,
 	BPF_MAP_LOOKUP_BATCH,
 	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
+	BPF_MAP_UPDATE_BATCH,
+	BPF_MAP_DELETE_BATCH,
 };
 
 enum bpf_map_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d0d3d0e0eaca4..06e1bcf40fb8d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1127,6 +1127,120 @@ static int map_get_next_key(union bpf_attr *attr)
 	return err;
 }
 
+int generic_map_delete_batch(struct bpf_map *map,
+			     const union bpf_attr *attr,
+			     union bpf_attr __user *uattr)
+{
+	void __user *keys = u64_to_user_ptr(attr->batch.keys);
+	int ufd = attr->map_fd;
+	u32 cp, max_count;
+	struct fd f;
+	void *key;
+	int err;
+
+	f = fdget(ufd);
+	if (attr->batch.elem_flags & ~BPF_F_LOCK)
+		return -EINVAL;
+
+	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
+	    !map_value_has_spin_lock(map)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	max_count = attr->batch.count;
+	if (!max_count)
+		return 0;
+
+	err = -ENOMEM;
+	for (cp = 0; cp < max_count; cp++) {
+		key = __bpf_copy_key(keys + cp * map->key_size, map->key_size);
+		if (IS_ERR(key)) {
+			err = PTR_ERR(key);
+			break;
+		}
+
+		if (err)
+			break;
+		if (bpf_map_is_dev_bound(map)) {
+			err = bpf_map_offload_delete_elem(map, key);
+			break;
+		}
+
+		preempt_disable();
+		__this_cpu_inc(bpf_prog_active);
+		rcu_read_lock();
+		err = map->ops->map_delete_elem(map, key);
+		rcu_read_unlock();
+		__this_cpu_dec(bpf_prog_active);
+		preempt_enable();
+		maybe_wait_bpf_programs(map);
+		if (err)
+			break;
+	}
+	if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
+		err = -EFAULT;
+err_put:
+	return err;
+}
+int generic_map_update_batch(struct bpf_map *map,
+			     const union bpf_attr *attr,
+			     union bpf_attr __user *uattr)
+{
+	void __user *values = u64_to_user_ptr(attr->batch.values);
+	void __user *keys = u64_to_user_ptr(attr->batch.keys);
+	u32 value_size, cp, max_count;
+	int ufd = attr->map_fd;
+	void *key, *value;
+	struct fd f;
+	int err;
+
+	f = fdget(ufd);
+	if (attr->batch.elem_flags & ~BPF_F_LOCK)
+		return -EINVAL;
+
+	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
+	    !map_value_has_spin_lock(map)) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
+	value_size = bpf_map_value_size(map);
+
+	max_count = attr->batch.count;
+	if (!max_count)
+		return 0;
+
+	err = -ENOMEM;
+	value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
+	if (!value)
+		goto err_put;
+
+	for (cp = 0; cp < max_count; cp++) {
+		key = __bpf_copy_key(keys + cp * map->key_size, map->key_size);
+		if (IS_ERR(key)) {
+			err = PTR_ERR(key);
+			break;
+		}
+		err = -EFAULT;
+		if (copy_from_user(value, values + cp * value_size, value_size))
+			break;
+
+		err = bpf_map_update_value(map, f, key, value,
+					   attr->batch.elem_flags);
+
+		if (err)
+			break;
+	}
+
+	if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
+		err = -EFAULT;
+
+	kfree(value);
+err_put:
+	return err;
+}
+
 static int __generic_map_lookup_batch(struct bpf_map *map,
 				      const union bpf_attr *attr,
 				      union bpf_attr __user *uattr,
@@ -3117,8 +3231,12 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
 
 	if (cmd == BPF_MAP_LOOKUP_BATCH)
 		BPF_DO_BATCH(map->ops->map_lookup_batch);
-	else
+	else if (cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH)
 		BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch);
+	else if (cmd == BPF_MAP_UPDATE_BATCH)
+		BPF_DO_BATCH(map->ops->map_update_batch);
+	else
+		BPF_DO_BATCH(map->ops->map_delete_batch);
 
 err_put:
 	fdput(f);
@@ -3229,6 +3347,12 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 		err = bpf_map_do_batch(&attr, uattr,
 				       BPF_MAP_LOOKUP_AND_DELETE_BATCH);
 		break;
+	case BPF_MAP_UPDATE_BATCH:
+		err = bpf_map_do_batch(&attr, uattr, BPF_MAP_UPDATE_BATCH);
+		break;
+	case BPF_MAP_DELETE_BATCH:
+		err = bpf_map_do_batch(&attr, uattr, BPF_MAP_DELETE_BATCH);
+		break;
 	default:
 		err = -EINVAL;
 		break;
-- 
2.24.0.432.g9d3f5f5b63-goog



* [PATCH v2 bpf-next 4/9] bpf: add lookup and update batch ops to arraymap
  2019-11-19 19:30 [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem Brian Vazquez
                   ` (2 preceding siblings ...)
  2019-11-19 19:30 ` [PATCH v2 bpf-next 3/9] bpf: add generic support for update and delete " Brian Vazquez
@ 2019-11-19 19:30 ` Brian Vazquez
  2019-11-19 19:30 ` [PATCH v2 bpf-next 5/9] bpf: add batch ops to all htab bpf map Brian Vazquez
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Brian Vazquez @ 2019-11-19 19:30 UTC (permalink / raw)
  To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf, Brian Vazquez

This adds the generic batch ops functionality to bpf arraymap; note that
since deletion is not a valid operation for arraymap, only the lookup and
update batch ops are added.

Signed-off-by: Brian Vazquez <brianvv@google.com>
---
 kernel/bpf/arraymap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 1c65ce0098a95..680d4e99ef583 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -457,6 +457,8 @@ const struct bpf_map_ops array_map_ops = {
 	.map_direct_value_meta = array_map_direct_value_meta,
 	.map_seq_show_elem = array_map_seq_show_elem,
 	.map_check_btf = array_map_check_btf,
+	.map_lookup_batch = generic_map_lookup_batch,
+	.map_update_batch = generic_map_update_batch,
 };
 
 const struct bpf_map_ops percpu_array_map_ops = {
-- 
2.24.0.432.g9d3f5f5b63-goog



* [PATCH v2 bpf-next 5/9] bpf: add batch ops to all htab bpf map
  2019-11-19 19:30 [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem Brian Vazquez
                   ` (3 preceding siblings ...)
  2019-11-19 19:30 ` [PATCH v2 bpf-next 4/9] bpf: add lookup and update batch ops to arraymap Brian Vazquez
@ 2019-11-19 19:30 ` Brian Vazquez
  2019-11-21 18:27   ` Yonghong Song
  2019-11-19 19:30 ` [PATCH v2 bpf-next 6/9] tools/bpf: sync uapi header bpf.h Brian Vazquez
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Brian Vazquez @ 2019-11-19 19:30 UTC (permalink / raw)
  To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf, Brian Vazquez

From: Yonghong Song <yhs@fb.com>

htab can't use the generic batch support due to some problematic
behaviours inherent to the data structure: while iterating the bpf map,
a concurrent program might delete the next entry that the batch was
about to use, and in that case there's no easy way to retrieve the next
entry. The issue has been discussed multiple times (see [1] and [2]).

The only way hmap can be traversed without hitting that problem is by
making sure that the iteration always processes entire buckets. This
commit implements those strict requirements for hmap; the implementation
follows the same interface as the generic support, with some exceptions:

 - If the keys/values buffers are not big enough to traverse a bucket,
   ENOSPC will be returned.
 - out_batch contains the index of the next bucket in the iteration, not
   the next key, but this is transparent to the user since the user
   should never use out_batch for anything other than the bpf batch
   syscalls.

Note that only the lookup and lookup_and_delete batch ops require the
hmap-specific implementation; the update/delete batch ops can use the
generic ones.
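
For illustration only (not part of this patch), a userspace sketch of the
per-bucket contract described above, written against the libbpf wrappers
added later in this series; the int key/value layout, the buffer sizing
and the count-doubling policy on ENOSPC are assumptions, not part of the
kernel API:

  #include <errno.h>
  #include <stdbool.h>
  #include <stdlib.h>
  #include <bpf/bpf.h>

  static void drain_htab(int map_fd, __u32 max_entries)
  {
          int *keys = calloc(max_entries, sizeof(int));
          int *vals = calloc(max_entries, sizeof(int));
          __u32 batch = 0, count = 16;
          bool first = true;
          int err;

          for (;;) {
                  err = bpf_map_lookup_and_delete_batch(map_fd,
                                                        first ? NULL : &batch,
                                                        &batch, keys, vals,
                                                        &count, 0, 0);
                  if (err && errno == ENOSPC) {
                          /* buffers too small for a whole bucket: retry the
                           * same bucket asking for more elements
                           */
                          count = count * 2 > max_entries ?
                                  max_entries : count * 2;
                          continue;
                  }
                  if (err)
                          /* ENOENT: all buckets visited ('count' may still
                           * hold a final partial batch); otherwise an error
                           */
                          break;
                  /* 'count' elements were returned and deleted; use them */
                  first = false;
          }

          free(keys);
          free(vals);
  }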

[1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
[2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
---
 kernel/bpf/hashtab.c | 244 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 244 insertions(+)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 22066a62c8c97..3402174b292ea 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -17,6 +17,17 @@
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |	\
 	 BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
 
+#define BATCH_OPS(_name)			\
+	.map_lookup_batch =			\
+	_name##_map_lookup_batch,		\
+	.map_lookup_and_delete_batch =		\
+	_name##_map_lookup_and_delete_batch,	\
+	.map_update_batch =			\
+	generic_map_update_batch,		\
+	.map_delete_batch =			\
+	generic_map_delete_batch
+
+
 struct bucket {
 	struct hlist_nulls_head head;
 	raw_spinlock_t lock;
@@ -1232,6 +1243,235 @@ static void htab_map_seq_show_elem(struct bpf_map *map, void *key,
 	rcu_read_unlock();
 }
 
+static int
+__htab_map_lookup_and_delete_batch(struct bpf_map *map,
+				   const union bpf_attr *attr,
+				   union bpf_attr __user *uattr,
+				   bool do_delete, bool is_lru_map,
+				   bool is_percpu)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	u32 bucket_cnt, total, key_size, value_size, roundup_key_size;
+	void *keys = NULL, *values = NULL, *value, *dst_key, *dst_val;
+	void __user *ukeys, *uvalues, *ubatch;
+	u64 elem_map_flags, map_flags;
+	struct hlist_nulls_head *head;
+	struct hlist_nulls_node *n;
+	u32 batch, max_count, size;
+	unsigned long flags;
+	struct htab_elem *l;
+	struct bucket *b;
+	int ret = 0;
+
+	max_count = attr->batch.count;
+	if (!max_count)
+		return 0;
+
+	elem_map_flags = attr->batch.elem_flags;
+	if ((elem_map_flags & ~BPF_F_LOCK) ||
+	    ((elem_map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
+		return -EINVAL;
+
+	map_flags = attr->batch.flags;
+	if (map_flags)
+		return -EINVAL;
+
+	batch = 0;
+	ubatch = u64_to_user_ptr(attr->batch.in_batch);
+	if (ubatch && copy_from_user(&batch, ubatch, sizeof(batch)))
+		return -EFAULT;
+
+	if (batch >= htab->n_buckets)
+		return -ENOENT;
+
+	/* We cannot do copy_from_user or copy_to_user inside
+	 * the rcu_read_lock. Allocate enough space here.
+	 */
+	key_size = htab->map.key_size;
+	roundup_key_size = round_up(htab->map.key_size, 8);
+	value_size = htab->map.value_size;
+	size = round_up(value_size, 8);
+	if (is_percpu)
+		value_size = size * num_possible_cpus();
+	keys = kvmalloc(key_size * max_count, GFP_USER | __GFP_NOWARN);
+	values = kvmalloc(value_size * max_count, GFP_USER | __GFP_NOWARN);
+	if (!keys || !values) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	dst_key = keys;
+	dst_val = values;
+	total = 0;
+
+	preempt_disable();
+	this_cpu_inc(bpf_prog_active);
+	rcu_read_lock();
+
+again:
+	b = &htab->buckets[batch];
+	head = &b->head;
+	raw_spin_lock_irqsave(&b->lock, flags);
+
+	bucket_cnt = 0;
+	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
+		bucket_cnt++;
+
+	if (bucket_cnt > (max_count - total)) {
+		if (total == 0)
+			ret = -ENOSPC;
+		goto after_loop;
+	}
+
+	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) {
+		memcpy(dst_key, l->key, key_size);
+
+		if (is_percpu) {
+			int off = 0, cpu;
+			void __percpu *pptr;
+
+			pptr = htab_elem_get_ptr(l, map->key_size);
+			for_each_possible_cpu(cpu) {
+				bpf_long_memcpy(dst_val + off,
+						per_cpu_ptr(pptr, cpu), size);
+				off += size;
+			}
+		} else {
+			value = l->key + roundup_key_size;
+			if (elem_map_flags & BPF_F_LOCK)
+				copy_map_value_locked(map, dst_val, value,
+						      true);
+			else
+				copy_map_value(map, dst_val, value);
+			check_and_init_map_lock(map, dst_val);
+		}
+		dst_key += key_size;
+		dst_val += value_size;
+		total++;
+	}
+
+	if (do_delete) {
+		hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) {
+			hlist_nulls_del_rcu(&l->hash_node);
+			if (is_lru_map)
+				bpf_lru_push_free(&htab->lru, &l->lru_node);
+			else
+				free_htab_elem(htab, l);
+		}
+	}
+
+	batch++;
+	if (batch >= htab->n_buckets) {
+		ret = -ENOENT;
+		goto after_loop;
+	}
+
+	raw_spin_unlock_irqrestore(&b->lock, flags);
+	goto again;
+
+after_loop:
+	raw_spin_unlock_irqrestore(&b->lock, flags);
+
+	rcu_read_unlock();
+	this_cpu_dec(bpf_prog_active);
+	preempt_enable();
+
+	if (ret && ret != -ENOENT)
+		goto out;
+
+	/* copy data back to user */
+	ukeys = u64_to_user_ptr(attr->batch.keys);
+	uvalues = u64_to_user_ptr(attr->batch.values);
+	ubatch = u64_to_user_ptr(attr->batch.out_batch);
+	if (copy_to_user(ubatch, &batch, sizeof(batch)) ||
+	    copy_to_user(ukeys, keys, total * key_size) ||
+	    copy_to_user(uvalues, values, total * value_size) ||
+	    put_user(total, &uattr->batch.count))
+		ret = -EFAULT;
+
+out:
+	kvfree(keys);
+	kvfree(values);
+	return ret;
+}
+
+static int
+htab_percpu_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+			     union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
+						  false, true);
+}
+
+static int
+htab_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
+					const union bpf_attr *attr,
+					union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
+						  false, true);
+}
+
+static int
+htab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+		      union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
+						  false, false);
+}
+
+static int
+htab_map_lookup_and_delete_batch(struct bpf_map *map,
+				 const union bpf_attr *attr,
+				 union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
+						  false, false);
+}
+
+static int
+htab_map_delete_batch(struct bpf_map *map,
+		      const union bpf_attr *attr,
+		      union bpf_attr __user *uattr)
+{
+	return generic_map_delete_batch(map, attr, uattr);
+}
+
+static int
+htab_lru_percpu_map_lookup_batch(struct bpf_map *map,
+				 const union bpf_attr *attr,
+				 union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
+						  true, true);
+}
+
+static int
+htab_lru_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
+					    const union bpf_attr *attr,
+					    union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
+						  true, true);
+}
+
+static int
+htab_lru_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
+			  union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
+						  true, false);
+}
+
+static int
+htab_lru_map_lookup_and_delete_batch(struct bpf_map *map,
+				     const union bpf_attr *attr,
+				     union bpf_attr __user *uattr)
+{
+	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
+						  true, false);
+}
+
 const struct bpf_map_ops htab_map_ops = {
 	.map_alloc_check = htab_map_alloc_check,
 	.map_alloc = htab_map_alloc,
@@ -1242,6 +1482,7 @@ const struct bpf_map_ops htab_map_ops = {
 	.map_delete_elem = htab_map_delete_elem,
 	.map_gen_lookup = htab_map_gen_lookup,
 	.map_seq_show_elem = htab_map_seq_show_elem,
+	BATCH_OPS(htab),
 };
 
 const struct bpf_map_ops htab_lru_map_ops = {
@@ -1255,6 +1496,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
 	.map_delete_elem = htab_lru_map_delete_elem,
 	.map_gen_lookup = htab_lru_map_gen_lookup,
 	.map_seq_show_elem = htab_map_seq_show_elem,
+	BATCH_OPS(htab_lru),
 };
 
 /* Called from eBPF program */
@@ -1368,6 +1610,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
 	.map_update_elem = htab_percpu_map_update_elem,
 	.map_delete_elem = htab_map_delete_elem,
 	.map_seq_show_elem = htab_percpu_map_seq_show_elem,
+	BATCH_OPS(htab_percpu),
 };
 
 const struct bpf_map_ops htab_lru_percpu_map_ops = {
@@ -1379,6 +1622,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
 	.map_update_elem = htab_lru_percpu_map_update_elem,
 	.map_delete_elem = htab_lru_map_delete_elem,
 	.map_seq_show_elem = htab_percpu_map_seq_show_elem,
+	BATCH_OPS(htab_lru_percpu),
 };
 
 static int fd_htab_map_alloc_check(union bpf_attr *attr)
-- 
2.24.0.432.g9d3f5f5b63-goog



* [PATCH v2 bpf-next 6/9] tools/bpf: sync uapi header bpf.h
  2019-11-19 19:30 [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem Brian Vazquez
                   ` (4 preceding siblings ...)
  2019-11-19 19:30 ` [PATCH v2 bpf-next 5/9] bpf: add batch ops to all htab bpf map Brian Vazquez
@ 2019-11-19 19:30 ` Brian Vazquez
  2019-11-19 19:30 ` [PATCH v2 bpf-next 7/9] libbpf: add libbpf support to batch ops Brian Vazquez
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Brian Vazquez @ 2019-11-19 19:30 UTC (permalink / raw)
  To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf

From: Yonghong Song <yhs@fb.com>

Sync the uapi header include/uapi/linux/bpf.h to
tools/include/uapi/linux/bpf.h.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/include/uapi/linux/bpf.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4842a134b202a..0f6ff0c4d79dd 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -107,6 +107,10 @@ enum bpf_cmd {
 	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 	BPF_MAP_FREEZE,
 	BPF_BTF_GET_NEXT_ID,
+	BPF_MAP_LOOKUP_BATCH,
+	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
+	BPF_MAP_UPDATE_BATCH,
+	BPF_MAP_DELETE_BATCH,
 };
 
 enum bpf_map_type {
@@ -400,6 +404,23 @@ union bpf_attr {
 		__u64		flags;
 	};
 
+	struct { /* struct used by BPF_MAP_*_BATCH commands */
+		__aligned_u64	in_batch;	/* start batch,
+						 * NULL to start from beginning
+						 */
+		__aligned_u64	out_batch;	/* output: next start batch */
+		__aligned_u64	keys;
+		__aligned_u64	values;
+		__u32		count;		/* input/output:
+						 * input: # of key/value
+						 * elements
+						 * output: # of filled elements
+						 */
+		__u32		map_fd;
+		__u64		elem_flags;
+		__u64		flags;
+	} batch;
+
 	struct { /* anonymous struct used by BPF_PROG_LOAD command */
 		__u32		prog_type;	/* one of enum bpf_prog_type */
 		__u32		insn_cnt;
-- 
2.24.0.432.g9d3f5f5b63-goog



* [PATCH v2 bpf-next 7/9] libbpf: add libbpf support to batch ops
  2019-11-19 19:30 [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem Brian Vazquez
                   ` (5 preceding siblings ...)
  2019-11-19 19:30 ` [PATCH v2 bpf-next 6/9] tools/bpf: sync uapi header bpf.h Brian Vazquez
@ 2019-11-19 19:30 ` Brian Vazquez
  2019-11-21 18:30   ` Yonghong Song
  2019-11-19 19:30 ` [PATCH v2 bpf-next 8/9] selftests/bpf: add batch ops testing for hmap and hmap_percpu Brian Vazquez
  2019-11-19 19:30 ` [PATCH v2 bpf-next 9/9] selftests/bpf: add batch ops testing to array bpf map Brian Vazquez
  8 siblings, 1 reply; 26+ messages in thread
From: Brian Vazquez @ 2019-11-19 19:30 UTC (permalink / raw)
  To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf

From: Yonghong Song <yhs@fb.com>

Added four libbpf API functions to support map batch operations (a brief
usage sketch follows the list):
  . int bpf_map_delete_batch( ... )
  . int bpf_map_lookup_batch( ... )
  . int bpf_map_lookup_and_delete_batch( ... )
  . int bpf_map_update_batch( ... )
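
For reference, a minimal usage sketch of the lookup wrapper (the int
key/value layout and 'map_fd' are hypothetical, error handling is
trimmed); it mirrors, through libbpf, the raw-syscall loop shown in
patch 2:

  #include <errno.h>
  #include <bpf/bpf.h>

  /* Page through the map, 'step' elements per call; keys/vals must be
   * large enough to hold the whole map.
   */
  static __u32 dump_map(int map_fd, int *keys, int *vals, __u32 step)
  {
          __u32 count, total = 0;
          int next_key, err;

          do {
                  count = step;     /* in: requested, out: returned */
                  err = bpf_map_lookup_batch(map_fd,
                                             total ? &next_key : NULL,
                                             &next_key, keys + total,
                                             vals + total, &count, 0, 0);
                  if (err && errno != ENOENT)
                          return total;              /* hard error */
                  total += count;
          } while (!err);     /* -1/ENOENT once the map is exhausted */

          return total;
  }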

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/bpf.c      | 61 ++++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/bpf.h      | 14 +++++++++
 tools/lib/bpf/libbpf.map |  4 +++
 3 files changed, 79 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 98596e15390fb..9acd9309b47b3 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -443,6 +443,67 @@ int bpf_map_freeze(int fd)
 	return sys_bpf(BPF_MAP_FREEZE, &attr, sizeof(attr));
 }
 
+static int bpf_map_batch_common(int cmd, int fd, void  *in_batch,
+				void *out_batch, void *keys, void *values,
+				__u32 *count, __u64 elem_flags,
+				__u64 flags)
+{
+	union bpf_attr attr = {};
+	int ret;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.batch.map_fd = fd;
+	attr.batch.in_batch = ptr_to_u64(in_batch);
+	attr.batch.out_batch = ptr_to_u64(out_batch);
+	attr.batch.keys = ptr_to_u64(keys);
+	attr.batch.values = ptr_to_u64(values);
+	if (count)
+		attr.batch.count = *count;
+	attr.batch.elem_flags = elem_flags;
+	attr.batch.flags = flags;
+
+	ret = sys_bpf(cmd, &attr, sizeof(attr));
+	if (count)
+		*count = attr.batch.count;
+
+	return ret;
+}
+
+int bpf_map_delete_batch(int fd, void *in_batch, void *out_batch, __u32 *count,
+			 __u64 elem_flags, __u64 flags)
+{
+	return bpf_map_batch_common(BPF_MAP_DELETE_BATCH, fd, in_batch,
+				    out_batch, NULL, NULL, count,
+				    elem_flags, flags);
+}
+
+int bpf_map_lookup_batch(int fd, void *in_batch, void *out_batch, void *keys,
+			 void *values, __u32 *count,
+			 __u64 elem_flags, __u64 flags)
+{
+	return bpf_map_batch_common(BPF_MAP_LOOKUP_BATCH, fd, in_batch,
+				    out_batch, keys, values, count,
+				    elem_flags, flags);
+}
+
+int bpf_map_lookup_and_delete_batch(int fd, void *in_batch, void *out_batch,
+				    void *keys, void *values,
+				    __u32 *count, __u64 elem_flags,
+				    __u64 flags)
+{
+	return bpf_map_batch_common(BPF_MAP_LOOKUP_AND_DELETE_BATCH,
+				    fd, in_batch, out_batch, keys, values,
+				    count, elem_flags, flags);
+}
+
+int bpf_map_update_batch(int fd, void *keys, void *values, __u32 *count,
+			 __u64 elem_flags, __u64 flags)
+{
+	return bpf_map_batch_common(BPF_MAP_UPDATE_BATCH,
+				    fd, NULL, NULL, keys, values,
+				    count, elem_flags, flags);
+}
+
 int bpf_obj_pin(int fd, const char *pathname)
 {
 	union bpf_attr attr;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 3c791fa8e68e8..3ec63384400f1 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -126,6 +126,20 @@ LIBBPF_API int bpf_map_lookup_and_delete_elem(int fd, const void *key,
 LIBBPF_API int bpf_map_delete_elem(int fd, const void *key);
 LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
 LIBBPF_API int bpf_map_freeze(int fd);
+LIBBPF_API int bpf_map_delete_batch(int fd, void *in_batch, void *out_batch,
+				    __u32 *count, __u64 elem_flags,
+				    __u64 flags);
+LIBBPF_API int bpf_map_lookup_batch(int fd, void *in_batch, void *out_batch,
+				    void *keys, void *values, __u32 *count,
+				    __u64 elem_flags, __u64 flags);
+LIBBPF_API int bpf_map_lookup_and_delete_batch(int fd, void *in_batch,
+					       void *out_batch, void *keys,
+					       void *values, __u32 *count,
+					       __u64 elem_flags, __u64 flags);
+LIBBPF_API int bpf_map_update_batch(int fd, void *keys, void *values,
+				    __u32 *count, __u64 elem_flags,
+				    __u64 flags);
+
 LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
 LIBBPF_API int bpf_obj_get(const char *pathname);
 LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 8ddc2c40e482d..56462fea66f74 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -207,4 +207,8 @@ LIBBPF_0.0.6 {
 		bpf_program__size;
 		btf__find_by_name_kind;
 		libbpf_find_vmlinux_btf_id;
+		bpf_map_delete_batch;
+		bpf_map_lookup_and_delete_batch;
+		bpf_map_lookup_batch;
+		bpf_map_update_batch;
 } LIBBPF_0.0.5;
-- 
2.24.0.432.g9d3f5f5b63-goog



* [PATCH v2 bpf-next 8/9] selftests/bpf: add batch ops testing for hmap and hmap_percpu
  2019-11-19 19:30 [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem Brian Vazquez
                   ` (6 preceding siblings ...)
  2019-11-19 19:30 ` [PATCH v2 bpf-next 7/9] libbpf: add libbpf support to batch ops Brian Vazquez
@ 2019-11-19 19:30 ` Brian Vazquez
  2019-11-21 18:36   ` Yonghong Song
  2019-11-19 19:30 ` [PATCH v2 bpf-next 9/9] selftests/bpf: add batch ops testing to array bpf map Brian Vazquez
  8 siblings, 1 reply; 26+ messages in thread
From: Brian Vazquez @ 2019-11-19 19:30 UTC (permalink / raw)
  To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf, Brian Vazquez

From: Yonghong Song <yhs@fb.com>

Tested bpf_map_lookup_and_delete_batch() and bpf_map_update_batch()
functionality.
  $ ./test_maps
    ...
      test_hmap_lookup_and_delete_batch:PASS
      test_pcpu_hmap_lookup_and_delete_batch:PASS
    ...

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
---
 .../map_lookup_and_delete_batch_htab.c        | 257 ++++++++++++++++++
 1 file changed, 257 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c

diff --git a/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c
new file mode 100644
index 0000000000000..93e024cb85c60
--- /dev/null
+++ b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019 Facebook  */
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include <bpf_util.h>
+#include <test_maps.h>
+
+static void map_batch_update(int map_fd, __u32 max_entries, int *keys,
+			     void *values, bool is_pcpu)
+{
+	typedef BPF_DECLARE_PERCPU(int, value);
+	int i, j, err;
+	value *v;
+
+	if (is_pcpu)
+		v = (value *)values;
+
+	for (i = 0; i < max_entries; i++) {
+		keys[i] = i + 1;
+		if (is_pcpu)
+			for (j = 0; j < bpf_num_possible_cpus(); j++)
+				bpf_percpu(v[i], j) = i + 2 + j;
+		else
+			((int *)values)[i] = i + 2;
+	}
+
+	err = bpf_map_update_batch(map_fd, keys, values, &max_entries, 0, 0);
+	CHECK(err, "bpf_map_update_batch()", "error:%s\n", strerror(errno));
+}
+
+static void map_batch_verify(int *visited, __u32 max_entries,
+			     int *keys, void *values, bool is_pcpu)
+{
+	typedef BPF_DECLARE_PERCPU(int, value);
+	value *v;
+	int i, j;
+
+	if (is_pcpu)
+		v = (value *)values;
+
+	memset(visited, 0, max_entries * sizeof(*visited));
+	for (i = 0; i < max_entries; i++) {
+
+		if (is_pcpu) {
+			for (j = 0; j < bpf_num_possible_cpus(); j++) {
+				CHECK(keys[i] + 1 + j != bpf_percpu(v[i], j),
+				      "key/value checking",
+				      "error: i %d j %d key %d value %d\n",
+				      i, j, keys[i], bpf_percpu(v[i],  j));
+			}
+		} else {
+			CHECK(keys[i] + 1 != ((int *)values)[i],
+			      "key/value checking",
+			      "error: i %d key %d value %d\n", i, keys[i],
+			      ((int *)values)[i]);
+		}
+
+		visited[i] = 1;
+
+	}
+	for (i = 0; i < max_entries; i++) {
+		CHECK(visited[i] != 1, "visited checking",
+		      "error: keys array at index %d missing\n", i);
+	}
+}
+
+void __test_map_lookup_and_delete_batch(bool is_pcpu)
+{
+	int map_type = is_pcpu ? BPF_MAP_TYPE_PERCPU_HASH : BPF_MAP_TYPE_HASH;
+	struct bpf_create_map_attr xattr = {
+		.name = "hash_map",
+		.map_type = map_type,
+		.key_size = sizeof(int),
+		.value_size = sizeof(int),
+	};
+	typedef BPF_DECLARE_PERCPU(int, value);
+	int map_fd, *keys, *visited, key;
+	__u32 batch = 0, count, total, total_success;
+	const __u32 max_entries = 10;
+	int err, i, step, value_size;
+	value pcpu_values[10];
+	bool nospace_err;
+	void *values;
+
+	xattr.max_entries = max_entries;
+	map_fd = bpf_create_map_xattr(&xattr);
+	CHECK(map_fd == -1,
+	      "bpf_create_map_xattr()", "error:%s\n", strerror(errno));
+
+	value_size = is_pcpu ? sizeof(value) : sizeof(int);
+	keys = malloc(max_entries * sizeof(int));
+	if (is_pcpu)
+		values = pcpu_values;
+	else
+		values = malloc(max_entries * sizeof(int));
+	visited = malloc(max_entries * sizeof(int));
+	CHECK(!keys || !values || !visited, "malloc()",
+	      "error:%s\n", strerror(errno));
+
+	/* test 1: lookup/delete an empty hash table, -ENOENT */
+	count = max_entries;
+	err = bpf_map_lookup_and_delete_batch(map_fd, NULL, &batch, keys,
+					      values, &count, 0, 0);
+	CHECK((err && errno != ENOENT), "empty map",
+	      "error: %s\n", strerror(errno));
+
+	/* populate elements to the map */
+	map_batch_update(map_fd, max_entries, keys, values, is_pcpu);
+
+	/* test 2: lookup/delete with count = 0, success */
+	batch = 0;
+	count = 0;
+	err = bpf_map_lookup_and_delete_batch(map_fd, NULL, &batch, keys,
+					      values, &count, 0, 0);
+	CHECK(err, "count = 0", "error: %s\n", strerror(errno));
+
+	/* test 3: lookup/delete with count = max_entries, success */
+	memset(keys, 0, max_entries * sizeof(*keys));
+	memset(values, 0, max_entries * value_size);
+	count = max_entries;
+	batch = 0;
+	err = bpf_map_lookup_and_delete_batch(map_fd, NULL, &batch, keys,
+					      values, &count, 0, 0);
+	CHECK((err && errno != ENOENT), "count = max_entries",
+	       "error: %s\n", strerror(errno));
+	CHECK(count != max_entries, "count = max_entries",
+	      "count = %u, max_entries = %u\n", count, max_entries);
+	map_batch_verify(visited, max_entries, keys, values, is_pcpu);
+
+	/* bpf_map_get_next_key() should return -ENOENT for an empty map. */
+	err = bpf_map_get_next_key(map_fd, NULL, &key);
+	CHECK(!err, "bpf_map_get_next_key()", "error: %s\n", strerror(errno));
+
+	/* test 4: lookup/delete in a loop with various steps. */
+	total_success = 0;
+	for (step = 1; step < max_entries; step++) {
+		map_batch_update(map_fd, max_entries, keys, values, is_pcpu);
+		memset(keys, 0, max_entries * sizeof(*keys));
+		memset(values, 0, max_entries * value_size);
+		batch = 0;
+		total = 0;
+		i = 0;
+		/* iteratively lookup/delete elements with 'step'
+		 * elements each
+		 */
+		count = step;
+		nospace_err = false;
+		while (true) {
+			err = bpf_map_lookup_batch(map_fd,
+						   total ? &batch : NULL,
+						   &batch, keys + total,
+						   values +
+						   total * value_size,
+						   &count, 0, 0);
+			/* It is possible that we are failing due to buffer size
+			 * not big enough. In such cases, let us just exit and
+			 * go with larger steps. Note that a buffer size with
+			 * max_entries should always work.
+			 */
+			if (err && errno == ENOSPC) {
+				nospace_err = true;
+				break;
+			}
+
+
+			CHECK((err && errno != ENOENT), "lookup with steps",
+			      "error: %s\n", strerror(errno));
+
+			total += count;
+			if (err)
+				break;
+
+			i++;
+		}
+		if (nospace_err == true)
+			continue;
+
+		CHECK(total != max_entries, "lookup with steps",
+		      "total = %u, max_entries = %u\n", total, max_entries);
+		map_batch_verify(visited, max_entries, keys, values, is_pcpu);
+
+		memset(keys, 0, max_entries * sizeof(*keys));
+		memset(values, 0, max_entries * value_size);
+		batch = 0;
+		total = 0;
+		i = 0;
+		/* iteratively lookup/delete elements with 'step'
+		 * elements each
+		 */
+		count = step;
+		nospace_err = false;
+		while (true) {
+			err = bpf_map_lookup_and_delete_batch(map_fd,
+							total ? &batch : NULL,
+							&batch, keys + total,
+							values +
+							total * value_size,
+							&count, 0, 0);
+			/* It is possible that we are failing due to buffer size
+			 * not big enough. In such cases, let us just exit and
+			 * go with larger steps. Note that a buffer size with
+			 * max_entries should always work.
+			 */
+			if (err && errno == ENOSPC) {
+				nospace_err = true;
+				break;
+			}
+
+			CHECK((err && errno != ENOENT), "lookup with steps",
+			      "error: %s\n", strerror(errno));
+
+			total += count;
+			if (err)
+				break;
+			i++;
+		}
+
+		if (nospace_err == true)
+			continue;
+
+		CHECK(total != max_entries, "lookup/delete with steps",
+		      "total = %u, max_entries = %u\n", total, max_entries);
+
+		map_batch_verify(visited, max_entries, keys, values, is_pcpu);
+		err = bpf_map_get_next_key(map_fd, NULL, &key);
+		CHECK(!err, "bpf_map_get_next_key()", "error: %s\n",
+		      strerror(errno));
+
+		total_success++;
+	}
+
+	CHECK(total_success == 0, "check total_success",
+	      "unexpected failure\n");
+}
+
+void test_hmap_lookup_and_delete_batch(void)
+{
+	__test_map_lookup_and_delete_batch(false);
+	printf("%s:PASS\n", __func__);
+}
+
+void test_pcpu_hmap_lookup_and_delete_batch(void)
+{
+	__test_map_lookup_and_delete_batch(true);
+	printf("%s:PASS\n", __func__);
+}
+
+void test_map_lookup_and_delete_batch_htab(void)
+{
+	test_hmap_lookup_and_delete_batch();
+	test_pcpu_hmap_lookup_and_delete_batch();
+}
-- 
2.24.0.432.g9d3f5f5b63-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 bpf-next 9/9] selftests/bpf: add batch ops testing to array bpf map
  2019-11-19 19:30 [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem Brian Vazquez
                   ` (7 preceding siblings ...)
  2019-11-19 19:30 ` [PATCH v2 bpf-next 8/9] selftests/bpf: add batch ops testing for hmap and hmap_percpu Brian Vazquez
@ 2019-11-19 19:30 ` Brian Vazquez
  2019-11-21 18:43   ` Yonghong Song
  8 siblings, 1 reply; 26+ messages in thread
From: Brian Vazquez @ 2019-11-19 19:30 UTC (permalink / raw)
  To: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf, Brian Vazquez

Tested bpf_map_lookup_batch() and bpf_map_update_batch()
functionality.

  $ ./test_maps
      ...
        test_map_lookup_and_delete_batch_array:PASS
      ...

Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../map_lookup_and_delete_batch_array.c       | 119 ++++++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c

diff --git a/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
new file mode 100644
index 0000000000000..cbec72ad38609
--- /dev/null
+++ b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include <test_maps.h>
+
+static void map_batch_update(int map_fd, __u32 max_entries, int *keys,
+			     int *values)
+{
+	int i, err;
+
+	for (i = 0; i < max_entries; i++) {
+		keys[i] = i;
+		values[i] = i + 1;
+	}
+
+	err = bpf_map_update_batch(map_fd, keys, values, &max_entries, 0, 0);
+	CHECK(err, "bpf_map_update_batch()", "error:%s\n", strerror(errno));
+}
+
+static void map_batch_verify(int *visited, __u32 max_entries,
+			     int *keys, int *values)
+{
+	int i;
+
+	memset(visited, 0, max_entries * sizeof(*visited));
+	for (i = 0; i < max_entries; i++) {
+		CHECK(keys[i] + 1 != values[i], "key/value checking",
+		      "error: i %d key %d value %d\n", i, keys[i], values[i]);
+		visited[i] = 1;
+	}
+	for (i = 0; i < max_entries; i++) {
+		CHECK(visited[i] != 1, "visited checking",
+		      "error: keys array at index %d missing\n", i);
+	}
+}
+
+void test_map_lookup_and_delete_batch_array(void)
+{
+	struct bpf_create_map_attr xattr = {
+		.name = "array_map",
+		.map_type = BPF_MAP_TYPE_ARRAY,
+		.key_size = sizeof(int),
+		.value_size = sizeof(int),
+	};
+	int map_fd, *keys, *values, *visited;
+	__u32 count, total, total_success;
+	const __u32 max_entries = 10;
+	int err, i, step;
+	bool nospace_err;
+	__u64 batch = 0;
+
+	xattr.max_entries = max_entries;
+	map_fd = bpf_create_map_xattr(&xattr);
+	CHECK(map_fd == -1,
+	      "bpf_create_map_xattr()", "error:%s\n", strerror(errno));
+
+	keys = malloc(max_entries * sizeof(int));
+	values = malloc(max_entries * sizeof(int));
+	visited = malloc(max_entries * sizeof(int));
+	CHECK(!keys || !values || !visited, "malloc()", "error:%s\n",
+	      strerror(errno));
+
+	/* populate elements to the map */
+	map_batch_update(map_fd, max_entries, keys, values);
+
+	/* test 1: lookup in a loop with various steps. */
+	total_success = 0;
+	for (step = 1; step < max_entries; step++) {
+		map_batch_update(map_fd, max_entries, keys, values);
+		memset(keys, 0, max_entries * sizeof(*keys));
+		memset(values, 0, max_entries * sizeof(*values));
+		batch = 0;
+		total = 0;
+		i = 0;
+		/* iteratively lookup/delete elements with 'step'
+		 * elements each.
+		 */
+		count = step;
+		nospace_err = false;
+		while (true) {
+			err = bpf_map_lookup_batch(map_fd,
+						total ? &batch : NULL, &batch,
+						keys + total,
+						values + total,
+						&count, 0, 0);
+
+			CHECK((err && errno != ENOENT), "lookup with steps",
+			      "error: %s\n", strerror(errno));
+
+			total += count;
+
+			if (err)
+				break;
+
+			i++;
+		}
+
+		if (nospace_err == true)
+			continue;
+
+		CHECK(total != max_entries, "lookup with steps",
+		      "total = %u, max_entries = %u\n", total, max_entries);
+
+		map_batch_verify(visited, max_entries, keys, values);
+
+		total_success++;
+	}
+
+	CHECK(total_success == 0, "check total_success",
+	      "unexpected failure\n");
+
+	printf("%s:PASS\n", __func__);
+}
-- 
2.24.0.432.g9d3f5f5b63-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 2/9] bpf: add generic support for lookup and lookup_and_delete batch ops
  2019-11-19 19:30 ` [PATCH v2 bpf-next 2/9] bpf: add generic support for lookup and lookup_and_delete batch ops Brian Vazquez
@ 2019-11-21 17:36   ` Yonghong Song
  2019-11-21 21:36     ` Brian Vazquez
  2019-11-22 17:25   ` John Fastabend
  1 sibling, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2019-11-21 17:36 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller
  Cc: Stanislav Fomichev, Petar Penkov, Willem de Bruijn, linux-kernel,
	netdev, bpf



On 11/19/19 11:30 AM, Brian Vazquez wrote:
> This commit introduces generic support for the bpf_map_lookup_batch and
> bpf_map_lookup_and_delete_batch ops. This implementation can be used by
> almost all the bpf maps since its core implementation relies on the
> existing map_get_next_key, map_lookup_elem and map_delete_elem
> functions. The bpf syscall subcommands introduced are:
> 
>    BPF_MAP_LOOKUP_BATCH
>    BPF_MAP_LOOKUP_AND_DELETE_BATCH
> 
> The UAPI attribute is:
> 
>    struct { /* struct used by BPF_MAP_*_BATCH commands */
>           __aligned_u64   in_batch;       /* start batch,
>                                            * NULL to start from beginning
>                                            */
>           __aligned_u64   out_batch;      /* output: next start batch */
>           __aligned_u64   keys;
>           __aligned_u64   values;
>           __u32           count;          /* input/output:
>                                            * input: # of key/value
>                                            * elements
>                                            * output: # of filled elements
>                                            */
>           __u32           map_fd;
>           __u64           elem_flags;
>           __u64           flags;
>    } batch;
> 
> in_batch/out_batch are opaque values used to communicate between
> user and kernel space; in_batch/out_batch must be of key_size length.
> 
> To start iterating from the beginning, in_batch must be NULL;
> count is the # of key/value elements to retrieve. Note that the 'keys'
> buffer must be a buffer of key_size * count size and the 'values' buffer
> must be value_size * count, where value_size must be aligned to 8 bytes
> by userspace if it's dealing with percpu maps. 'count' will contain the
> number of keys/values successfully retrieved. Note that 'count' is an
> input/output variable and it can contain a lower value after a call.
> 
> If there are no more entries to retrieve, ENOENT will be returned. If the
> error is ENOENT, count might be > 0 because some values may have been
> copied before the end of the map was reached.
> 
> Note that if the return code is an error and not -EFAULT,
> count indicates the number of elements successfully processed.
> 
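To make the in_batch/out_batch contract above concrete, here is a minimal,
untested userspace sketch that walks a whole map with the libbpf wrapper
added later in this series (BATCH_SZ, dump_map() and consume() are made-up
names, and int keys/values are assumed):

#include <errno.h>
#include <stdbool.h>
#include <bpf/bpf.h>

#define BATCH_SZ 64

/* Walk the whole map with BPF_MAP_LOOKUP_BATCH via the libbpf wrapper.
 * Returns 0 once ENOENT signals the end of the map, -errno otherwise.
 */
static int dump_map(int map_fd, void (*consume)(int key, int value))
{
	int keys[BATCH_SZ], values[BATCH_SZ], err;
	__u32 batch, count, i;
	bool first = true;

	for (;;) {
		count = BATCH_SZ;
		err = bpf_map_lookup_batch(map_fd, first ? NULL : &batch,
					   &batch, keys, values, &count, 0, 0);
		if (err && errno != ENOENT)
			return -errno;
		/* 'count' entries are valid even when ENOENT was returned */
		for (i = 0; i < count; i++)
			consume(keys[i], values[i]);
		if (err)
			return 0;	/* ENOENT: traversal finished */
		first = false;
	}
}
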
> Suggested-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>   include/linux/bpf.h      |  11 +++
>   include/uapi/linux/bpf.h |  19 +++++
>   kernel/bpf/syscall.c     | 176 +++++++++++++++++++++++++++++++++++++++
>   3 files changed, 206 insertions(+)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5b81cde47314e..767a823dbac74 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -41,6 +41,11 @@ struct bpf_map_ops {
>   	int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
>   	void (*map_release_uref)(struct bpf_map *map);
>   	void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
> +	int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr,
> +				union bpf_attr __user *uattr);
> +	int (*map_lookup_and_delete_batch)(struct bpf_map *map,
> +					   const union bpf_attr *attr,
> +					   union bpf_attr __user *uattr);
>   
>   	/* funcs callable from userspace and from eBPF programs */
>   	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
> @@ -797,6 +802,12 @@ void bpf_map_charge_move(struct bpf_map_memory *dst,
>   void *bpf_map_area_alloc(size_t size, int numa_node);
>   void bpf_map_area_free(void *base);
>   void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
> +int  generic_map_lookup_batch(struct bpf_map *map,
> +			      const union bpf_attr *attr,
> +			      union bpf_attr __user *uattr);
> +int  generic_map_lookup_and_delete_batch(struct bpf_map *map,
> +					 const union bpf_attr *attr,
> +					 union bpf_attr __user *uattr);
>   
>   extern int sysctl_unprivileged_bpf_disabled;
>   
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4842a134b202a..e60b7b7cda61a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -107,6 +107,8 @@ enum bpf_cmd {
>   	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
>   	BPF_MAP_FREEZE,
>   	BPF_BTF_GET_NEXT_ID,
> +	BPF_MAP_LOOKUP_BATCH,
> +	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
>   };
>   
>   enum bpf_map_type {
> @@ -400,6 +402,23 @@ union bpf_attr {
>   		__u64		flags;
>   	};
>   
> +	struct { /* struct used by BPF_MAP_*_BATCH commands */
> +		__aligned_u64	in_batch;	/* start batch,
> +						 * NULL to start from beginning
> +						 */
> +		__aligned_u64	out_batch;	/* output: next start batch */
> +		__aligned_u64	keys;
> +		__aligned_u64	values;
> +		__u32		count;		/* input/output:
> +						 * input: # of key/value
> +						 * elements
> +						 * output: # of filled elements
> +						 */
> +		__u32		map_fd;
> +		__u64		elem_flags;
> +		__u64		flags;
> +	} batch;
> +
>   	struct { /* anonymous struct used by BPF_PROG_LOAD command */
>   		__u32		prog_type;	/* one of enum bpf_prog_type */
>   		__u32		insn_cnt;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index cc714c9d5b4cc..d0d3d0e0eaca4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1127,6 +1127,124 @@ static int map_get_next_key(union bpf_attr *attr)
>   	return err;
>   }
>   
> +static int __generic_map_lookup_batch(struct bpf_map *map,
> +				      const union bpf_attr *attr,
> +				      union bpf_attr __user *uattr,
> +				      bool do_delete)
> +{
> +	void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
> +	void __user *uobatch = u64_to_user_ptr(attr->batch.out_batch);
> +	void __user *values = u64_to_user_ptr(attr->batch.values);
> +	void __user *keys = u64_to_user_ptr(attr->batch.keys);
> +	void *buf, *prev_key, *key, *value;
> +	u32 value_size, cp, max_count;
> +	bool first_key = false;
> +	int err, retry = 3;
> +
> +	if (attr->batch.elem_flags & ~BPF_F_LOCK)
> +		return -EINVAL;
> +
> +	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> +	    !map_value_has_spin_lock(map)) {
> +		err = -EINVAL;
> +		goto err_put;
> +	}

Direct return -EINVAL?

> +
> +	if (map->map_type == BPF_MAP_TYPE_QUEUE ||
> +	    map->map_type == BPF_MAP_TYPE_STACK) {
> +		err = -ENOTSUPP;
> +		goto err_put;
> +	}

Direct return -ENOTSUPP?

> +
> +	value_size = bpf_map_value_size(map);
> +
> +	max_count = attr->batch.count;
> +	if (!max_count)
> +		return 0;
> +
> +	err = -ENOMEM;
> +	buf = kmalloc(map->key_size + value_size, GFP_USER | __GFP_NOWARN);
> +	if (!buf)
> +		goto err_put;

Direct return -ENOMEM?

> +
> +	err = -EFAULT;
> +	first_key = false;
> +	if (ubatch && copy_from_user(buf, ubatch, map->key_size))
> +		goto free_buf;
> +	key = buf;
> +	value = key + map->key_size;
> +	if (!ubatch) {
> +		prev_key = NULL;
> +		first_key = true;
> +	}
> +
> +
One extra line.

> +	for (cp = 0; cp < max_count; cp++) {
> +		if (cp || first_key) {
> +			rcu_read_lock();
> +			err = map->ops->map_get_next_key(map, prev_key, key);
> +			rcu_read_unlock();
> +			if (err)
> +				break;
> +		}
> +		err = bpf_map_copy_value(map, key, value,
> +					 attr->batch.elem_flags, do_delete);
> +
> +		if (err == -ENOENT) {
> +			if (retry) {
> +				retry--;

What are the 'retry' semantics here? After 'continue', cp++ is executed.

> +				continue;
> +			}
> +			err = -EINTR;

Why returning -EINTR?

> +			break;
> +		}
> +
> +		if (err)
> +			goto free_buf;
> +
> +		if (copy_to_user(keys + cp * map->key_size, key,
> +				 map->key_size)) {
> +			err = -EFAULT;
> +			goto free_buf;
> +		}
> +		if (copy_to_user(values + cp * value_size, value, value_size)) {
> +			err = -EFAULT;
> +			goto free_buf;
> +		}
> +
> +		prev_key = key;
> +		retry = 3;
> +	}
> +	if (!err) {
> +		rcu_read_lock();
> +		err = map->ops->map_get_next_key(map, prev_key, key);

If err != 0, the 'key' will be invalid and it cannot be used by the
copy_to_user() below.

> +		rcu_read_unlock();
> +	}
> +
> +	if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) ||

The 'cp' may not be accurate if 'retry' is triggered in the above.

> +		    (copy_to_user(uobatch, key, map->key_size))))
> +		err = -EFAULT;
> +
> +free_buf:
> +	kfree(buf);
> +err_put:

err_put can be removed.

> +	return err;
> +}
> +
> +int generic_map_lookup_batch(struct bpf_map *map,
> +			     const union bpf_attr *attr,
> +			     union bpf_attr __user *uattr)
> +{
> +	return __generic_map_lookup_batch(map, attr, uattr, false);
> +}
> +
> +int generic_map_lookup_and_delete_batch(struct bpf_map *map,
> +					const union bpf_attr *attr,
> +					union bpf_attr __user *uattr)
> +{
> +	return __generic_map_lookup_batch(map, attr, uattr, true);
> +}
> +
>   #define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD value
>   
>   static int map_lookup_and_delete_elem(union bpf_attr *attr)
> @@ -2956,6 +3074,57 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>   	return err;
>   }
>   
> +#define BPF_MAP_BATCH_LAST_FIELD batch.flags
> +
> +#define BPF_DO_BATCH(fn)			\
> +	do {					\
> +		if (!fn) {			\
> +			err = -ENOTSUPP;	\
> +			goto err_put;		\
> +		}				\
> +		err = fn(map, attr, uattr);	\
> +	} while (0)
> +
> +static int bpf_map_do_batch(const union bpf_attr *attr,
> +			    union bpf_attr __user *uattr,
> +			    int cmd)
> +{
> +	struct bpf_map *map;
> +	int err, ufd;
> +	struct fd f;
> +
> +	if (CHECK_ATTR(BPF_MAP_BATCH))
> +		return -EINVAL;
> +
> +	ufd = attr->batch.map_fd;
> +	f = fdget(ufd);
> +	map = __bpf_map_get(f);
> +	if (IS_ERR(map))
> +		return PTR_ERR(map);
> +
> +	if ((cmd == BPF_MAP_LOOKUP_BATCH ||
> +	     cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH) &&
> +	    !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
> +		err = -EPERM;
> +		goto err_put;
> +	}
> +
> +	if (cmd != BPF_MAP_LOOKUP_BATCH &&

This should be BPF_MAP_LOOKUP_AND_DELETE_BATCH;
BPF_MAP_LOOKUP_BATCH does not need FMODE_CAN_WRITE.
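In other words, something like this (untested sketch):

	if (cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH &&
	    !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
		err = -EPERM;
		goto err_put;
	}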

> +	    !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
> +		err = -EPERM;
> +		goto err_put;
> +	}
> +
> +	if (cmd == BPF_MAP_LOOKUP_BATCH)
> +		BPF_DO_BATCH(map->ops->map_lookup_batch);
> +	else
> +		BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch);
> +
> +err_put:
> +	fdput(f);
> +	return err;
> +}
> +
>   SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
>   {
>   	union bpf_attr attr = {};
> @@ -3053,6 +3222,13 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>   	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
>   		err = map_lookup_and_delete_elem(&attr);
>   		break;
> +	case BPF_MAP_LOOKUP_BATCH:
> +		err = bpf_map_do_batch(&attr, uattr, BPF_MAP_LOOKUP_BATCH);
> +		break;
> +	case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
> +		err = bpf_map_do_batch(&attr, uattr,
> +				       BPF_MAP_LOOKUP_AND_DELETE_BATCH);
> +		break;
>   	default:
>   		err = -EINVAL;
>   		break;
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 3/9] bpf: add generic support for update and delete batch ops
  2019-11-19 19:30 ` [PATCH v2 bpf-next 3/9] bpf: add generic support for update and delete " Brian Vazquez
@ 2019-11-21 18:00   ` Yonghong Song
  2019-11-22  5:50     ` Brian Vazquez
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2019-11-21 18:00 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller
  Cc: Stanislav Fomichev, Petar Penkov, Willem de Bruijn, linux-kernel,
	netdev, bpf



On 11/19/19 11:30 AM, Brian Vazquez wrote:
> This commit adds generic support for update and delete batch ops that
> can be used for almost all the bpf maps. These commands share the same
> UAPI attr as the lookup and lookup_and_delete batch ops, and the
> syscall commands are:
> 
>    BPF_MAP_UPDATE_BATCH
>    BPF_MAP_DELETE_BATCH
> 
> The main difference between update/delete and lookup/lookup_and_delete
> batch ops is that for update/delete the keys/values must be specified by
> userspace, and because of that neither in_batch nor out_batch is used.
> 
> Suggested-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>   include/linux/bpf.h      |  10 ++++
>   include/uapi/linux/bpf.h |   2 +
>   kernel/bpf/syscall.c     | 126 ++++++++++++++++++++++++++++++++++++++-
>   3 files changed, 137 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 767a823dbac74..96a19e1fd2b5b 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -46,6 +46,10 @@ struct bpf_map_ops {
>   	int (*map_lookup_and_delete_batch)(struct bpf_map *map,
>   					   const union bpf_attr *attr,
>   					   union bpf_attr __user *uattr);
> +	int (*map_update_batch)(struct bpf_map *map, const union bpf_attr *attr,
> +				union bpf_attr __user *uattr);
> +	int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
> +				union bpf_attr __user *uattr);
>   
>   	/* funcs callable from userspace and from eBPF programs */
>   	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
> @@ -808,6 +812,12 @@ int  generic_map_lookup_batch(struct bpf_map *map,
>   int  generic_map_lookup_and_delete_batch(struct bpf_map *map,
>   					 const union bpf_attr *attr,
>   					 union bpf_attr __user *uattr);
> +int  generic_map_update_batch(struct bpf_map *map,
> +			      const union bpf_attr *attr,
> +			      union bpf_attr __user *uattr);
> +int  generic_map_delete_batch(struct bpf_map *map,
> +			      const union bpf_attr *attr,
> +			      union bpf_attr __user *uattr);
>   
>   extern int sysctl_unprivileged_bpf_disabled;
>   
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index e60b7b7cda61a..0f6ff0c4d79dd 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -109,6 +109,8 @@ enum bpf_cmd {
>   	BPF_BTF_GET_NEXT_ID,
>   	BPF_MAP_LOOKUP_BATCH,
>   	BPF_MAP_LOOKUP_AND_DELETE_BATCH,
> +	BPF_MAP_UPDATE_BATCH,
> +	BPF_MAP_DELETE_BATCH,
>   };
>   
>   enum bpf_map_type {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index d0d3d0e0eaca4..06e1bcf40fb8d 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1127,6 +1127,120 @@ static int map_get_next_key(union bpf_attr *attr)
>   	return err;
>   }
>   
> +int generic_map_delete_batch(struct bpf_map *map,
> +			     const union bpf_attr *attr,
> +			     union bpf_attr __user *uattr)
> +{
> +	void __user *keys = u64_to_user_ptr(attr->batch.keys);
> +	int ufd = attr->map_fd;
> +	u32 cp, max_count;
> +	struct fd f;
> +	void *key;
> +	int err;
> +
> +	f = fdget(ufd);
> +	if (attr->batch.elem_flags & ~BPF_F_LOCK)
> +		return -EINVAL;
> +
> +	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> +	    !map_value_has_spin_lock(map)) {
> +		err = -EINVAL;
> +		goto err_put;

Just return -EINVAL?

> +	}
> +
> +	max_count = attr->batch.count;
> +	if (!max_count)
> +		return 0;
> +
> +	err = -ENOMEM;

Why initialize err to -ENOMEM? Maybe just err = 0.

> +	for (cp = 0; cp < max_count; cp++) {
> +		key = __bpf_copy_key(keys + cp * map->key_size, map->key_size);
> +		if (IS_ERR(key)) {
> +			err = PTR_ERR(key);
> +			break;
> +		}
> +
> +		if (err)
> +			break;

The above is incorrect, especially if you assign err an initial value of -ENOMEM.
The ` if (err) break; ` is not really needed: if there is an error,
you already break in the above.
If map->key_size is not 0, the returned 'key' cannot be a NULL pointer.

> +		if (bpf_map_is_dev_bound(map)) {
> +			err = bpf_map_offload_delete_elem(map, key);
> +			break;
> +		}
> +
> +		preempt_disable();
> +		__this_cpu_inc(bpf_prog_active);
> +		rcu_read_lock();
> +		err = map->ops->map_delete_elem(map, key);
> +		rcu_read_unlock();
> +		__this_cpu_dec(bpf_prog_active);
> +		preempt_enable();
> +		maybe_wait_bpf_programs(map);
> +		if (err)
> +			break;
> +	}
> +	if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
> +		err = -EFAULT;

If the previous err is already -EFAULT, the return value will be -EFAULT
even if this copy_to_user() succeeded, so uattr->batch.count cannot be
trusted. So maybe do
    if (err != -EFAULT && copy_to_user(...))
        err = -EFAULT;
?
There are several other places like this.

> +err_put:

You don't need err_put label in the above.

> +	return err;
> +}
> +int generic_map_update_batch(struct bpf_map *map,
> +			     const union bpf_attr *attr,
> +			     union bpf_attr __user *uattr)
> +{
> +	void __user *values = u64_to_user_ptr(attr->batch.values);
> +	void __user *keys = u64_to_user_ptr(attr->batch.keys);
> +	u32 value_size, cp, max_count;
> +	int ufd = attr->map_fd;
> +	void *key, *value;
> +	struct fd f;
> +	int err;
> +
> +	f = fdget(ufd);
> +	if (attr->batch.elem_flags & ~BPF_F_LOCK)
> +		return -EINVAL;
> +
> +	if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> +	    !map_value_has_spin_lock(map)) {
> +		err = -EINVAL;
> +		goto err_put;

Directly return -EINVAL?

> +	}
> +
> +	value_size = bpf_map_value_size(map);
> +
> +	max_count = attr->batch.count;
> +	if (!max_count)
> +		return 0;
> +
> +	err = -ENOMEM;
> +	value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
> +	if (!value)
> +		goto err_put;

Directly return -ENOMEM?

> +
> +	for (cp = 0; cp < max_count; cp++) {
> +		key = __bpf_copy_key(keys + cp * map->key_size, map->key_size);

Do you need to free 'key' after its use?

> +		if (IS_ERR(key)) {
> +			err = PTR_ERR(key);
> +			break;
> +		}
> +		err = -EFAULT;
> +		if (copy_from_user(value, values + cp * value_size, value_size))
> +			break;
> +
> +		err = bpf_map_update_value(map, f, key, value,
> +					   attr->batch.elem_flags);
> +
> +		if (err)
> +			break;
> +	}
> +
> +	if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
> +		err = -EFAULT;

Similar to the above comment: if err is already -EFAULT, there is no need
to do the copy_to_user().

> +
> +	kfree(value);
> +err_put:

err_put label is not needed.

> +	return err;
> +}
> +
>   static int __generic_map_lookup_batch(struct bpf_map *map,
>   				      const union bpf_attr *attr,
>   				      union bpf_attr __user *uattr,
> @@ -3117,8 +3231,12 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
>   
>   	if (cmd == BPF_MAP_LOOKUP_BATCH)
>   		BPF_DO_BATCH(map->ops->map_lookup_batch);
> -	else
> +	else if (cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH)
>   		BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch);
> +	else if (cmd == BPF_MAP_UPDATE_BATCH)
> +		BPF_DO_BATCH(map->ops->map_update_batch);
> +	else
> +		BPF_DO_BATCH(map->ops->map_delete_batch);

Also need to check map_get_sys_perms() permissions for these two new
commands. Both delete and update need FMODE_CAN_WRITE permission.
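For example (untested sketch), before dispatching to the batch callbacks:

	if ((cmd == BPF_MAP_UPDATE_BATCH ||
	     cmd == BPF_MAP_DELETE_BATCH) &&
	    !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
		err = -EPERM;
		goto err_put;
	}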

>   
>   err_put:
>   	fdput(f);
> @@ -3229,6 +3347,12 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
>   		err = bpf_map_do_batch(&attr, uattr,
>   				       BPF_MAP_LOOKUP_AND_DELETE_BATCH);
>   		break;
> +	case BPF_MAP_UPDATE_BATCH:
> +		err = bpf_map_do_batch(&attr, uattr, BPF_MAP_UPDATE_BATCH);
> +		break;
> +	case BPF_MAP_DELETE_BATCH:
> +		err = bpf_map_do_batch(&attr, uattr, BPF_MAP_DELETE_BATCH);
> +		break;
>   	default:
>   		err = -EINVAL;
>   		break;
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 5/9] bpf: add batch ops to all htab bpf map
  2019-11-19 19:30 ` [PATCH v2 bpf-next 5/9] bpf: add batch ops to all htab bpf map Brian Vazquez
@ 2019-11-21 18:27   ` Yonghong Song
  2019-11-21 21:27     ` Brian Vazquez
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2019-11-21 18:27 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller
  Cc: Stanislav Fomichev, Petar Penkov, Willem de Bruijn, linux-kernel,
	netdev, bpf



On 11/19/19 11:30 AM, Brian Vazquez wrote:
> From: Yonghong Song <yhs@fb.com>
> 
> htab can't use the generic batch support due to some problematic behaviours
> inherent to the data structure, i.e. while iterating the bpf map a
> concurrent program might delete the next entry that the batch was about to
> use; in that case there's no easy way to retrieve the next entry. The
> issue has been discussed multiple times (see [1] and [2]).
> 
> The only way hmap can be traversed without the problem described above
> is by making sure that entire buckets are traversed at a time.
> This commit implements those strict requirements for hmap; the
> implementation follows the same interface as the generic support, with
> some exceptions:
> 
>   - If the keys/values buffers are not big enough to hold a whole bucket,
>     ENOSPC will be returned.
>   - out_batch contains the index of the next bucket in the iteration, not
>     the next key, but this is transparent to the user since the user
>     should never use out_batch for anything other than the bpf batch syscalls.
> 
> Note that only lookup and lookup_and_delete batch ops require the hmap
> specific implementation, update/delete batch ops can be the generic
> ones.
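
As a side note for userspace consumers: the ENOSPC case mentioned above can
be handled by growing the buffers and retrying. A rough, untested fragment,
assuming the libbpf wrapper signature used by the selftests later in this
series (buf_cnt, key_size, value_size and first are hypothetical caller-side
variables):

	count = buf_cnt;
	err = bpf_map_lookup_batch(map_fd, first ? NULL : &batch, &batch,
				   keys, values, &count, 0, 0);
	if (err && errno == ENOSPC) {
		/* a whole bucket did not fit: grow the buffers and retry
		 * with the same in_batch value (realloc error handling
		 * omitted for brevity)
		 */
		buf_cnt *= 2;
		keys = realloc(keys, buf_cnt * key_size);
		values = realloc(values, buf_cnt * value_size);
	}
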
> 
> [1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
> [2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> ---
>   kernel/bpf/hashtab.c | 244 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 244 insertions(+)
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 22066a62c8c97..3402174b292ea 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -17,6 +17,17 @@
>   	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |	\
>   	 BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
>   
> +#define BATCH_OPS(_name)			\
> +	.map_lookup_batch =			\
> +	_name##_map_lookup_batch,		\
> +	.map_lookup_and_delete_batch =		\
> +	_name##_map_lookup_and_delete_batch,	\
> +	.map_update_batch =			\
> +	generic_map_update_batch,		\
> +	.map_delete_batch =			\
> +	generic_map_delete_batch
> +
> +
>   struct bucket {
>   	struct hlist_nulls_head head;
>   	raw_spinlock_t lock;
> @@ -1232,6 +1243,235 @@ static void htab_map_seq_show_elem(struct bpf_map *map, void *key,
>   	rcu_read_unlock();
>   }
>   
> +static int
> +__htab_map_lookup_and_delete_batch(struct bpf_map *map,
> +				   const union bpf_attr *attr,
> +				   union bpf_attr __user *uattr,
> +				   bool do_delete, bool is_lru_map,
> +				   bool is_percpu)
> +{
> +	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> +	u32 bucket_cnt, total, key_size, value_size, roundup_key_size;
> +	void *keys = NULL, *values = NULL, *value, *dst_key, *dst_val;
> +	void __user *ukeys, *uvalues, *ubatch;
> +	u64 elem_map_flags, map_flags;
> +	struct hlist_nulls_head *head;
> +	struct hlist_nulls_node *n;
> +	u32 batch, max_count, size;
> +	unsigned long flags;
> +	struct htab_elem *l;
> +	struct bucket *b;
> +	int ret = 0;
> +
> +	max_count = attr->batch.count;
> +	if (!max_count)
> +		return 0;
> +
> +	elem_map_flags = attr->batch.elem_flags;
> +	if ((elem_map_flags & ~BPF_F_LOCK) ||
> +	    ((elem_map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
> +		return -EINVAL;
> +
> +	map_flags = attr->batch.flags;
> +	if (map_flags)
> +		return -EINVAL;
> +
> +	batch = 0;
> +	ubatch = u64_to_user_ptr(attr->batch.in_batch);
> +	if (ubatch && copy_from_user(&batch, ubatch, sizeof(batch)))
> +		return -EFAULT;
> +
> +	if (batch >= htab->n_buckets)
> +		return -ENOENT;
> +
> +	/* We cannot do copy_from_user or copy_to_user inside
> +	 * the rcu_read_lock. Allocate enough space here.
> +	 */
> +	key_size = htab->map.key_size;
> +	roundup_key_size = round_up(htab->map.key_size, 8);
> +	value_size = htab->map.value_size;
> +	size = round_up(value_size, 8);
> +	if (is_percpu)
> +		value_size = size * num_possible_cpus();
> +	keys = kvmalloc(key_size * max_count, GFP_USER | __GFP_NOWARN);
> +	values = kvmalloc(value_size * max_count, GFP_USER | __GFP_NOWARN);
> +	if (!keys || !values) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	dst_key = keys;
> +	dst_val = values;
> +	total = 0;
> +
> +	preempt_disable();
> +	this_cpu_inc(bpf_prog_active);
> +	rcu_read_lock();
> +
> +again:
> +	b = &htab->buckets[batch];
> +	head = &b->head;
> +	raw_spin_lock_irqsave(&b->lock, flags);

Brian, you had some earlier comments asking whether we could
avoid taking the bucket lock for lookup-only batches. Do you still think
this is needed?

If this is still desired, the following is a possible approach:
  - the lock is only taken when delete is also needed.
  - we still do a preliminary count to get bucket_cnt, but for
    lookup it cannot guarantee that all elements in the bucket
    will be copied, since a parallel update may add more elements
    to the bucket.
This probably should be okay.
If needed, later on, we can add a flag to permit locking.
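A rough sketch of the conditional locking (untested, just to show the shape):

	b = &htab->buckets[batch];
	head = &b->head;
	/* only take the bucket lock when elements will also be deleted;
	 * plain lookups walk the bucket under rcu_read_lock() alone
	 */
	if (do_delete)
		raw_spin_lock_irqsave(&b->lock, flags);

	bucket_cnt = 0;
	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
		bucket_cnt++;

	/* copy loop unchanged; without the lock, bucket_cnt is only a
	 * snapshot and a parallel update may add elements that will not
	 * be copied in this pass
	 */

	if (do_delete)
		raw_spin_unlock_irqrestore(&b->lock, flags);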

> +
> +	bucket_cnt = 0;
> +	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
> +		bucket_cnt++;
> +
> +	if (bucket_cnt > (max_count - total)) {
> +		if (total == 0)
> +			ret = -ENOSPC;
> +		goto after_loop;
> +	}
> +
> +	hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) {
> +		memcpy(dst_key, l->key, key_size);
> +
> +		if (is_percpu) {
> +			int off = 0, cpu;
> +			void __percpu *pptr;
> +
> +			pptr = htab_elem_get_ptr(l, map->key_size);
> +			for_each_possible_cpu(cpu) {
> +				bpf_long_memcpy(dst_val + off,
> +						per_cpu_ptr(pptr, cpu), size);
> +				off += size;
> +			}
> +		} else {
> +			value = l->key + roundup_key_size;
> +			if (elem_map_flags & BPF_F_LOCK)
> +				copy_map_value_locked(map, dst_val, value,
> +						      true);
> +			else
> +				copy_map_value(map, dst_val, value);
> +			check_and_init_map_lock(map, dst_val);
> +		}

It is possible we can move the do_delete loop body below up to here.
We would need to change hlist_nulls_for_each_entry_rcu to
hlist_nulls_for_each_entry_safe.
Once you have the next patch, we can ask some RCU experts to help
ensure the implementation is correct.
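Something along these lines (untested sketch; the value copy is elided):

	hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
		memcpy(dst_key, l->key, key_size);
		/* percpu / BPF_F_LOCK value copy exactly as above */
		if (do_delete) {
			hlist_nulls_del_rcu(&l->hash_node);
			if (is_lru_map)
				bpf_lru_push_free(&htab->lru, &l->lru_node);
			else
				free_htab_elem(htab, l);
		}
		dst_key += key_size;
		dst_val += value_size;
		total++;
	}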

> +		dst_key += key_size;
> +		dst_val += value_size;
> +		total++;
> +	}
> +
> +	if (do_delete) {
> +		hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) {
> +			hlist_nulls_del_rcu(&l->hash_node);
> +			if (is_lru_map)
> +				bpf_lru_push_free(&htab->lru, &l->lru_node);
> +			else
> +				free_htab_elem(htab, l);
> +		}
> +	}
> +
> +	batch++;
> +	if (batch >= htab->n_buckets) {
> +		ret = -ENOENT;
> +		goto after_loop;
> +	}
> +
> +	raw_spin_unlock_irqrestore(&b->lock, flags);
> +	goto again;
> +
> +after_loop:
> +	raw_spin_unlock_irqrestore(&b->lock, flags);
> +
> +	rcu_read_unlock();
> +	this_cpu_dec(bpf_prog_active);
> +	preempt_enable();
> +
> +	if (ret && ret != -ENOENT)
> +		goto out;
> +
> +	/* copy data back to user */
> +	ukeys = u64_to_user_ptr(attr->batch.keys);
> +	uvalues = u64_to_user_ptr(attr->batch.values);
> +	ubatch = u64_to_user_ptr(attr->batch.out_batch);
> +	if (copy_to_user(ubatch, &batch, sizeof(batch)) ||
> +	    copy_to_user(ukeys, keys, total * key_size) ||
> +	    copy_to_user(uvalues, values, total * value_size) ||
> +	    put_user(total, &uattr->batch.count))
> +		ret = -EFAULT;
> +
> +out:
> +	kvfree(keys);
> +	kvfree(values);
> +	return ret;
> +}
> +
> +static int
> +htab_percpu_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
> +			     union bpf_attr __user *uattr)
> +{
> +	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
> +						  false, true);
> +}
> +
> +static int
> +htab_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
> +					const union bpf_attr *attr,
> +					union bpf_attr __user *uattr)
> +{
> +	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
> +						  false, true);
> +}
> +
> +static int
> +htab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
> +		      union bpf_attr __user *uattr)
> +{
> +	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
> +						  false, false);
> +}
> +
> +static int
> +htab_map_lookup_and_delete_batch(struct bpf_map *map,
> +				 const union bpf_attr *attr,
> +				 union bpf_attr __user *uattr)
> +{
> +	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
> +						  false, false);
> +}
> +
> +static int
> +htab_map_delete_batch(struct bpf_map *map,
> +		      const union bpf_attr *attr,
> +		      union bpf_attr __user *uattr)
> +{
> +	return generic_map_delete_batch(map, attr, uattr);
> +}
> +
> +static int
> +htab_lru_percpu_map_lookup_batch(struct bpf_map *map,
> +				 const union bpf_attr *attr,
> +				 union bpf_attr __user *uattr)
> +{
> +	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
> +						  true, true);
> +}
> +
> +static int
> +htab_lru_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
> +					    const union bpf_attr *attr,
> +					    union bpf_attr __user *uattr)
> +{
> +	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
> +						  true, true);
> +}
> +
> +static int
> +htab_lru_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
> +			  union bpf_attr __user *uattr)
> +{
> +	return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
> +						  true, false);
> +}
> +
> +static int
> +htab_lru_map_lookup_and_delete_batch(struct bpf_map *map,
> +				     const union bpf_attr *attr,
> +				     union bpf_attr __user *uattr)
> +{
> +	return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
> +						  true, false);
> +}
> +
>   const struct bpf_map_ops htab_map_ops = {
>   	.map_alloc_check = htab_map_alloc_check,
>   	.map_alloc = htab_map_alloc,
> @@ -1242,6 +1482,7 @@ const struct bpf_map_ops htab_map_ops = {
>   	.map_delete_elem = htab_map_delete_elem,
>   	.map_gen_lookup = htab_map_gen_lookup,
>   	.map_seq_show_elem = htab_map_seq_show_elem,
> +	BATCH_OPS(htab),
>   };
>   
>   const struct bpf_map_ops htab_lru_map_ops = {
> @@ -1255,6 +1496,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
>   	.map_delete_elem = htab_lru_map_delete_elem,
>   	.map_gen_lookup = htab_lru_map_gen_lookup,
>   	.map_seq_show_elem = htab_map_seq_show_elem,
> +	BATCH_OPS(htab_lru),
>   };
>   
>   /* Called from eBPF program */
> @@ -1368,6 +1610,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
>   	.map_update_elem = htab_percpu_map_update_elem,
>   	.map_delete_elem = htab_map_delete_elem,
>   	.map_seq_show_elem = htab_percpu_map_seq_show_elem,
> +	BATCH_OPS(htab_percpu),
>   };
>   
>   const struct bpf_map_ops htab_lru_percpu_map_ops = {
> @@ -1379,6 +1622,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
>   	.map_update_elem = htab_lru_percpu_map_update_elem,
>   	.map_delete_elem = htab_lru_map_delete_elem,
>   	.map_seq_show_elem = htab_percpu_map_seq_show_elem,
> +	BATCH_OPS(htab_lru_percpu),
>   };
>   
>   static int fd_htab_map_alloc_check(union bpf_attr *attr)
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 7/9] libbpf: add libbpf support to batch ops
  2019-11-19 19:30 ` [PATCH v2 bpf-next 7/9] libbpf: add libbpf support to batch ops Brian Vazquez
@ 2019-11-21 18:30   ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2019-11-21 18:30 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller
  Cc: Stanislav Fomichev, Petar Penkov, Willem de Bruijn, linux-kernel,
	netdev, bpf



On 11/19/19 11:30 AM, Brian Vazquez wrote:
> From: Yonghong Song <yhs@fb.com>
> 
> Added four libbpf API functions to support map batch operations:
>    . int bpf_map_delete_batch( ... )
>    . int bpf_map_lookup_batch( ... )
>    . int bpf_map_lookup_and_delete_batch( ... )
>    . int bpf_map_update_batch( ... )
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>   tools/lib/bpf/bpf.c      | 61 ++++++++++++++++++++++++++++++++++++++++
>   tools/lib/bpf/bpf.h      | 14 +++++++++
>   tools/lib/bpf/libbpf.map |  4 +++
>   3 files changed, 79 insertions(+)
> 
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 98596e15390fb..9acd9309b47b3 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -443,6 +443,67 @@ int bpf_map_freeze(int fd)
>   	return sys_bpf(BPF_MAP_FREEZE, &attr, sizeof(attr));
>   }
>   
[...]
>   LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
>   LIBBPF_API int bpf_obj_get(const char *pathname);
>   LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 8ddc2c40e482d..56462fea66f74 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -207,4 +207,8 @@ LIBBPF_0.0.6 {
>   		bpf_program__size;
>   		btf__find_by_name_kind;
>   		libbpf_find_vmlinux_btf_id;
> +		bpf_map_delete_batch;
> +		bpf_map_lookup_and_delete_batch;
> +		bpf_map_lookup_batch;
> +		bpf_map_update_batch;

Please insert new API functions in alphabetical order.

>   } LIBBPF_0.0.5;
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 8/9] selftests/bpf: add batch ops testing for hmap and hmap_percpu
  2019-11-19 19:30 ` [PATCH v2 bpf-next 8/9] selftests/bpf: add batch ops testing for hmap and hmap_percpu Brian Vazquez
@ 2019-11-21 18:36   ` Yonghong Song
  2019-11-21 21:16     ` Brian Vazquez
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2019-11-21 18:36 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller
  Cc: Stanislav Fomichev, Petar Penkov, Willem de Bruijn, linux-kernel,
	netdev, bpf



On 11/19/19 11:30 AM, Brian Vazquez wrote:
> From: Yonghong Song <yhs@fb.com>
> 
> Tested bpf_map_lookup_and_delete_batch() and bpf_map_update_batch()
> functionality.
>    $ ./test_maps
>      ...
>        test_hmap_lookup_and_delete_batch:PASS
>        test_pcpu_hmap_lookup_and_delete_batch:PASS
>      ...

Maybe you can add more tests for lookup_batch() and delete_batch()
so all the new APIs get tested?
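For delete_batch(), something like the following could be appended to the
test (untested; the bpf_map_delete_batch() signature is assumed to mirror
the bpf_map_update_batch() call used above, i.e. (fd, keys, &count,
elem_flags, flags)):

	/* delete everything map_batch_update() inserted and expect the
	 * map to be empty afterwards
	 */
	map_batch_update(map_fd, max_entries, keys, values, is_pcpu);
	count = max_entries;
	err = bpf_map_delete_batch(map_fd, keys, &count, 0, 0);
	CHECK(err, "bpf_map_delete_batch()", "error:%s\n", strerror(errno));
	CHECK(count != max_entries, "bpf_map_delete_batch()",
	      "count = %u, max_entries = %u\n", count, max_entries);
	err = bpf_map_get_next_key(map_fd, NULL, &key);
	CHECK(!err, "bpf_map_get_next_key()",
	      "error: map not empty after batch delete\n");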

> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> ---
>   .../map_lookup_and_delete_batch_htab.c        | 257 ++++++++++++++++++
>   1 file changed, 257 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c
> 
> diff --git a/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c
> new file mode 100644
> index 0000000000000..93e024cb85c60
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c
> @@ -0,0 +1,257 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2019 Facebook  */
> +#include <stdio.h>
> +#include <errno.h>
> +#include <string.h>
> +
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include <bpf_util.h>
> +#include <test_maps.h>
> +
> +static void map_batch_update(int map_fd, __u32 max_entries, int *keys,
> +			     void *values, bool is_pcpu)
> +{
> +	typedef BPF_DECLARE_PERCPU(int, value);
> +	int i, j, err;
> +	value *v;
> +
> +	if (is_pcpu)
> +		v = (value *)values;
> +
> +	for (i = 0; i < max_entries; i++) {
> +		keys[i] = i + 1;
> +		if (is_pcpu)
> +			for (j = 0; j < bpf_num_possible_cpus(); j++)
> +				bpf_percpu(v[i], j) = i + 2 + j;
> +		else
> +			((int *)values)[i] = i + 2;
> +	}
> +
> +	err = bpf_map_update_batch(map_fd, keys, values, &max_entries, 0, 0);
> +	CHECK(err, "bpf_map_update_batch()", "error:%s\n", strerror(errno));
> +}
> +
> +static void map_batch_verify(int *visited, __u32 max_entries,
> +			     int *keys, void *values, bool is_pcpu)
> +{
> +	typedef BPF_DECLARE_PERCPU(int, value);
> +	value *v;
> +	int i, j;
> +
> +	if (is_pcpu)
> +		v = (value *)values;
> +
> +	memset(visited, 0, max_entries * sizeof(*visited));
> +	for (i = 0; i < max_entries; i++) {
> +
> +		if (is_pcpu) {
> +			for (j = 0; j < bpf_num_possible_cpus(); j++) {
> +				CHECK(keys[i] + 1 + j != bpf_percpu(v[i], j),
> +				      "key/value checking",
> +				      "error: i %d j %d key %d value %d\n",
> +				      i, j, keys[i], bpf_percpu(v[i],  j));
> +			}
> +		} else {
> +			CHECK(keys[i] + 1 != ((int *)values)[i],
> +			      "key/value checking",
> +			      "error: i %d key %d value %d\n", i, keys[i],
> +			      ((int *)values)[i]);
> +		}
> +
> +		visited[i] = 1;
> +
> +	}
> +	for (i = 0; i < max_entries; i++) {
> +		CHECK(visited[i] != 1, "visited checking",
> +		      "error: keys array at index %d missing\n", i);
> +	}
> +}
> +
> +void __test_map_lookup_and_delete_batch(bool is_pcpu)
> +{
> +	int map_type = is_pcpu ? BPF_MAP_TYPE_PERCPU_HASH : BPF_MAP_TYPE_HASH;
> +	struct bpf_create_map_attr xattr = {
> +		.name = "hash_map",
> +		.map_type = map_type,
> +		.key_size = sizeof(int),
> +		.value_size = sizeof(int),
> +	};
> +	typedef BPF_DECLARE_PERCPU(int, value);
> +	int map_fd, *keys, *visited, key;
> +	__u32 batch = 0, count, total, total_success;
> +	const __u32 max_entries = 10;
> +	int err, i, step, value_size;
> +	value pcpu_values[10];
> +	bool nospace_err;
> +	void *values;
> +
> +	xattr.max_entries = max_entries;
> +	map_fd = bpf_create_map_xattr(&xattr);
> +	CHECK(map_fd == -1,
> +	      "bpf_create_map_xattr()", "error:%s\n", strerror(errno));
> +
> +	value_size = is_pcpu ? sizeof(value) : sizeof(int);
> +	keys = malloc(max_entries * sizeof(int));
> +	if (is_pcpu)
> +		values = pcpu_values;
> +	else
> +		values = malloc(max_entries * sizeof(int));
> +	visited = malloc(max_entries * sizeof(int));
> +	CHECK(!keys || !values || !visited, "malloc()",
> +	      "error:%s\n", strerror(errno));
> +
> +	/* test 1: lookup/delete an empty hash table, -ENOENT */
> +	count = max_entries;
> +	err = bpf_map_lookup_and_delete_batch(map_fd, NULL, &batch, keys,
> +					      values, &count, 0, 0);
> +	CHECK((err && errno != ENOENT), "empty map",
> +	      "error: %s\n", strerror(errno));
> +
> +	/* populate elements to the map */
> +	map_batch_update(map_fd, max_entries, keys, values, is_pcpu);
> +
> +	/* test 2: lookup/delete with count = 0, success */
> +	batch = 0;
> +	count = 0;
> +	err = bpf_map_lookup_and_delete_batch(map_fd, NULL, &batch, keys,
> +					      values, &count, 0, 0);
> +	CHECK(err, "count = 0", "error: %s\n", strerror(errno));
> +
> +	/* test 3: lookup/delete with count = max_entries, success */
> +	memset(keys, 0, max_entries * sizeof(*keys));
> +	memset(values, 0, max_entries * value_size);
> +	count = max_entries;
> +	batch = 0;
> +	err = bpf_map_lookup_and_delete_batch(map_fd, NULL, &batch, keys,
> +					      values, &count, 0, 0);
> +	CHECK((err && errno != ENOENT), "count = max_entries",
> +	       "error: %s\n", strerror(errno));
> +	CHECK(count != max_entries, "count = max_entries",
> +	      "count = %u, max_entries = %u\n", count, max_entries);
> +	map_batch_verify(visited, max_entries, keys, values, is_pcpu);
> +
> +	/* bpf_map_get_next_key() should return -ENOENT for an empty map. */
> +	err = bpf_map_get_next_key(map_fd, NULL, &key);
> +	CHECK(!err, "bpf_map_get_next_key()", "error: %s\n", strerror(errno));
> +
> +	/* test 4: lookup/delete in a loop with various steps. */
> +	total_success = 0;
> +	for (step = 1; step < max_entries; step++) {
> +		map_batch_update(map_fd, max_entries, keys, values, is_pcpu);
> +		memset(keys, 0, max_entries * sizeof(*keys));
> +		memset(values, 0, max_entries * value_size);
> +		batch = 0;
> +		total = 0;
> +		i = 0;
> +		/* iteratively lookup/delete elements with 'step'
> +		 * elements each
> +		 */
> +		count = step;
> +		nospace_err = false;
> +		while (true) {
> +			err = bpf_map_lookup_batch(map_fd,
> +						   total ? &batch : NULL,
> +						   &batch, keys + total,
> +						   values +
> +						   total * value_size,
> +						   &count, 0, 0);
> +			/* It is possible that we are failing due to buffer size
> +			 * not big enough. In such cases, let us just exit and
> +			 * go with larger steps. Note that a buffer size with
> +			 * max_entries should always work.
> +			 */
> +			if (err && errno == ENOSPC) {
> +				nospace_err = true;
> +				break;
> +			}
> +
> +
> +			CHECK((err && errno != ENOENT), "lookup with steps",
> +			      "error: %s\n", strerror(errno));
> +
> +			total += count;
> +			if (err)
> +				break;
> +
> +			i++;
> +		}
> +		if (nospace_err == true)
> +			continue;
> +
> +		CHECK(total != max_entries, "lookup with steps",
> +		      "total = %u, max_entries = %u\n", total, max_entries);
> +		map_batch_verify(visited, max_entries, keys, values, is_pcpu);
> +
> +		memset(keys, 0, max_entries * sizeof(*keys));
> +		memset(values, 0, max_entries * value_size);
> +		batch = 0;
> +		total = 0;
> +		i = 0;
> +		/* iteratively lookup/delete elements with 'step'
> +		 * elements each
> +		 */
> +		count = step;
> +		nospace_err = false;
> +		while (true) {
> +			err = bpf_map_lookup_and_delete_batch(map_fd,
> +							total ? &batch : NULL,
> +							&batch, keys + total,
> +							values +
> +							total * value_size,
> +							&count, 0, 0);
> +			/* It is possible that we are failing due to buffer size
> +			 * not big enough. In such cases, let us just exit and
> +			 * go with larger steps. Note that a buffer size with
> +			 * max_entries should always work.
> +			 */
> +			if (err && errno == ENOSPC) {
> +				nospace_err = true;
> +				break;
> +			}
> +
> +			CHECK((err && errno != ENOENT), "lookup with steps",
> +			      "error: %s\n", strerror(errno));
> +
> +			total += count;
> +			if (err)
> +				break;
> +			i++;
> +		}
> +
> +		if (nospace_err == true)
> +			continue;
> +
> +		CHECK(total != max_entries, "lookup/delete with steps",
> +		      "total = %u, max_entries = %u\n", total, max_entries);
> +
> +		map_batch_verify(visited, max_entries, keys, values, is_pcpu);
> +		err = bpf_map_get_next_key(map_fd, NULL, &key);
> +		CHECK(!err, "bpf_map_get_next_key()", "error: %s\n",
> +		      strerror(errno));
> +
> +		total_success++;
> +	}
> +
> +	CHECK(total_success == 0, "check total_success",
> +	      "unexpected failure\n");
> +}
> +
> +void test_hmap_lookup_and_delete_batch(void)
> +{
> +	__test_map_lookup_and_delete_batch(false);
> +	printf("%s:PASS\n", __func__);
> +}
> +
> +void test_pcpu_hmap_lookup_and_delete_batch(void)
> +{
> +	__test_map_lookup_and_delete_batch(true);
> +	printf("%s:PASS\n", __func__);
> +}
> +
> +void test_map_lookup_and_delete_batch_htab(void)
> +{
> +	test_hmap_lookup_and_delete_batch();
> +	test_pcpu_hmap_lookup_and_delete_batch();
> +}
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 9/9] selftests/bpf: add batch ops testing to array bpf map
  2019-11-19 19:30 ` [PATCH v2 bpf-next 9/9] selftests/bpf: add batch ops testing to array bpf map Brian Vazquez
@ 2019-11-21 18:43   ` Yonghong Song
  2019-11-21 21:14     ` Brian Vazquez
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2019-11-21 18:43 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller
  Cc: Stanislav Fomichev, Petar Penkov, Willem de Bruijn, linux-kernel,
	netdev, bpf



On 11/19/19 11:30 AM, Brian Vazquez wrote:
> Tested bpf_map_lookup_batch() and bpf_map_update_batch()
> functionality.
> 
>    $ ./test_maps
>        ...
>          test_map_lookup_and_delete_batch_array:PASS
>        ...

The test is for lookup_batch() and update_batch(),
but the test name and function name say lookup_and_delete_batch();
probably rename it to lookup_and_update_batch_array()?

It would be good if the generic lookup_and_delete_batch()
and delete_batch() could be tested as well.
Maybe try to use prog_array?

> 
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>   .../map_lookup_and_delete_batch_array.c       | 119 ++++++++++++++++++
>   1 file changed, 119 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
> 
> diff --git a/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
> new file mode 100644
> index 0000000000000..cbec72ad38609
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
> @@ -0,0 +1,119 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <stdio.h>
> +#include <errno.h>
> +#include <string.h>
> +
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include <test_maps.h>
> +
> +static void map_batch_update(int map_fd, __u32 max_entries, int *keys,
> +			     int *values)
> +{
> +	int i, err;
> +
> +	for (i = 0; i < max_entries; i++) {
> +		keys[i] = i;
> +		values[i] = i + 1;
> +	}
> +
> +	err = bpf_map_update_batch(map_fd, keys, values, &max_entries, 0, 0);
> +	CHECK(err, "bpf_map_update_batch()", "error:%s\n", strerror(errno));
> +}
> +
> +static void map_batch_verify(int *visited, __u32 max_entries,
> +			     int *keys, int *values)
> +{
> +	int i;
> +
> +	memset(visited, 0, max_entries * sizeof(*visited));
> +	for (i = 0; i < max_entries; i++) {
> +		CHECK(keys[i] + 1 != values[i], "key/value checking",
> +		      "error: i %d key %d value %d\n", i, keys[i], values[i]);
> +		visited[i] = 1;
> +	}
> +	for (i = 0; i < max_entries; i++) {
> +		CHECK(visited[i] != 1, "visited checking",
> +		      "error: keys array at index %d missing\n", i);
> +	}
> +}
> +
> +void test_map_lookup_and_delete_batch_array(void)
> +{
> +	struct bpf_create_map_attr xattr = {
> +		.name = "array_map",
> +		.map_type = BPF_MAP_TYPE_ARRAY,
> +		.key_size = sizeof(int),
> +		.value_size = sizeof(int),
> +	};
> +	int map_fd, *keys, *values, *visited;
> +	__u32 count, total, total_success;
> +	const __u32 max_entries = 10;
> +	int err, i, step;
> +	bool nospace_err;
> +	__u64 batch = 0;
> +
> +	xattr.max_entries = max_entries;
> +	map_fd = bpf_create_map_xattr(&xattr);
> +	CHECK(map_fd == -1,
> +	      "bpf_create_map_xattr()", "error:%s\n", strerror(errno));
> +
> +	keys = malloc(max_entries * sizeof(int));
> +	values = malloc(max_entries * sizeof(int));
> +	visited = malloc(max_entries * sizeof(int));
> +	CHECK(!keys || !values || !visited, "malloc()", "error:%s\n",
> +	      strerror(errno));
> +
> +	/* populate elements to the map */
> +	map_batch_update(map_fd, max_entries, keys, values);
> +
> +	/* test 1: lookup in a loop with various steps. */
> +	total_success = 0;
> +	for (step = 1; step < max_entries; step++) {
> +		map_batch_update(map_fd, max_entries, keys, values);
> +		memset(keys, 0, max_entries * sizeof(*keys));
> +		memset(values, 0, max_entries * sizeof(*values));
> +		batch = 0;
> +		total = 0;
> +		i = 0;
> +		/* iteratively lookup/delete elements with 'step'
> +		 * elements each.
> +		 */
> +		count = step;
> +		nospace_err = false;
> +		while (true) {
> +			err = bpf_map_lookup_batch(map_fd,
> +						total ? &batch : NULL, &batch,
> +						keys + total,
> +						values + total,
> +						&count, 0, 0);
> +
> +			CHECK((err && errno != ENOENT), "lookup with steps",
> +			      "error: %s\n", strerror(errno));
> +
> +			total += count;
> +
> +			if (err)
> +				break;
> +
> +			i++;
> +		}
> +
> +		if (nospace_err == true)
> +			continue;
> +
> +		CHECK(total != max_entries, "lookup with steps",
> +		      "total = %u, max_entries = %u\n", total, max_entries);
> +
> +		map_batch_verify(visited, max_entries, keys, values);
> +
> +		total_success++;
> +	}
> +
> +	CHECK(total_success == 0, "check total_success",
> +	      "unexpected failure\n");
> +
> +	printf("%s:PASS\n", __func__);
> +}
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 9/9] selftests/bpf: add batch ops testing to array bpf map
  2019-11-21 18:43   ` Yonghong Song
@ 2019-11-21 21:14     ` Brian Vazquez
  2019-11-22  0:22       ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Vazquez @ 2019-11-21 21:14 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf

Thanks for reviewing it!

On Thu, Nov 21, 2019 at 10:43 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 11/19/19 11:30 AM, Brian Vazquez wrote:
> > Tested bpf_map_lookup_batch() and bpf_map_update_batch()
> > functionality.
> >
> >    $ ./test_maps
> >        ...
> >          test_map_lookup_and_delete_batch_array:PASS
> >        ...
>
> The test is for lookup_batch() and update_batch(),
> but the test name and func name are lookup_and_delete_batch();
> probably rename it to lookup_and_update_batch_array()?
>
Yes, you are right, I will change the name in the next version.

> It would be good if the generic lookup_and_delete_batch()
> and delete_batch() could be tested as well.
> Maybe try to use prog_array?

I did test generic_lookup_and_delete_batch with hmap and it worked
fine because I didn't have concurrent deletions.
But yes, I will add tests for generic delete and lookup_and_delete,
maybe for the trie map (prog_array doesn't support lookup and hence
lookup_and_delete won't apply there)?
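
(For illustration only: a minimal sketch of what such a generic delete_batch
test could look like, reusing CHECK() and the keys populated by
map_batch_update() in the selftest above, and assuming a
bpf_map_delete_batch(fd, keys, &count, elem_flags, flags) libbpf wrapper
shaped like the bpf_map_update_batch() one used in this series.)

static void map_batch_delete_all(int map_fd, __u32 max_entries, int *keys)
{
	__u32 count = max_entries;
	int err, key;

	/* delete every previously inserted element in a single call */
	err = bpf_map_delete_batch(map_fd, keys, &count, 0, 0);
	CHECK(err, "bpf_map_delete_batch()", "error:%s\n", strerror(errno));
	CHECK(count != max_entries, "bpf_map_delete_batch()",
	      "deleted %u elements, expected %u\n", count, max_entries);

	/* the map must be empty afterwards */
	err = bpf_map_get_next_key(map_fd, NULL, &key);
	CHECK(!err, "bpf_map_get_next_key()", "map is not empty\n");
}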

>
> >
> > Signed-off-by: Brian Vazquez <brianvv@google.com>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
> >   .../map_lookup_and_delete_batch_array.c       | 119 ++++++++++++++++++
> >   1 file changed, 119 insertions(+)
> >   create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
> >
> > diff --git a/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
> > new file mode 100644
> > index 0000000000000..cbec72ad38609
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_array.c
> > @@ -0,0 +1,119 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <stdio.h>
> > +#include <errno.h>
> > +#include <string.h>
> > +
> > +#include <bpf/bpf.h>
> > +#include <bpf/libbpf.h>
> > +
> > +#include <test_maps.h>
> > +
> > +static void map_batch_update(int map_fd, __u32 max_entries, int *keys,
> > +                          int *values)
> > +{
> > +     int i, err;
> > +
> > +     for (i = 0; i < max_entries; i++) {
> > +             keys[i] = i;
> > +             values[i] = i + 1;
> > +     }
> > +
> > +     err = bpf_map_update_batch(map_fd, keys, values, &max_entries, 0, 0);
> > +     CHECK(err, "bpf_map_update_batch()", "error:%s\n", strerror(errno));
> > +}
> > +
> > +static void map_batch_verify(int *visited, __u32 max_entries,
> > +                          int *keys, int *values)
> > +{
> > +     int i;
> > +
> > +     memset(visited, 0, max_entries * sizeof(*visited));
> > +     for (i = 0; i < max_entries; i++) {
> > +             CHECK(keys[i] + 1 != values[i], "key/value checking",
> > +                   "error: i %d key %d value %d\n", i, keys[i], values[i]);
> > +             visited[i] = 1;
> > +     }
> > +     for (i = 0; i < max_entries; i++) {
> > +             CHECK(visited[i] != 1, "visited checking",
> > +                   "error: keys array at index %d missing\n", i);
> > +     }
> > +}
> > +
> > +void test_map_lookup_and_delete_batch_array(void)
> > +{
> > +     struct bpf_create_map_attr xattr = {
> > +             .name = "array_map",
> > +             .map_type = BPF_MAP_TYPE_ARRAY,
> > +             .key_size = sizeof(int),
> > +             .value_size = sizeof(int),
> > +     };
> > +     int map_fd, *keys, *values, *visited;
> > +     __u32 count, total, total_success;
> > +     const __u32 max_entries = 10;
> > +     int err, i, step;
> > +     bool nospace_err;
> > +     __u64 batch = 0;
> > +
> > +     xattr.max_entries = max_entries;
> > +     map_fd = bpf_create_map_xattr(&xattr);
> > +     CHECK(map_fd == -1,
> > +           "bpf_create_map_xattr()", "error:%s\n", strerror(errno));
> > +
> > +     keys = malloc(max_entries * sizeof(int));
> > +     values = malloc(max_entries * sizeof(int));
> > +     visited = malloc(max_entries * sizeof(int));
> > +     CHECK(!keys || !values || !visited, "malloc()", "error:%s\n",
> > +           strerror(errno));
> > +
> > +     /* populate elements to the map */
> > +     map_batch_update(map_fd, max_entries, keys, values);
> > +
> > +     /* test 1: lookup in a loop with various steps. */
> > +     total_success = 0;
> > +     for (step = 1; step < max_entries; step++) {
> > +             map_batch_update(map_fd, max_entries, keys, values);
> > +             memset(keys, 0, max_entries * sizeof(*keys));
> > +             memset(values, 0, max_entries * sizeof(*values));
> > +             batch = 0;
> > +             total = 0;
> > +             i = 0;
> > +             /* iteratively lookup/delete elements with 'step'
> > +              * elements each.
> > +              */
> > +             count = step;
> > +             nospace_err = false;
> > +             while (true) {
> > +                     err = bpf_map_lookup_batch(map_fd,
> > +                                             total ? &batch : NULL, &batch,
> > +                                             keys + total,
> > +                                             values + total,
> > +                                             &count, 0, 0);
> > +
> > +                     CHECK((err && errno != ENOENT), "lookup with steps",
> > +                           "error: %s\n", strerror(errno));
> > +
> > +                     total += count;
> > +
> > +                     if (err)
> > +                             break;
> > +
> > +                     i++;
> > +             }
> > +
> > +             if (nospace_err == true)
> > +                     continue;
> > +
> > +             CHECK(total != max_entries, "lookup with steps",
> > +                   "total = %u, max_entries = %u\n", total, max_entries);
> > +
> > +             map_batch_verify(visited, max_entries, keys, values);
> > +
> > +             total_success++;
> > +     }
> > +
> > +     CHECK(total_success == 0, "check total_success",
> > +           "unexpected failure\n");
> > +
> > +     printf("%s:PASS\n", __func__);
> > +}
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 8/9] selftests/bpf: add batch ops testing for hmap and hmap_percpu
  2019-11-21 18:36   ` Yonghong Song
@ 2019-11-21 21:16     ` Brian Vazquez
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Vazquez @ 2019-11-21 21:16 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf

On Thu, Nov 21, 2019 at 10:36 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 11/19/19 11:30 AM, Brian Vazquez wrote:
> > From: Yonghong Song <yhs@fb.com>
> >
> > Tested bpf_map_lookup_and_delete_batch() and bpf_map_update_batch()
> > functionality.
> >    $ ./test_maps
> >      ...
> >        test_hmap_lookup_and_delete_batch:PASS
> >        test_pcpu_hmap_lookup_and_delete_batch:PASS
> >      ...
>
> Maybe you can add another tests for lookup_batch() and delete_batch()
> so all new APIs get tested?

I did test lookup_batch() and the code is there; I will add
delete_batch() testing and change the names of the tests to better
reflect what is being tested.

>
> >
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > Signed-off-by: Brian Vazquez <brianvv@google.com>
> > ---
> >   .../map_lookup_and_delete_batch_htab.c        | 257 ++++++++++++++++++
> >   1 file changed, 257 insertions(+)
> >   create mode 100644 tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c
> >
> > diff --git a/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c
> > new file mode 100644
> > index 0000000000000..93e024cb85c60
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/map_tests/map_lookup_and_delete_batch_htab.c
> > @@ -0,0 +1,257 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2019 Facebook  */
> > +#include <stdio.h>
> > +#include <errno.h>
> > +#include <string.h>
> > +
> > +#include <bpf/bpf.h>
> > +#include <bpf/libbpf.h>
> > +
> > +#include <bpf_util.h>
> > +#include <test_maps.h>
> > +
> > +static void map_batch_update(int map_fd, __u32 max_entries, int *keys,
> > +                          void *values, bool is_pcpu)
> > +{
> > +     typedef BPF_DECLARE_PERCPU(int, value);
> > +     int i, j, err;
> > +     value *v;
> > +
> > +     if (is_pcpu)
> > +             v = (value *)values;
> > +
> > +     for (i = 0; i < max_entries; i++) {
> > +             keys[i] = i + 1;
> > +             if (is_pcpu)
> > +                     for (j = 0; j < bpf_num_possible_cpus(); j++)
> > +                             bpf_percpu(v[i], j) = i + 2 + j;
> > +             else
> > +                     ((int *)values)[i] = i + 2;
> > +     }
> > +
> > +     err = bpf_map_update_batch(map_fd, keys, values, &max_entries, 0, 0);
> > +     CHECK(err, "bpf_map_update_batch()", "error:%s\n", strerror(errno));
> > +}
> > +
> > +static void map_batch_verify(int *visited, __u32 max_entries,
> > +                          int *keys, void *values, bool is_pcpu)
> > +{
> > +     typedef BPF_DECLARE_PERCPU(int, value);
> > +     value *v;
> > +     int i, j;
> > +
> > +     if (is_pcpu)
> > +             v = (value *)values;
> > +
> > +     memset(visited, 0, max_entries * sizeof(*visited));
> > +     for (i = 0; i < max_entries; i++) {
> > +
> > +             if (is_pcpu) {
> > +                     for (j = 0; j < bpf_num_possible_cpus(); j++) {
> > +                             CHECK(keys[i] + 1 + j != bpf_percpu(v[i], j),
> > +                                   "key/value checking",
> > +                                   "error: i %d j %d key %d value %d\n",
> > +                                   i, j, keys[i], bpf_percpu(v[i],  j));
> > +                     }
> > +             } else {
> > +                     CHECK(keys[i] + 1 != ((int *)values)[i],
> > +                           "key/value checking",
> > +                           "error: i %d key %d value %d\n", i, keys[i],
> > +                           ((int *)values)[i]);
> > +             }
> > +
> > +             visited[i] = 1;
> > +
> > +     }
> > +     for (i = 0; i < max_entries; i++) {
> > +             CHECK(visited[i] != 1, "visited checking",
> > +                   "error: keys array at index %d missing\n", i);
> > +     }
> > +}
> > +
> > +void __test_map_lookup_and_delete_batch(bool is_pcpu)
> > +{
> > +     int map_type = is_pcpu ? BPF_MAP_TYPE_PERCPU_HASH : BPF_MAP_TYPE_HASH;
> > +     struct bpf_create_map_attr xattr = {
> > +             .name = "hash_map",
> > +             .map_type = map_type,
> > +             .key_size = sizeof(int),
> > +             .value_size = sizeof(int),
> > +     };
> > +     typedef BPF_DECLARE_PERCPU(int, value);
> > +     int map_fd, *keys, *visited, key;
> > +     __u32 batch = 0, count, total, total_success;
> > +     const __u32 max_entries = 10;
> > +     int err, i, step, value_size;
> > +     value pcpu_values[10];
> > +     bool nospace_err;
> > +     void *values;
> > +
> > +     xattr.max_entries = max_entries;
> > +     map_fd = bpf_create_map_xattr(&xattr);
> > +     CHECK(map_fd == -1,
> > +           "bpf_create_map_xattr()", "error:%s\n", strerror(errno));
> > +
> > +     value_size = is_pcpu ? sizeof(value) : sizeof(int);
> > +     keys = malloc(max_entries * sizeof(int));
> > +     if (is_pcpu)
> > +             values = pcpu_values;
> > +     else
> > +             values = malloc(max_entries * sizeof(int));
> > +     visited = malloc(max_entries * sizeof(int));
> > +     CHECK(!keys || !values || !visited, "malloc()",
> > +           "error:%s\n", strerror(errno));
> > +
> > +     /* test 1: lookup/delete an empty hash table, -ENOENT */
> > +     count = max_entries;
> > +     err = bpf_map_lookup_and_delete_batch(map_fd, NULL, &batch, keys,
> > +                                           values, &count, 0, 0);
> > +     CHECK((err && errno != ENOENT), "empty map",
> > +           "error: %s\n", strerror(errno));
> > +
> > +     /* populate elements to the map */
> > +     map_batch_update(map_fd, max_entries, keys, values, is_pcpu);
> > +
> > +     /* test 2: lookup/delete with count = 0, success */
> > +     batch = 0;
> > +     count = 0;
> > +     err = bpf_map_lookup_and_delete_batch(map_fd, NULL, &batch, keys,
> > +                                           values, &count, 0, 0);
> > +     CHECK(err, "count = 0", "error: %s\n", strerror(errno));
> > +
> > +     /* test 3: lookup/delete with count = max_entries, success */
> > +     memset(keys, 0, max_entries * sizeof(*keys));
> > +     memset(values, 0, max_entries * value_size);
> > +     count = max_entries;
> > +     batch = 0;
> > +     err = bpf_map_lookup_and_delete_batch(map_fd, NULL, &batch, keys,
> > +                                           values, &count, 0, 0);
> > +     CHECK((err && errno != ENOENT), "count = max_entries",
> > +            "error: %s\n", strerror(errno));
> > +     CHECK(count != max_entries, "count = max_entries",
> > +           "count = %u, max_entries = %u\n", count, max_entries);
> > +     map_batch_verify(visited, max_entries, keys, values, is_pcpu);
> > +
> > +     /* bpf_map_get_next_key() should return -ENOENT for an empty map. */
> > +     err = bpf_map_get_next_key(map_fd, NULL, &key);
> > +     CHECK(!err, "bpf_map_get_next_key()", "error: %s\n", strerror(errno));
> > +
> > +     /* test 4: lookup/delete in a loop with various steps. */
> > +     total_success = 0;
> > +     for (step = 1; step < max_entries; step++) {
> > +             map_batch_update(map_fd, max_entries, keys, values, is_pcpu);
> > +             memset(keys, 0, max_entries * sizeof(*keys));
> > +             memset(values, 0, max_entries * value_size);
> > +             batch = 0;
> > +             total = 0;
> > +             i = 0;
> > +             /* iteratively lookup/delete elements with 'step'
> > +              * elements each
> > +              */
> > +             count = step;
> > +             nospace_err = false;
> > +             while (true) {
> > +                     err = bpf_map_lookup_batch(map_fd,
> > +                                                total ? &batch : NULL,
> > +                                                &batch, keys + total,
> > +                                                values +
> > +                                                total * value_size,
> > +                                                &count, 0, 0);
> > +                     /* It is possible that we are failing due to buffer size
> > +                      * not big enough. In such cases, let us just exit and
> > +                      * go with large steps. Note that a buffer size with
> > +                      * max_entries should always work.
> > +                      */
> > +                     if (err && errno == ENOSPC) {
> > +                             nospace_err = true;
> > +                             break;
> > +                     }
> > +
> > +
> > +                     CHECK((err && errno != ENOENT), "lookup with steps",
> > +                           "error: %s\n", strerror(errno));
> > +
> > +                     total += count;
> > +                     if (err)
> > +                             break;
> > +
> > +                     i++;
> > +             }
> > +             if (nospace_err == true)
> > +                     continue;
> > +
> > +             CHECK(total != max_entries, "lookup with steps",
> > +                   "total = %u, max_entries = %u\n", total, max_entries);
> > +             map_batch_verify(visited, max_entries, keys, values, is_pcpu);
> > +
> > +             memset(keys, 0, max_entries * sizeof(*keys));
> > +             memset(values, 0, max_entries * value_size);
> > +             batch = 0;
> > +             total = 0;
> > +             i = 0;
> > +             /* iteratively lookup/delete elements with 'step'
> > +              * elements each
> > +              */
> > +             count = step;
> > +             nospace_err = false;
> > +             while (true) {
> > +                     err = bpf_map_lookup_and_delete_batch(map_fd,
> > +                                                     total ? &batch : NULL,
> > +                                                     &batch, keys + total,
> > +                                                     values +
> > +                                                     total * value_size,
> > +                                                     &count, 0, 0);
> > +                     /* It is possible that we are failing due to buffer size
> > +                      * not big enough. In such cases, let us just exit and
> > +                      * go with large steps. Note that a buffer size with
> > +                      * max_entries should always work.
> > +                      */
> > +                     if (err && errno == ENOSPC) {
> > +                             nospace_err = true;
> > +                             break;
> > +                     }
> > +
> > +                     CHECK((err && errno != ENOENT), "lookup with steps",
> > +                           "error: %s\n", strerror(errno));
> > +
> > +                     total += count;
> > +                     if (err)
> > +                             break;
> > +                     i++;
> > +             }
> > +
> > +             if (nospace_err == true)
> > +                     continue;
> > +
> > +             CHECK(total != max_entries, "lookup/delete with steps",
> > +                   "total = %u, max_entries = %u\n", total, max_entries);
> > +
> > +             map_batch_verify(visited, max_entries, keys, values, is_pcpu);
> > +             err = bpf_map_get_next_key(map_fd, NULL, &key);
> > +             CHECK(!err, "bpf_map_get_next_key()", "error: %s\n",
> > +                   strerror(errno));
> > +
> > +             total_success++;
> > +     }
> > +
> > +     CHECK(total_success == 0, "check total_success",
> > +           "unexpected failure\n");
> > +}
> > +
> > +void test_hmap_lookup_and_delete_batch(void)
> > +{
> > +     __test_map_lookup_and_delete_batch(false);
> > +     printf("%s:PASS\n", __func__);
> > +}
> > +
> > +void test_pcpu_hmap_lookup_and_delete_batch(void)
> > +{
> > +     __test_map_lookup_and_delete_batch(true);
> > +     printf("%s:PASS\n", __func__);
> > +}
> > +
> > +void test_map_lookup_and_delete_batch_htab(void)
> > +{
> > +     test_hmap_lookup_and_delete_batch();
> > +     test_pcpu_hmap_lookup_and_delete_batch();
> > +}
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 5/9] bpf: add batch ops to all htab bpf map
  2019-11-21 18:27   ` Yonghong Song
@ 2019-11-21 21:27     ` Brian Vazquez
  0 siblings, 0 replies; 26+ messages in thread
From: Brian Vazquez @ 2019-11-21 21:27 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf

On Thu, Nov 21, 2019 at 10:27 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 11/19/19 11:30 AM, Brian Vazquez wrote:
> > From: Yonghong Song <yhs@fb.com>
> >
> > htab can't use generic batch support due to some problematic behaviours
> > inherent to the data structure, i.e. while iterating the bpf map a
> > concurrent program might delete the next entry that the batch was about
> > to use; in that case there's no easy solution to retrieve the next entry,
> > and the issue has been discussed multiple times (see [1] and [2]).
> >
> > The only way hmap can be traversed without the problem previously
> > exposed is by making sure that the map is traversing entire buckets.
> > This commit implements those strict requirements for hmap; the
> > implementation follows the same interaction as the generic support, with
> > some exceptions:
> >
> >   - If the keys/values buffers are not big enough to traverse a bucket,
> >     ENOSPC will be returned.
> >   - out_batch contains the value of the next bucket in the iteration, not
> >     the next key, but this is transparent for the user since the user
> >     should never use out_batch for anything other than the bpf batch syscalls.
> >
> > Note that only lookup and lookup_and_delete batch ops require the hmap
> > specific implementation, update/delete batch ops can be the generic
> > ones.
> >
> > [1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/
> > [2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/
> >
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > Signed-off-by: Brian Vazquez <brianvv@google.com>
> > ---
> >   kernel/bpf/hashtab.c | 244 +++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 244 insertions(+)
> >
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 22066a62c8c97..3402174b292ea 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -17,6 +17,17 @@
> >       (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |    \
> >        BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
> >
> > +#define BATCH_OPS(_name)                     \
> > +     .map_lookup_batch =                     \
> > +     _name##_map_lookup_batch,               \
> > +     .map_lookup_and_delete_batch =          \
> > +     _name##_map_lookup_and_delete_batch,    \
> > +     .map_update_batch =                     \
> > +     generic_map_update_batch,               \
> > +     .map_delete_batch =                     \
> > +     generic_map_delete_batch
> > +
> > +
> >   struct bucket {
> >       struct hlist_nulls_head head;
> >       raw_spinlock_t lock;
> > @@ -1232,6 +1243,235 @@ static void htab_map_seq_show_elem(struct bpf_map *map, void *key,
> >       rcu_read_unlock();
> >   }
> >
> > +static int
> > +__htab_map_lookup_and_delete_batch(struct bpf_map *map,
> > +                                const union bpf_attr *attr,
> > +                                union bpf_attr __user *uattr,
> > +                                bool do_delete, bool is_lru_map,
> > +                                bool is_percpu)
> > +{
> > +     struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> > +     u32 bucket_cnt, total, key_size, value_size, roundup_key_size;
> > +     void *keys = NULL, *values = NULL, *value, *dst_key, *dst_val;
> > +     void __user *ukeys, *uvalues, *ubatch;
> > +     u64 elem_map_flags, map_flags;
> > +     struct hlist_nulls_head *head;
> > +     struct hlist_nulls_node *n;
> > +     u32 batch, max_count, size;
> > +     unsigned long flags;
> > +     struct htab_elem *l;
> > +     struct bucket *b;
> > +     int ret = 0;
> > +
> > +     max_count = attr->batch.count;
> > +     if (!max_count)
> > +             return 0;
> > +
> > +     elem_map_flags = attr->batch.elem_flags;
> > +     if ((elem_map_flags & ~BPF_F_LOCK) ||
> > +         ((elem_map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
> > +             return -EINVAL;
> > +
> > +     map_flags = attr->batch.flags;
> > +     if (map_flags)
> > +             return -EINVAL;
> > +
> > +     batch = 0;
> > +     ubatch = u64_to_user_ptr(attr->batch.in_batch);
> > +     if (ubatch && copy_from_user(&batch, ubatch, sizeof(batch)))
> > +             return -EFAULT;
> > +
> > +     if (batch >= htab->n_buckets)
> > +             return -ENOENT;
> > +
> > +     /* We cannot do copy_from_user or copy_to_user inside
> > +      * the rcu_read_lock. Allocate enough space here.
> > +      */
> > +     key_size = htab->map.key_size;
> > +     roundup_key_size = round_up(htab->map.key_size, 8);
> > +     value_size = htab->map.value_size;
> > +     size = round_up(value_size, 8);
> > +     if (is_percpu)
> > +             value_size = size * num_possible_cpus();
> > +     keys = kvmalloc(key_size * max_count, GFP_USER | __GFP_NOWARN);
> > +     values = kvmalloc(value_size * max_count, GFP_USER | __GFP_NOWARN);
> > +     if (!keys || !values) {
> > +             ret = -ENOMEM;
> > +             goto out;
> > +     }
> > +
> > +     dst_key = keys;
> > +     dst_val = values;
> > +     total = 0;
> > +
> > +     preempt_disable();
> > +     this_cpu_inc(bpf_prog_active);
> > +     rcu_read_lock();
> > +
> > +again:
> > +     b = &htab->buckets[batch];
> > +     head = &b->head;
> > +     raw_spin_lock_irqsave(&b->lock, flags);
>
> Brian, you had some earlier comments about whether we could
> remove locks for lookup only. Do you still think this
> is needed?

Yes, I was thinking that for some applications that control when they
delete and lookup, the bucket lock could be avoided, but at this
point I'd prefer to always grab it, to be as consistent as
possible. Something that we might add later is to check the first
entry (without grabbing the lock) to see if it is a null_elem and the
bucket is empty, and based on the answer skip it or grab the lock, but
this would be racy.
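
(A sketch of that possible later optimization, purely for illustration and
not part of this patch: hlist_nulls_empty() is the existing helper from
<linux/list_nulls.h>, the 'done' label is hypothetical, and as noted above
the peek is inherently racy because an element can be inserted right after
the check.)

	/* racy peek: skip obviously empty buckets without taking the lock */
	while (batch < htab->n_buckets &&
	       hlist_nulls_empty(&htab->buckets[batch].head))
		batch++;

	if (batch >= htab->n_buckets) {
		ret = -ENOENT;
		goto done;	/* would bypass the unlock in after_loop */
	}

	b = &htab->buckets[batch];
	head = &b->head;
	raw_spin_lock_irqsave(&b->lock, flags);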

>
> If this is still desired, the below is a possible approach:
>   - lock will be used when delete is also needed.
>   - we still do a preliminary counting to get bucket_cnt,
>     but for lookup, it may not guarantee that all elements
>     in the bucket will be copied since there could be
>     parallel update which may add more elements to the bucket.
> This probably should be okay.
> If needed, later on, we can add a flag to permit locking.
>
> > +
> > +     bucket_cnt = 0;
> > +     hlist_nulls_for_each_entry_rcu(l, n, head, hash_node)
> > +             bucket_cnt++;
> > +
> > +     if (bucket_cnt > (max_count - total)) {
> > +             if (total == 0)
> > +                     ret = -ENOSPC;
> > +             goto after_loop;
> > +     }
> > +
> > +     hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) {
> > +             memcpy(dst_key, l->key, key_size);
> > +
> > +             if (is_percpu) {
> > +                     int off = 0, cpu;
> > +                     void __percpu *pptr;
> > +
> > +                     pptr = htab_elem_get_ptr(l, map->key_size);
> > +                     for_each_possible_cpu(cpu) {
> > +                             bpf_long_memcpy(dst_val + off,
> > +                                             per_cpu_ptr(pptr, cpu), size);
> > +                             off += size;
> > +                     }
> > +             } else {
> > +                     value = l->key + roundup_key_size;
> > +                     if (elem_map_flags & BPF_F_LOCK)
> > +                             copy_map_value_locked(map, dst_val, value,
> > +                                                   true);
> > +                     else
> > +                             copy_map_value(map, dst_val, value);
> > +                     check_and_init_map_lock(map, dst_val);
> > +             }
>
> It is possible we can move the below do_delete loop body here.
> We need to change hlist_nulls_for_each_entry_rcu to
> hlist_nulls_for_each_entry_safe.
> Once you have the next patch, we can ask some RCU experts to help
> ensure implementation correctness.

Sounds good, will work on it and send next version soon.
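
(For reference, a rough sketch of that suggestion rather than the code in
this patch: copy and delete are folded into a single walk of the bucket,
using the _safe iterator because elements are unlinked while iterating;
the per-cpu/plain value copy is elided.)

	hlist_nulls_for_each_entry_safe(l, n, head, hash_node) {
		memcpy(dst_key, l->key, key_size);
		/* ... copy the (percpu or plain) value exactly as in the
		 * lookup-only loop of the patch ...
		 */
		dst_key += key_size;
		dst_val += value_size;
		total++;

		if (do_delete) {
			hlist_nulls_del_rcu(&l->hash_node);
			if (is_lru_map)
				bpf_lru_push_free(&htab->lru, &l->lru_node);
			else
				free_htab_elem(htab, l);
		}
	}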

>
> > +             dst_key += key_size;
> > +             dst_val += value_size;
> > +             total++;
> > +     }
> > +
> > +     if (do_delete) {
> > +             hlist_nulls_for_each_entry_rcu(l, n, head, hash_node) {
> > +                     hlist_nulls_del_rcu(&l->hash_node);
> > +                     if (is_lru_map)
> > +                             bpf_lru_push_free(&htab->lru, &l->lru_node);
> > +                     else
> > +                             free_htab_elem(htab, l);
> > +             }
> > +     }
> > +
> > +     batch++;
> > +     if (batch >= htab->n_buckets) {
> > +             ret = -ENOENT;
> > +             goto after_loop;
> > +     }
> > +
> > +     raw_spin_unlock_irqrestore(&b->lock, flags);
> > +     goto again;
> > +
> > +after_loop:
> > +     raw_spin_unlock_irqrestore(&b->lock, flags);
> > +
> > +     rcu_read_unlock();
> > +     this_cpu_dec(bpf_prog_active);
> > +     preempt_enable();
> > +
> > +     if (ret && ret != -ENOENT)
> > +             goto out;
> > +
> > +     /* copy data back to user */
> > +     ukeys = u64_to_user_ptr(attr->batch.keys);
> > +     uvalues = u64_to_user_ptr(attr->batch.values);
> > +     ubatch = u64_to_user_ptr(attr->batch.out_batch);
> > +     if (copy_to_user(ubatch, &batch, sizeof(batch)) ||
> > +         copy_to_user(ukeys, keys, total * key_size) ||
> > +         copy_to_user(uvalues, values, total * value_size) ||
> > +         put_user(total, &uattr->batch.count))
> > +             ret = -EFAULT;
> > +
> > +out:
> > +     kvfree(keys);
> > +     kvfree(values);
> > +     return ret;
> > +}
> > +
> > +static int
> > +htab_percpu_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
> > +                          union bpf_attr __user *uattr)
> > +{
> > +     return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
> > +                                               false, true);
> > +}
> > +
> > +static int
> > +htab_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
> > +                                     const union bpf_attr *attr,
> > +                                     union bpf_attr __user *uattr)
> > +{
> > +     return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
> > +                                               false, true);
> > +}
> > +
> > +static int
> > +htab_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
> > +                   union bpf_attr __user *uattr)
> > +{
> > +     return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
> > +                                               false, false);
> > +}
> > +
> > +static int
> > +htab_map_lookup_and_delete_batch(struct bpf_map *map,
> > +                              const union bpf_attr *attr,
> > +                              union bpf_attr __user *uattr)
> > +{
> > +     return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
> > +                                               false, false);
> > +}
> > +
> > +static int
> > +htab_map_delete_batch(struct bpf_map *map,
> > +                   const union bpf_attr *attr,
> > +                   union bpf_attr __user *uattr)
> > +{
> > +     return generic_map_delete_batch(map, attr, uattr);
> > +}
> > +
> > +static int
> > +htab_lru_percpu_map_lookup_batch(struct bpf_map *map,
> > +                              const union bpf_attr *attr,
> > +                              union bpf_attr __user *uattr)
> > +{
> > +     return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
> > +                                               true, true);
> > +}
> > +
> > +static int
> > +htab_lru_percpu_map_lookup_and_delete_batch(struct bpf_map *map,
> > +                                         const union bpf_attr *attr,
> > +                                         union bpf_attr __user *uattr)
> > +{
> > +     return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
> > +                                               true, true);
> > +}
> > +
> > +static int
> > +htab_lru_map_lookup_batch(struct bpf_map *map, const union bpf_attr *attr,
> > +                       union bpf_attr __user *uattr)
> > +{
> > +     return __htab_map_lookup_and_delete_batch(map, attr, uattr, false,
> > +                                               true, false);
> > +}
> > +
> > +static int
> > +htab_lru_map_lookup_and_delete_batch(struct bpf_map *map,
> > +                                  const union bpf_attr *attr,
> > +                                  union bpf_attr __user *uattr)
> > +{
> > +     return __htab_map_lookup_and_delete_batch(map, attr, uattr, true,
> > +                                               true, false);
> > +}
> > +
> >   const struct bpf_map_ops htab_map_ops = {
> >       .map_alloc_check = htab_map_alloc_check,
> >       .map_alloc = htab_map_alloc,
> > @@ -1242,6 +1482,7 @@ const struct bpf_map_ops htab_map_ops = {
> >       .map_delete_elem = htab_map_delete_elem,
> >       .map_gen_lookup = htab_map_gen_lookup,
> >       .map_seq_show_elem = htab_map_seq_show_elem,
> > +     BATCH_OPS(htab),
> >   };
> >
> >   const struct bpf_map_ops htab_lru_map_ops = {
> > @@ -1255,6 +1496,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
> >       .map_delete_elem = htab_lru_map_delete_elem,
> >       .map_gen_lookup = htab_lru_map_gen_lookup,
> >       .map_seq_show_elem = htab_map_seq_show_elem,
> > +     BATCH_OPS(htab_lru),
> >   };
> >
> >   /* Called from eBPF program */
> > @@ -1368,6 +1610,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
> >       .map_update_elem = htab_percpu_map_update_elem,
> >       .map_delete_elem = htab_map_delete_elem,
> >       .map_seq_show_elem = htab_percpu_map_seq_show_elem,
> > +     BATCH_OPS(htab_percpu),
> >   };
> >
> >   const struct bpf_map_ops htab_lru_percpu_map_ops = {
> > @@ -1379,6 +1622,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
> >       .map_update_elem = htab_lru_percpu_map_update_elem,
> >       .map_delete_elem = htab_lru_map_delete_elem,
> >       .map_seq_show_elem = htab_percpu_map_seq_show_elem,
> > +     BATCH_OPS(htab_lru_percpu),
> >   };
> >
> >   static int fd_htab_map_alloc_check(union bpf_attr *attr)
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 2/9] bpf: add generic support for lookup and lookup_and_delete batch ops
  2019-11-21 17:36   ` Yonghong Song
@ 2019-11-21 21:36     ` Brian Vazquez
  2019-11-22  0:34       ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Vazquez @ 2019-11-21 21:36 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf

Hi Yonghong,
thanks for reviewing the patch. I will fix all the direct returns and
the other small issues in the next version.

On Thu, Nov 21, 2019 at 9:36 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 11/19/19 11:30 AM, Brian Vazquez wrote:
> > This commit introduces generic support for the bpf_map_lookup_batch and
> > bpf_map_lookup_and_delete_batch ops. This implementation can be used by
> > almost all the bpf maps since its core implementation is relying on the
> > existing map_get_next_key, map_lookup_elem and map_delete_elem
> > functions. The bpf syscall subcommands introduced are:
> >
> >    BPF_MAP_LOOKUP_BATCH
> >    BPF_MAP_LOOKUP_AND_DELETE_BATCH
> >
> > The UAPI attribute is:
> >
> >    struct { /* struct used by BPF_MAP_*_BATCH commands */
> >           __aligned_u64   in_batch;       /* start batch,
> >                                            * NULL to start from beginning
> >                                            */
> >           __aligned_u64   out_batch;      /* output: next start batch */
> >           __aligned_u64   keys;
> >           __aligned_u64   values;
> >           __u32           count;          /* input/output:
> >                                            * input: # of key/value
> >                                            * elements
> >                                            * output: # of filled elements
> >                                            */
> >           __u32           map_fd;
> >           __u64           elem_flags;
> >           __u64           flags;
> >    } batch;
> >
> > in_batch/out_batch are opaque values used to communicate between
> > user/kernel space, in_batch/out_batch must be of key_size length.
> >
> > To start iterating from the beginning in_batch must be null,
> > count is the # of key/value elements to retrieve. Note that the 'keys'
> > buffer must be a buffer of key_size * count size and the 'values' buffer
> > must be value_size * count, where value_size must be aligned to 8 bytes
> > by userspace if it's dealing with percpu maps. 'count' will contain the
> > number of keys/values successfully retrieved. Note that 'count' is an
> > input/output variable and it can contain a lower value after a call.
> >
> > If there are no more entries to retrieve, ENOENT will be returned. If the
> > error is ENOENT, count might be > 0 in case it copied some values but there
> > were no more entries to retrieve.
> >
> > Note that if the return code is an error and not -EFAULT,
> > count indicates the number of elements successfully processed.
> >
> > Suggested-by: Stanislav Fomichev <sdf@google.com>
> > Signed-off-by: Brian Vazquez <brianvv@google.com>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
> >   include/linux/bpf.h      |  11 +++
> >   include/uapi/linux/bpf.h |  19 +++++
> >   kernel/bpf/syscall.c     | 176 +++++++++++++++++++++++++++++++++++++++
> >   3 files changed, 206 insertions(+)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 5b81cde47314e..767a823dbac74 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -41,6 +41,11 @@ struct bpf_map_ops {
> >       int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key);
> >       void (*map_release_uref)(struct bpf_map *map);
> >       void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);
> > +     int (*map_lookup_batch)(struct bpf_map *map, const union bpf_attr *attr,
> > +                             union bpf_attr __user *uattr);
> > +     int (*map_lookup_and_delete_batch)(struct bpf_map *map,
> > +                                        const union bpf_attr *attr,
> > +                                        union bpf_attr __user *uattr);
> >
> >       /* funcs callable from userspace and from eBPF programs */
> >       void *(*map_lookup_elem)(struct bpf_map *map, void *key);
> > @@ -797,6 +802,12 @@ void bpf_map_charge_move(struct bpf_map_memory *dst,
> >   void *bpf_map_area_alloc(size_t size, int numa_node);
> >   void bpf_map_area_free(void *base);
> >   void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr);
> > +int  generic_map_lookup_batch(struct bpf_map *map,
> > +                           const union bpf_attr *attr,
> > +                           union bpf_attr __user *uattr);
> > +int  generic_map_lookup_and_delete_batch(struct bpf_map *map,
> > +                                      const union bpf_attr *attr,
> > +                                      union bpf_attr __user *uattr);
> >
> >   extern int sysctl_unprivileged_bpf_disabled;
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4842a134b202a..e60b7b7cda61a 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -107,6 +107,8 @@ enum bpf_cmd {
> >       BPF_MAP_LOOKUP_AND_DELETE_ELEM,
> >       BPF_MAP_FREEZE,
> >       BPF_BTF_GET_NEXT_ID,
> > +     BPF_MAP_LOOKUP_BATCH,
> > +     BPF_MAP_LOOKUP_AND_DELETE_BATCH,
> >   };
> >
> >   enum bpf_map_type {
> > @@ -400,6 +402,23 @@ union bpf_attr {
> >               __u64           flags;
> >       };
> >
> > +     struct { /* struct used by BPF_MAP_*_BATCH commands */
> > +             __aligned_u64   in_batch;       /* start batch,
> > +                                              * NULL to start from beginning
> > +                                              */
> > +             __aligned_u64   out_batch;      /* output: next start batch */
> > +             __aligned_u64   keys;
> > +             __aligned_u64   values;
> > +             __u32           count;          /* input/output:
> > +                                              * input: # of key/value
> > +                                              * elements
> > +                                              * output: # of filled elements
> > +                                              */
> > +             __u32           map_fd;
> > +             __u64           elem_flags;
> > +             __u64           flags;
> > +     } batch;
> > +
> >       struct { /* anonymous struct used by BPF_PROG_LOAD command */
> >               __u32           prog_type;      /* one of enum bpf_prog_type */
> >               __u32           insn_cnt;
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index cc714c9d5b4cc..d0d3d0e0eaca4 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1127,6 +1127,124 @@ static int map_get_next_key(union bpf_attr *attr)
> >       return err;
> >   }
> >
> > +static int __generic_map_lookup_batch(struct bpf_map *map,
> > +                                   const union bpf_attr *attr,
> > +                                   union bpf_attr __user *uattr,
> > +                                   bool do_delete)
> > +{
> > +     void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
> > +     void __user *uobatch = u64_to_user_ptr(attr->batch.out_batch);
> > +     void __user *values = u64_to_user_ptr(attr->batch.values);
> > +     void __user *keys = u64_to_user_ptr(attr->batch.keys);
> > +     void *buf, *prev_key, *key, *value;
> > +     u32 value_size, cp, max_count;
> > +     bool first_key = false;
> > +     int err, retry = 3;
> > +
> > +     if (attr->batch.elem_flags & ~BPF_F_LOCK)
> > +             return -EINVAL;
> > +
> > +     if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> > +         !map_value_has_spin_lock(map)) {
> > +             err = -EINVAL;
> > +             goto err_put;
> > +     }
>
> Direct return -EINVAL?
>
> > +
> > +     if (map->map_type == BPF_MAP_TYPE_QUEUE ||
> > +         map->map_type == BPF_MAP_TYPE_STACK) {
> > +             err = -ENOTSUPP;
> > +             goto err_put;
> > +     }
>
> Direct return -ENOTSUPP?
>
> > +
> > +     value_size = bpf_map_value_size(map);
> > +
> > +     max_count = attr->batch.count;
> > +     if (!max_count)
> > +             return 0;
> > +
> > +     err = -ENOMEM;
> > +     buf = kmalloc(map->key_size + value_size, GFP_USER | __GFP_NOWARN);
> > +     if (!buf)
> > +             goto err_put;
>
> Direct return -ENOMEM?
>
> > +
> > +     err = -EFAULT;
> > +     first_key = false;
> > +     if (ubatch && copy_from_user(buf, ubatch, map->key_size))
> > +             goto free_buf;
> > +     key = buf;
> > +     value = key + map->key_size;
> > +     if (!ubatch) {
> > +             prev_key = NULL;
> > +             first_key = true;
> > +     }
> > +
> > +
> One extra line.
>
> > +     for (cp = 0; cp < max_count; cp++) {
> > +             if (cp || first_key) {
> > +                     rcu_read_lock();
> > +                     err = map->ops->map_get_next_key(map, prev_key, key);
> > +                     rcu_read_unlock();
> > +                     if (err)
> > +                             break;
> > +             }
> > +             err = bpf_map_copy_value(map, key, value,
> > +                                      attr->batch.elem_flags, do_delete);
> > +
> > +             if (err == -ENOENT) {
> > +                     if (retry) {
> > +                             retry--;
>
> What is the 'retry' semantics here? After 'continue', cp++ is executed.

Good catch, I'll move cp++ to a proper place. retry is used to handle
the case where the map is undergoing many concurrent additions and
deletions; this could result in map_get_next_key succeeding but
bpf_map_copy_value failing, in which case I think it'd be better to
try to find the next elem, but we don't want to do this more than 3
times.
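
(One possible shape of that fix, sketched only to show where cp++ would
move; the in_batch/first-key handling and the copy_to_user calls from the
patch are elided.)

	for (cp = 0; cp < max_count;) {
		rcu_read_lock();
		err = map->ops->map_get_next_key(map, prev_key, key);
		rcu_read_unlock();
		if (err)
			break;

		err = bpf_map_copy_value(map, key, value,
					 attr->batch.elem_flags, do_delete);
		if (err == -ENOENT) {
			if (retry) {
				retry--;	/* element vanished, try again */
				continue;
			}
			err = -EINTR;
			break;
		}
		if (err)
			goto free_buf;

		/* ... copy key/value out to user space as in the patch ... */

		prev_key = key;
		retry = 3;
		cp++;	/* count only successfully copied elements */
	}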

>
> > +                             continue;
> > +                     }
> > +                     err = -EINTR;
>
> Why returning -EINTR?

I thought this was the most appropriate error for the behaviour I
describe above. Should I handle that case differently? WDYT?


>
> > +                     break;
> > +             }
> > +
> > +             if (err)
> > +                     goto free_buf;
> > +
> > +             if (copy_to_user(keys + cp * map->key_size, key,
> > +                              map->key_size)) {
> > +                     err = -EFAULT;
> > +                     goto free_buf;
> > +             }
> > +             if (copy_to_user(values + cp * value_size, value, value_size)) {
> > +                     err = -EFAULT;
> > +                     goto free_buf;
> > +             }
> > +
> > +             prev_key = key;
> > +             retry = 3;
> > +     }
> > +     if (!err) {
> > +             rcu_read_lock();
> > +             err = map->ops->map_get_next_key(map, prev_key, key);
>
> if err != 0, the 'key' will be invalid and it cannot be used by below
> copy_to_user.
>
> > +             rcu_read_unlock();
> > +     }
> > +
> > +     if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) ||
>
> The 'cp' may not be accurate if 'retry' is triggered in the above.
>
> > +                 (copy_to_user(uobatch, key, map->key_size))))
> > +             err = -EFAULT;
> > +
> > +free_buf:
> > +     kfree(buf);
> > +err_put:
>
> err_put can be removed.
>
> > +     return err;
> > +}
> > +
> > +int generic_map_lookup_batch(struct bpf_map *map,
> > +                          const union bpf_attr *attr,
> > +                          union bpf_attr __user *uattr)
> > +{
> > +     return __generic_map_lookup_batch(map, attr, uattr, false);
> > +}
> > +
> > +int generic_map_lookup_and_delete_batch(struct bpf_map *map,
> > +                                     const union bpf_attr *attr,
> > +                                     union bpf_attr __user *uattr)
> > +{
> > +     return __generic_map_lookup_batch(map, attr, uattr, true);
> > +}
> > +
> >   #define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD value
> >
> >   static int map_lookup_and_delete_elem(union bpf_attr *attr)
> > @@ -2956,6 +3074,57 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
> >       return err;
> >   }
> >
> > +#define BPF_MAP_BATCH_LAST_FIELD batch.flags
> > +
> > +#define BPF_DO_BATCH(fn)                     \
> > +     do {                                    \
> > +             if (!fn) {                      \
> > +                     err = -ENOTSUPP;        \
> > +                     goto err_put;           \
> > +             }                               \
> > +             err = fn(map, attr, uattr);     \
> > +     } while (0)
> > +
> > +static int bpf_map_do_batch(const union bpf_attr *attr,
> > +                         union bpf_attr __user *uattr,
> > +                         int cmd)
> > +{
> > +     struct bpf_map *map;
> > +     int err, ufd;
> > +     struct fd f;
> > +
> > +     if (CHECK_ATTR(BPF_MAP_BATCH))
> > +             return -EINVAL;
> > +
> > +     ufd = attr->batch.map_fd;
> > +     f = fdget(ufd);
> > +     map = __bpf_map_get(f);
> > +     if (IS_ERR(map))
> > +             return PTR_ERR(map);
> > +
> > +     if ((cmd == BPF_MAP_LOOKUP_BATCH ||
> > +          cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH) &&
> > +         !(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
> > +             err = -EPERM;
> > +             goto err_put;
> > +     }
> > +
> > +     if (cmd != BPF_MAP_LOOKUP_BATCH &&
>
> Here should be BPF_MAP_LOOKUP_AND_DELETE_BATCH.
> BPF_MAP_LOOKUP_BATCH does not need FMODE_CAN_WRITE.

ACK.
>
> > +         !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
> > +             err = -EPERM;
> > +             goto err_put;
> > +     }
> > +
> > +     if (cmd == BPF_MAP_LOOKUP_BATCH)
> > +             BPF_DO_BATCH(map->ops->map_lookup_batch);
> > +     else
> > +             BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch);
> > +
> > +err_put:
> > +     fdput(f);
> > +     return err;
> > +}
> > +
> >   SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
> >   {
> >       union bpf_attr attr = {};
> > @@ -3053,6 +3222,13 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
> >       case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
> >               err = map_lookup_and_delete_elem(&attr);
> >               break;
> > +     case BPF_MAP_LOOKUP_BATCH:
> > +             err = bpf_map_do_batch(&attr, uattr, BPF_MAP_LOOKUP_BATCH);
> > +             break;
> > +     case BPF_MAP_LOOKUP_AND_DELETE_BATCH:
> > +             err = bpf_map_do_batch(&attr, uattr,
> > +                                    BPF_MAP_LOOKUP_AND_DELETE_BATCH);
> > +             break;
> >       default:
> >               err = -EINVAL;
> >               break;
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 9/9] selftests/bpf: add batch ops testing to array bpf map
  2019-11-21 21:14     ` Brian Vazquez
@ 2019-11-22  0:22       ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2019-11-22  0:22 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf



On 11/21/19 1:14 PM, Brian Vazquez wrote:
> Thanks for reviewing it!
> 
> On Thu, Nov 21, 2019 at 10:43 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 11/19/19 11:30 AM, Brian Vazquez wrote:
>>> Tested bpf_map_lookup_batch() and bpf_map_update_batch()
>>> functionality.
>>>
>>>     $ ./test_maps
>>>         ...
>>>           test_map_lookup_and_delete_batch_array:PASS
>>>         ...
>>
>> The test is for lookup_batch() and update_batch(),
>> but the test name and func name are lookup_and_delete_batch();
>> probably rename it to lookup_and_update_batch_array()?
>>
> Yes, you are right, I will change the name in the next version.
> 
>> It would be good if the generic lookup_and_delete_batch()
>> and delete_batch() could be tested as well.
>> Maybe try to use prog_array?
> 
> I did test generic_lookup_and_delete_batch with hmap and it worked
> fine because I didn't have concurrent deletions.
> But yes, I will add tests for generic delete and lookup_and_delete,
> maybe for the trie map (prog_array doesn't support lookup and hence
> lookup_and_delete won't apply there)?

trie_map is a good choice. Basically any map which can exercise this
API can be used to test it.
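
(For completeness, the map-creation attributes such a trie-based test would
need differ slightly from the hash/array ones above: LPM_TRIE keys start
with a struct bpf_lpm_trie_key header and the map must be created with
BPF_F_NO_PREALLOC. A sketch:)

	struct bpf_create_map_attr xattr = {
		.name = "lpm_trie_map",
		.map_type = BPF_MAP_TYPE_LPM_TRIE,
		/* key = struct bpf_lpm_trie_key (prefixlen) + 4 bytes of data */
		.key_size = sizeof(struct bpf_lpm_trie_key) + sizeof(__u32),
		.value_size = sizeof(int),
		.max_entries = 10,
		.map_flags = BPF_F_NO_PREALLOC,	/* required for LPM_TRIE */
	};
	int map_fd = bpf_create_map_xattr(&xattr);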

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 2/9] bpf: add generic support for lookup and lookup_and_delete batch ops
  2019-11-21 21:36     ` Brian Vazquez
@ 2019-11-22  0:34       ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2019-11-22  0:34 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf



On 11/21/19 1:36 PM, Brian Vazquez wrote:
> Hi Yonghong,
> thanks for reviewing the patch. I will fix all the direct returns and
> the other small issues in the next version.
> 
> On Thu, Nov 21, 2019 at 9:36 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 11/19/19 11:30 AM, Brian Vazquez wrote:
>>> This commit introduces generic support for the bpf_map_lookup_batch and
>>> bpf_map_lookup_and_delete_batch ops. This implementation can be used by
>>> almost all the bpf maps since its core implementation is relying on the
>>> existing map_get_next_key, map_lookup_elem and map_delete_elem
>>> functions. The bpf syscall subcommands introduced are:
>>>
[...]
>>> +     for (cp = 0; cp < max_count; cp++) {
>>> +             if (cp || first_key) {
>>> +                     rcu_read_lock();
>>> +                     err = map->ops->map_get_next_key(map, prev_key, key);
>>> +                     rcu_read_unlock();
>>> +                     if (err)
>>> +                             break;
>>> +             }
>>> +             err = bpf_map_copy_value(map, key, value,
>>> +                                      attr->batch.elem_flags, do_delete);
>>> +
>>> +             if (err == -ENOENT) {
>>> +                     if (retry) {
>>> +                             retry--;
>>
>> What is the 'retry' semantics here? After 'continue', cp++ is executed.
> 
> Good catch, I'll move cp++ to a proper place. retry is used to handle
> the case where the map is undergoing many concurrent additions and
> deletions; this could result in map_get_next_key succeeding but
> bpf_map_copy_value failing, in which case I think it'd be better to
> try to find the next elem, but we don't want to do this more than 3
> times.
> 
>>
>>> +                             continue;
>>> +                     }
>>> +                     err = -EINTR;
>>
>> Why returning -EINTR?
> 
> I thought this was the most appropriate error for the behaviour I
> describe above. Should I handle that case differently? WDYT?

I see. We do not want to use -ENOENT since get_next_key may return
-ENOENT. I think -EINTR is okay here to indicate that we found a key, but
the key/value entry was gone right before the attempted access.
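
(Putting the agreed semantics together, a userspace caller of the batch
lookup would look roughly like the sketch below: ENOENT marks the normal
end of iteration and may still deliver some elements, while other errors
such as the -EINTR discussed above abort the walk. STEP, keys and values
are assumed to be provided by the caller, and 'batch' is the opaque token,
which must be able to hold at least key_size bytes.)

	__u64 batch = 0;
	__u32 count;
	bool first = true;
	int err;

	do {
		count = STEP;	/* how many elements our buffers can hold */
		err = bpf_map_lookup_batch(map_fd, first ? NULL : &batch,
					   &batch, keys, values, &count, 0, 0);
		first = false;

		if (err && errno != ENOENT)
			break;		/* real error, e.g. EINTR or EFAULT */

		/* process 'count' elements from keys[]/values[] here */
	} while (!err);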

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 bpf-next 3/9] bpf: add generic support for update and delete batch ops
  2019-11-21 18:00   ` Yonghong Song
@ 2019-11-22  5:50     ` Brian Vazquez
  2019-11-22  6:56       ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Brian Vazquez @ 2019-11-22  5:50 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf

ACK to all the observations, will fix in the next version. There are
just 2 things that might already be correct, PTAL.

On Thu, Nov 21, 2019 at 10:00 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 11/19/19 11:30 AM, Brian Vazquez wrote:
> > This commit adds generic support for update and delete batch ops that
> > can be used for almost all the bpf maps. These commands share the same
> > UAPI attr that lookup and lookup_and_delete batch ops use and the
> > syscall commands are:
> >
> >    BPF_MAP_UPDATE_BATCH
> >    BPF_MAP_DELETE_BATCH
> >
> > The main difference between update/delete and lookup/lookup_and_delete
> > batch ops is that for update/delete keys/values must be specified for
> > userspace and because of that, neither in_batch nor out_batch are used.
> >
> > Suggested-by: Stanislav Fomichev <sdf@google.com>
> > Signed-off-by: Brian Vazquez <brianvv@google.com>
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
> >   include/linux/bpf.h      |  10 ++++
> >   include/uapi/linux/bpf.h |   2 +
> >   kernel/bpf/syscall.c     | 126 ++++++++++++++++++++++++++++++++++++++-
> >   3 files changed, 137 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 767a823dbac74..96a19e1fd2b5b 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -46,6 +46,10 @@ struct bpf_map_ops {
> >       int (*map_lookup_and_delete_batch)(struct bpf_map *map,
> >                                          const union bpf_attr *attr,
> >                                          union bpf_attr __user *uattr);
> > +     int (*map_update_batch)(struct bpf_map *map, const union bpf_attr *attr,
> > +                             union bpf_attr __user *uattr);
> > +     int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
> > +                             union bpf_attr __user *uattr);
> >
> >       /* funcs callable from userspace and from eBPF programs */
> >       void *(*map_lookup_elem)(struct bpf_map *map, void *key);
> > @@ -808,6 +812,12 @@ int  generic_map_lookup_batch(struct bpf_map *map,
> >   int  generic_map_lookup_and_delete_batch(struct bpf_map *map,
> >                                        const union bpf_attr *attr,
> >                                        union bpf_attr __user *uattr);
> > +int  generic_map_update_batch(struct bpf_map *map,
> > +                           const union bpf_attr *attr,
> > +                           union bpf_attr __user *uattr);
> > +int  generic_map_delete_batch(struct bpf_map *map,
> > +                           const union bpf_attr *attr,
> > +                           union bpf_attr __user *uattr);
> >
> >   extern int sysctl_unprivileged_bpf_disabled;
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index e60b7b7cda61a..0f6ff0c4d79dd 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -109,6 +109,8 @@ enum bpf_cmd {
> >       BPF_BTF_GET_NEXT_ID,
> >       BPF_MAP_LOOKUP_BATCH,
> >       BPF_MAP_LOOKUP_AND_DELETE_BATCH,
> > +     BPF_MAP_UPDATE_BATCH,
> > +     BPF_MAP_DELETE_BATCH,
> >   };
> >
> >   enum bpf_map_type {
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index d0d3d0e0eaca4..06e1bcf40fb8d 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1127,6 +1127,120 @@ static int map_get_next_key(union bpf_attr *attr)
> >       return err;
> >   }
> >
> > +int generic_map_delete_batch(struct bpf_map *map,
> > +                          const union bpf_attr *attr,
> > +                          union bpf_attr __user *uattr)
> > +{
> > +     void __user *keys = u64_to_user_ptr(attr->batch.keys);
> > +     int ufd = attr->map_fd;
> > +     u32 cp, max_count;
> > +     struct fd f;
> > +     void *key;
> > +     int err;
> > +
> > +     f = fdget(ufd);
> > +     if (attr->batch.elem_flags & ~BPF_F_LOCK)
> > +             return -EINVAL;
> > +
> > +     if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> > +         !map_value_has_spin_lock(map)) {
> > +             err = -EINVAL;
> > +             goto err_put;
>
> Just return -EINVAL?
>
> > +     }
> > +
> > +     max_count = attr->batch.count;
> > +     if (!max_count)
> > +             return 0;
> > +
> > +     err = -ENOMEM;
>
> Why initialize err to -ENOMEM? Maybe just err = 0.
>
> > +     for (cp = 0; cp < max_count; cp++) {
> > +             key = __bpf_copy_key(keys + cp * map->key_size, map->key_size);
> > +             if (IS_ERR(key)) {
> > +                     err = PTR_ERR(key);
> > +                     break;
> > +             }
> > +
> > +             if (err)
> > +                     break;
>
> The above is incorrect, esp. if you assign err initial value to -ENOMEM.
> The above ` if (err) break; ` is not really needed. If there is error,
> you already break in the above.
> If map->key_size is not 0, the return value 'key' cannot be NULL pointer.
>
> > +             if (bpf_map_is_dev_bound(map)) {
> > +                     err = bpf_map_offload_delete_elem(map, key);
> > +                     break;
> > +             }
> > +
> > +             preempt_disable();
> > +             __this_cpu_inc(bpf_prog_active);
> > +             rcu_read_lock();
> > +             err = map->ops->map_delete_elem(map, key);
> > +             rcu_read_unlock();
> > +             __this_cpu_dec(bpf_prog_active);
> > +             preempt_enable();
> > +             maybe_wait_bpf_programs(map);
> > +             if (err)
> > +                     break;
> > +     }
> > +     if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
> > +             err = -EFAULT;
>
> If previous err = -EFAULT, even if copy_to_user() succeeded,
> return value will be -EFAULT, so uattr->batch.count cannot be
> trusted. So may be do
>     if (err != -EFAULT && copy_to_user(...))
>        err = -EFAULT
> ?
> There are several other places like this.

I think whatever the err is, cp contains the number of entries that
were correctly updated/deleted, and the idea is that you should always
try to copy that value to batch.count. Even if that copy fails, when
uattr was created by libbpf everything was already zero-initialized.
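
As a concrete illustration of that contract from the userspace side, here
is a hedged sketch driving the new BPF_MAP_DELETE_BATCH command through
the raw syscall. The helper name and the map_fd/keys/n parameters are
placeholders, and a uapi header that already carries the new command is
assumed:

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  /* Delete 'n' keys from 'map_fd' in one batched call. */
  static int delete_batch(int map_fd, void *keys, __u32 n)
  {
          union bpf_attr attr;
          long ret;

          memset(&attr, 0, sizeof(attr));  /* zero-init, as libbpf does */
          attr.batch.map_fd = map_fd;
          attr.batch.keys   = (__u64)(unsigned long)keys;
          attr.batch.count  = n;           /* in: number of keys */

          ret = syscall(__NR_bpf, BPF_MAP_DELETE_BATCH, &attr, sizeof(attr));

          /* Even on failure, attr.batch.count is either the number of
           * elements that really were deleted or the 0 it was
           * initialized to, so it is always safe to report it.
           */
          fprintf(stderr, "deleted %u of %u keys, err=%d\n",
                  attr.batch.count, n, ret ? -errno : 0);
          return ret ? -errno : 0;
  }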

>
> > +err_put:
>
> You don't need err_put label in the above.
>
> > +     return err;
> > +}
> > +int generic_map_update_batch(struct bpf_map *map,
> > +                          const union bpf_attr *attr,
> > +                          union bpf_attr __user *uattr)
> > +{
> > +     void __user *values = u64_to_user_ptr(attr->batch.values);
> > +     void __user *keys = u64_to_user_ptr(attr->batch.keys);
> > +     u32 value_size, cp, max_count;
> > +     int ufd = attr->map_fd;
> > +     void *key, *value;
> > +     struct fd f;
> > +     int err;
> > +
> > +     f = fdget(ufd);
> > +     if (attr->batch.elem_flags & ~BPF_F_LOCK)
> > +             return -EINVAL;
> > +
> > +     if ((attr->batch.elem_flags & BPF_F_LOCK) &&
> > +         !map_value_has_spin_lock(map)) {
> > +             err = -EINVAL;
> > +             goto err_put;
>
> Directly return -EINVAL?
>
> > +     }
> > +
> > +     value_size = bpf_map_value_size(map);
> > +
> > +     max_count = attr->batch.count;
> > +     if (!max_count)
> > +             return 0;
> > +
> > +     err = -ENOMEM;
> > +     value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
> > +     if (!value)
> > +             goto err_put;
>
> Directly return -ENOMEM?
>
> > +
> > +     for (cp = 0; cp < max_count; cp++) {
> > +             key = __bpf_copy_key(keys + cp * map->key_size, map->key_size);
>
> Do you need to free 'key' after its use?
>
> > +             if (IS_ERR(key)) {
> > +                     err = PTR_ERR(key);
> > +                     break;
> > +             }
> > +             err = -EFAULT;
> > +             if (copy_from_user(value, values + cp * value_size, value_size))
> > +                     break;
> > +
> > +             err = bpf_map_update_value(map, f, key, value,
> > +                                        attr->batch.elem_flags);
> > +
> > +             if (err)
> > +                     break;
> > +     }
> > +
> > +     if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
> > +             err = -EFAULT;
>
> Similar to the above comment, if err already -EFAULT, no need
> to do copy_to_user().
>
> > +
> > +     kfree(value);
> > +err_put:
>
> err_put label is not needed.
>
> > +     return err;
> > +}
> > +
> >   static int __generic_map_lookup_batch(struct bpf_map *map,
> >                                     const union bpf_attr *attr,
> >                                     union bpf_attr __user *uattr,
> > @@ -3117,8 +3231,12 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
> >
> >       if (cmd == BPF_MAP_LOOKUP_BATCH)
> >               BPF_DO_BATCH(map->ops->map_lookup_batch);
> > -     else
> > +     else if (cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH)
> >               BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch);
> > +     else if (cmd == BPF_MAP_UPDATE_BATCH)
> > +             BPF_DO_BATCH(map->ops->map_update_batch);
> > +     else
> > +             BPF_DO_BATCH(map->ops->map_delete_batch);
>
> Also need to check map_get_sys_perms() permissions for these two new
> commands. Both delete and update needs FMODE_CAN_WRITE permission.
>
I also got confused for a moment, but the check is correct since it
uses '!=' not '==':
if (cmd != BPF_MAP_LOOKUP_BATCH &&
            !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {

so basically that means that cmd is update, delete or lookup_and_delete,
and for those we do check map_get_sys_perms.
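
Spelled out, the dispatch path would then read roughly like this (the
-EPERM value and the surrounding body are an assumption pieced together
from the quoted fragment, not a verbatim copy of the patch):

  /* Only plain lookup is read-only; update, delete and
   * lookup_and_delete all fall into the write check.
   */
  if (cmd != BPF_MAP_LOOKUP_BATCH &&
      !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
          err = -EPERM;
          goto err_put;
  }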

> >
> >   err_put:
> >       fdput(f);
> > @@ -3229,6 +3347,12 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
> >               err = bpf_map_do_batch(&attr, uattr,
> >                                      BPF_MAP_LOOKUP_AND_DELETE_BATCH);
> >               break;
> > +     case BPF_MAP_UPDATE_BATCH:
> > +             err = bpf_map_do_batch(&attr, uattr, BPF_MAP_UPDATE_BATCH);
> > +             break;
> > +     case BPF_MAP_DELETE_BATCH:
> > +             err = bpf_map_do_batch(&attr, uattr, BPF_MAP_DELETE_BATCH);
> > +             break;
> >       default:
> >               err = -EINVAL;
> >               break;
> >
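
For completeness, a hedged userspace sketch of how the commit message's
point plays out for BPF_MAP_UPDATE_BATCH: the caller supplies the keys and
values buffers up front and leaves in_batch/out_batch untouched. This is
the raw syscall form; map_fd, keys, values and n are placeholders, and the
libbpf wrappers only arrive later in the series:

  union bpf_attr attr;

  memset(&attr, 0, sizeof(attr));
  attr.batch.map_fd     = map_fd;
  attr.batch.keys       = (__u64)(unsigned long)keys;   /* n * key_size bytes   */
  attr.batch.values     = (__u64)(unsigned long)values; /* n * value_size bytes */
  attr.batch.count      = n;
  attr.batch.elem_flags = 0;       /* or BPF_F_LOCK for spin-lock maps */
  /* in_batch/out_batch stay 0: they are only meaningful for the
   * lookup/lookup_and_delete commands.
   */

  err = syscall(__NR_bpf, BPF_MAP_UPDATE_BATCH, &attr, sizeof(attr));
  /* attr.batch.count now holds how many elements were actually updated. */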


* Re: [PATCH v2 bpf-next 3/9] bpf: add generic support for update and delete batch ops
  2019-11-22  5:50     ` Brian Vazquez
@ 2019-11-22  6:56       ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2019-11-22  6:56 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf



On 11/21/19 9:50 PM, Brian Vazquez wrote:
> ACK to all the observations, will fix in the next version. There are
> just 2 things that might already be correct, PTAL.
> 
> On Thu, Nov 21, 2019 at 10:00 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 11/19/19 11:30 AM, Brian Vazquez wrote:
>>> This commit adds generic support for update and delete batch ops that
>>> can be used for almost all the bpf maps. These commands share the same
>>> UAPI attr that lookup and lookup_and_delete batch ops use and the
>>> syscall commands are:
>>>
>>>     BPF_MAP_UPDATE_BATCH
>>>     BPF_MAP_DELETE_BATCH
>>>
>>> The main difference between update/delete and lookup/lookup_and_delete
>>> batch ops is that for update/delete keys/values must be specified by
>>> userspace and because of that, neither in_batch nor out_batch are used.
>>>
>>> Suggested-by: Stanislav Fomichev <sdf@google.com>
>>> Signed-off-by: Brian Vazquez <brianvv@google.com>
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>> ---
>>>    include/linux/bpf.h      |  10 ++++
>>>    include/uapi/linux/bpf.h |   2 +
>>>    kernel/bpf/syscall.c     | 126 ++++++++++++++++++++++++++++++++++++++-
>>>    3 files changed, 137 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>> index 767a823dbac74..96a19e1fd2b5b 100644
>>> --- a/include/linux/bpf.h
>>> +++ b/include/linux/bpf.h
>>> @@ -46,6 +46,10 @@ struct bpf_map_ops {
>>>        int (*map_lookup_and_delete_batch)(struct bpf_map *map,
>>>                                           const union bpf_attr *attr,
>>>                                           union bpf_attr __user *uattr);
>>> +     int (*map_update_batch)(struct bpf_map *map, const union bpf_attr *attr,
>>> +                             union bpf_attr __user *uattr);
>>> +     int (*map_delete_batch)(struct bpf_map *map, const union bpf_attr *attr,
>>> +                             union bpf_attr __user *uattr);
>>>
[...]
>>> +
>>> +             preempt_disable();
>>> +             __this_cpu_inc(bpf_prog_active);
>>> +             rcu_read_lock();
>>> +             err = map->ops->map_delete_elem(map, key);
>>> +             rcu_read_unlock();
>>> +             __this_cpu_dec(bpf_prog_active);
>>> +             preempt_enable();
>>> +             maybe_wait_bpf_programs(map);
>>> +             if (err)
>>> +                     break;
>>> +     }
>>> +     if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
>>> +             err = -EFAULT;
>>
>> If previous err = -EFAULT, even if copy_to_user() succeeded,
>> return value will be -EFAULT, so uattr->batch.count cannot be
>> trusted. So may be do
>>      if (err != -EFAULT && copy_to_user(...))
>>         err = -EFAULT
>> ?
>> There are several other places like this.
> 
> I think whatever the err is, cp contains the number of entries that
> were correctly updated/deleted, and the idea is that you should always
> try to copy that value to batch.count. Even if that copy fails, when
> uattr was created by libbpf everything was already zero-initialized.

This is what I mean:
   err = -EFAULT; // from previous error
   if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
     err = -EFAULT;
   return err;
User space will not trust uattr->batch.count even if copy_to_user()
is successful, since -EFAULT is returned.

There are two ways to address this issue if the previous error is -EFAULT:
   1. do not copy_to_user() and return -EFAULT, which is what I suggested
      above.
   2. go ahead and do copy_to_user() and, if it is successful, change the
      return value to something different from -EFAULT to indicate
      that uattr->batch.count is valid.

I feel it is important to return the actual error code -EFAULT to the
user so the user knows some fault happened. Returning a different error
code may be misleading during debugging.
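
In code, option 1 is just a guard around the existing write-back at the
end of the batch helpers (a minimal sketch, reusing the names from the
quoted hunk):

  /* Skip the write-back if the loop itself already hit -EFAULT, so a
   * returned -EFAULT always means batch.count was not updated.
   */
  if (err != -EFAULT &&
      copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
          err = -EFAULT;

  return err;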

> 
>>
>>> +err_put:
>>
>> You don't need err_put label in the above.
>>
>>> +     return err;
>>> +}
>>> +int generic_map_update_batch(struct bpf_map *map,
>>> +                          const union bpf_attr *attr,
>>> +                          union bpf_attr __user *uattr)
>>> +{
>>> +     void __user *values = u64_to_user_ptr(attr->batch.values);
>>> +     void __user *keys = u64_to_user_ptr(attr->batch.keys);
>>> +     u32 value_size, cp, max_count;
>>> +     int ufd = attr->map_fd;
>>> +     void *key, *value;
>>> +     struct fd f;
>>> +     int err;
>>> +
>>> +     f = fdget(ufd);
>>> +     if (attr->batch.elem_flags & ~BPF_F_LOCK)
>>> +             return -EINVAL;
>>> +
>>> +     if ((attr->batch.elem_flags & BPF_F_LOCK) &&
>>> +         !map_value_has_spin_lock(map)) {
>>> +             err = -EINVAL;
>>> +             goto err_put;
>>
>> Directly return -EINVAL?
>>
>>> +     }
>>> +
>>> +     value_size = bpf_map_value_size(map);
>>> +
>>> +     max_count = attr->batch.count;
>>> +     if (!max_count)
>>> +             return 0;
>>> +
>>> +     err = -ENOMEM;
>>> +     value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
>>> +     if (!value)
>>> +             goto err_put;
>>
>> Directly return -ENOMEM?
>>
>>> +
>>> +     for (cp = 0; cp < max_count; cp++) {
>>> +             key = __bpf_copy_key(keys + cp * map->key_size, map->key_size);
>>
>> Do you need to free 'key' after its use?
>>
>>> +             if (IS_ERR(key)) {
>>> +                     err = PTR_ERR(key);
>>> +                     break;
>>> +             }
>>> +             err = -EFAULT;
>>> +             if (copy_from_user(value, values + cp * value_size, value_size))
>>> +                     break;
>>> +
>>> +             err = bpf_map_update_value(map, f, key, value,
>>> +                                        attr->batch.elem_flags);
>>> +
>>> +             if (err)
>>> +                     break;
>>> +     }
>>> +
>>> +     if (copy_to_user(&uattr->batch.count, &cp, sizeof(cp)))
>>> +             err = -EFAULT;
>>
>> Similar to the above comment, if err already -EFAULT, no need
>> to do copy_to_user().
>>
>>> +
>>> +     kfree(value);
>>> +err_put:
>>
>> err_put label is not needed.
>>
>>> +     return err;
>>> +}
>>> +
>>>    static int __generic_map_lookup_batch(struct bpf_map *map,
>>>                                      const union bpf_attr *attr,
>>>                                      union bpf_attr __user *uattr,
>>> @@ -3117,8 +3231,12 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
>>>
>>>        if (cmd == BPF_MAP_LOOKUP_BATCH)
>>>                BPF_DO_BATCH(map->ops->map_lookup_batch);
>>> -     else
>>> +     else if (cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH)
>>>                BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch);
>>> +     else if (cmd == BPF_MAP_UPDATE_BATCH)
>>> +             BPF_DO_BATCH(map->ops->map_update_batch);
>>> +     else
>>> +             BPF_DO_BATCH(map->ops->map_delete_batch);
>>
>> Also need to check map_get_sys_perms() permissions for these two new
>> commands. Both delete and update needs FMODE_CAN_WRITE permission.
>>
> I also got confused for a moment, but the check is correct since it
> uses '!=' not '==':
> if (cmd != BPF_MAP_LOOKUP_BATCH &&
>              !(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
> 
> so basically that means that cmd is update, delete or lookup_and_delete,
> and for those we do check map_get_sys_perms.

I missed this. Thanks for explanation!


* RE: [PATCH v2 bpf-next 1/9] bpf: add bpf_map_{value_size,update_value,map_copy_value} functions
  2019-11-19 19:30 ` [PATCH v2 bpf-next 1/9] bpf: add bpf_map_{value_size,update_value,map_copy_value} functions Brian Vazquez
@ 2019-11-22 16:36   ` John Fastabend
  0 siblings, 0 replies; 26+ messages in thread
From: John Fastabend @ 2019-11-22 16:36 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf, Brian Vazquez

Brian Vazquez wrote:
> This commit moves reusable code from map_lookup_elem and map_update_elem
> to avoid code duplication in kernel/bpf/syscall.c.
> 
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> ---
>  kernel/bpf/syscall.c | 271 ++++++++++++++++++++++++-------------------
>  1 file changed, 151 insertions(+), 120 deletions(-)
> 

Nice bit of cleanup.

Acked-by: John Fastabend <john.fastabend@gmail.com>


* RE: [PATCH v2 bpf-next 2/9] bpf: add generic support for lookup and lookup_and_delete batch ops
  2019-11-19 19:30 ` [PATCH v2 bpf-next 2/9] bpf: add generic support for lookup and lookup_and_delete batch ops Brian Vazquez
  2019-11-21 17:36   ` Yonghong Song
@ 2019-11-22 17:25   ` John Fastabend
  1 sibling, 0 replies; 26+ messages in thread
From: John Fastabend @ 2019-11-22 17:25 UTC (permalink / raw)
  To: Brian Vazquez, Brian Vazquez, Alexei Starovoitov,
	Daniel Borkmann, David S . Miller
  Cc: Yonghong Song, Stanislav Fomichev, Petar Penkov,
	Willem de Bruijn, linux-kernel, netdev, bpf, Brian Vazquez

Brian Vazquez wrote:
> This commit introduces generic support for the bpf_map_lookup_batch and
> bpf_map_lookup_and_delete_batch ops. This implementation can be used by
> almost all the bpf maps since its core implementation is relying on the
> existing map_get_next_key, map_lookup_elem and map_delete_elem
> functions. The bpf syscall subcommands introduced are:
> 
>   BPF_MAP_LOOKUP_BATCH
>   BPF_MAP_LOOKUP_AND_DELETE_BATCH
> 
> The UAPI attribute is:
> 
>   struct { /* struct used by BPF_MAP_*_BATCH commands */
>          __aligned_u64   in_batch;       /* start batch,
>                                           * NULL to start from beginning
>                                           */
>          __aligned_u64   out_batch;      /* output: next start batch */
>          __aligned_u64   keys;
>          __aligned_u64   values;
>          __u32           count;          /* input/output:
>                                           * input: # of key/value
>                                           * elements
>                                           * output: # of filled elements
>                                           */
>          __u32           map_fd;
>          __u64           elem_flags;
>          __u64           flags;
>   } batch;
> 
> in_batch/out_batch are opaque values used to communicate between
> user/kernel space; in_batch/out_batch must be of key_size length.
> 
> To start iterating from the beginning in_batch must be null,
> count is the # of key/value elements to retrieve. Note that the 'keys'
> buffer must be a buffer of key_size * count size and the 'values' buffer
> must be value_size * count, where value_size must be aligned to 8 bytes
> by userspace if it's dealing with percpu maps. 'count' will contain the
> number of keys/values successfully retrieved. Note that 'count' is an
> input/output variable and it can contain a lower value after a call.
> 
> If there are no more entries to retrieve, ENOENT will be returned. If the
> error is ENOENT, count might be > 0 in case some values were copied but
> there were no more entries to retrieve.
> 
> Note that if the return code is an error and not -EFAULT,
> count indicates the number of elements successfully processed.
> 
> Suggested-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/linux/bpf.h      |  11 +++
>  include/uapi/linux/bpf.h |  19 +++++
>  kernel/bpf/syscall.c     | 176 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 206 insertions(+)
> 

A couple of additional comments if we are getting a new rev anyway.

[...]

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index cc714c9d5b4cc..d0d3d0e0eaca4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1127,6 +1127,124 @@ static int map_get_next_key(union bpf_attr *attr)
>  	return err;
>  }
>  
> +static int __generic_map_lookup_batch(struct bpf_map *map,
> +				      const union bpf_attr *attr,
> +				      union bpf_attr __user *uattr,
> +				      bool do_delete)
> +{
> +	void __user *ubatch = u64_to_user_ptr(attr->batch.in_batch);
> +	void __user *uobatch = u64_to_user_ptr(attr->batch.out_batch);
> +	void __user *values = u64_to_user_ptr(attr->batch.values);
> +	void __user *keys = u64_to_user_ptr(attr->batch.keys);
> +	void *buf, *prev_key, *key, *value;
> +	u32 value_size, cp, max_count;
> +	bool first_key = false;
> +	int err, retry = 3;
                 ^^^^^^^^^^
define magic value maybe MAP_LOOKUP_RETRIES
> +
> +	if (attr->batch.elem_flags & ~BPF_F_LOCK)
> +		return -EINVAL;
> +	}

[...]

> +
> +	value_size = bpf_map_value_size(map);
> +
> +	max_count = attr->batch.count;
> +	if (!max_count)
> +		return 0;
> +
> +	err = -ENOMEM;
> +	buf = kmalloc(map->key_size + value_size, GFP_USER | __GFP_NOWARN);

Should we also set __GFP_NORETRY or __GFP_RETRY_MAYFAIL perhaps?
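
For reference, the allocation with one of the suggested flags added would
read as below; whether __GFP_NORETRY or __GFP_RETRY_MAYFAIL is the better
fit is exactly the open question here:

  buf = kmalloc(map->key_size + value_size,
                GFP_USER | __GFP_NOWARN | __GFP_NORETRY);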

> +	if (!buf)
> +		goto err_put;
> +
> +	err = -EFAULT;
> +	first_key = false;
> +	if (ubatch && copy_from_user(buf, ubatch, map->key_size))
> +		goto free_buf;
> +	key = buf;
> +	value = key + map->key_size;
> +	if (!ubatch) {
> +		prev_key = NULL;
> +		first_key = true;
> +	}
> +
> +

nit: extra newline not needed

> +	for (cp = 0; cp < max_count; cp++) {
> +		if (cp || first_key) {
> +			rcu_read_lock();
> +			err = map->ops->map_get_next_key(map, prev_key, key);
> +			rcu_read_unlock();
> +			if (err)
> +				break;
> +		}
> +		err = bpf_map_copy_value(map, key, value,
> +					 attr->batch.elem_flags, do_delete);
> +
> +		if (err == -ENOENT) {
> +			if (retry) {
> +				retry--;
> +				continue;
> +			}
> +			err = -EINTR;
> +			break;
> +		}
> +
> +		if (err)
> +			goto free_buf;
> +
> +		if (copy_to_user(keys + cp * map->key_size, key,
> +				 map->key_size)) {
> +			err = -EFAULT;
> +			goto free_buf;
> +		}
> +		if (copy_to_user(values + cp * value_size, value, value_size)) {
> +			err = -EFAULT;
> +			goto free_buf;

You could do the same here as in the retry==0 case above: break and
report the cp values copied so far back up to the user?
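
That is, something along these lines (a sketch of the suggestion, not the
patch itself):

  if (copy_to_user(values + cp * value_size, value, value_size)) {
          /* break instead of goto free_buf, so control falls through
           * to the final copy_to_user() that reports cp back in
           * batch.count.
           */
          err = -EFAULT;
          break;
  }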

> +		}
> +
> +		prev_key = key;
> +		retry = 3;
                ^^^^^^^^^
same nit as above, use define so we can tune this if needed.

> +	}
> +	if (!err) {
> +		rcu_read_lock();
> +		err = map->ops->map_get_next_key(map, prev_key, key);
> +		rcu_read_unlock();
> +	}
> +
> +	if ((copy_to_user(&uattr->batch.count, &cp, sizeof(cp)) ||
> +		    (copy_to_user(uobatch, key, map->key_size))))
> +		err = -EFAULT;
> +
> +free_buf:
> +	kfree(buf);
> +err_put:
> +	return err;
> +}
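
Stepping back from the nits, a hedged userspace sketch of the iteration
model the commit message describes, with in_batch/out_batch used as
opaque, key_size-sized resume tokens. This is the raw syscall form;
KEY_SIZE, map_fd, keys, values, n and the logging are placeholders, and a
uapi header that already carries BPF_MAP_LOOKUP_BATCH is assumed:

  #include <errno.h>
  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  #define KEY_SIZE 4            /* placeholder: the map's key_size */

  /* Walk the whole map in batches of 'n'; keys/values must be
   * n * key_size and n * value_size bytes respectively.
   */
  static void dump_map(int map_fd, void *keys, void *values, __u32 n)
  {
          char cur_token[KEY_SIZE], next_token[KEY_SIZE];
          bool first = true;
          long ret;

          do {
                  union bpf_attr attr;

                  memset(&attr, 0, sizeof(attr));
                  attr.batch.map_fd    = map_fd;
                  attr.batch.keys      = (__u64)(unsigned long)keys;
                  attr.batch.values    = (__u64)(unsigned long)values;
                  attr.batch.count     = n;
                  attr.batch.in_batch  = first ?
                          0 : (__u64)(unsigned long)cur_token;
                  attr.batch.out_batch = (__u64)(unsigned long)next_token;

                  ret = syscall(__NR_bpf, BPF_MAP_LOOKUP_BATCH,
                                &attr, sizeof(attr));

                  /* attr.batch.count entries are valid even when the call
                   * failed with ENOENT (map exhausted); only EFAULT means
                   * the count cannot be trusted.
                   */
                  printf("got %u entries\n", attr.batch.count);

                  memcpy(cur_token, next_token, KEY_SIZE);
                  first = false;
          } while (!ret);

          if (errno != ENOENT)
                  perror("BPF_MAP_LOOKUP_BATCH");
  }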


Thread overview: 26+ messages
2019-11-19 19:30 [PATCH v2 bpf-next 0/9] add bpf batch ops to process more than 1 elem Brian Vazquez
2019-11-19 19:30 ` [PATCH v2 bpf-next 1/9] bpf: add bpf_map_{value_size,update_value,map_copy_value} functions Brian Vazquez
2019-11-22 16:36   ` John Fastabend
2019-11-19 19:30 ` [PATCH v2 bpf-next 2/9] bpf: add generic support for lookup and lookup_and_delete batch ops Brian Vazquez
2019-11-21 17:36   ` Yonghong Song
2019-11-21 21:36     ` Brian Vazquez
2019-11-22  0:34       ` Yonghong Song
2019-11-22 17:25   ` John Fastabend
2019-11-19 19:30 ` [PATCH v2 bpf-next 3/9] bpf: add generic support for update and delete " Brian Vazquez
2019-11-21 18:00   ` Yonghong Song
2019-11-22  5:50     ` Brian Vazquez
2019-11-22  6:56       ` Yonghong Song
2019-11-19 19:30 ` [PATCH v2 bpf-next 4/9] bpf: add lookup and updated batch ops to arraymap Brian Vazquez
2019-11-19 19:30 ` [PATCH v2 bpf-next 5/9] bpf: add batch ops to all htab bpf map Brian Vazquez
2019-11-21 18:27   ` Yonghong Song
2019-11-21 21:27     ` Brian Vazquez
2019-11-19 19:30 ` [PATCH v2 bpf-next 6/9] tools/bpf: sync uapi header bpf.h Brian Vazquez
2019-11-19 19:30 ` [PATCH v2 bpf-next 7/9] libbpf: add libbpf support to batch ops Brian Vazquez
2019-11-21 18:30   ` Yonghong Song
2019-11-19 19:30 ` [PATCH v2 bpf-next 8/9] selftests/bpf: add batch ops testing for hmap and hmap_percpu Brian Vazquez
2019-11-21 18:36   ` Yonghong Song
2019-11-21 21:16     ` Brian Vazquez
2019-11-19 19:30 ` [PATCH v2 bpf-next 9/9] selftests/bpf: add batch ops testing to array bpf map Brian Vazquez
2019-11-21 18:43   ` Yonghong Song
2019-11-21 21:14     ` Brian Vazquez
2019-11-22  0:22       ` Yonghong Song
