* [PATCH bpf-next 0/3] Single-cpu updates for per-cpu maps
@ 2019-12-18 14:22 Paul Chaignon
  2019-12-18 14:23 ` [PATCH bpf-next 1/3] bpf: " Paul Chaignon
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Paul Chaignon @ 2019-12-18 14:22 UTC (permalink / raw)
  To: bpf
  Cc: paul.chaignon, netdev, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko

Currently, userspace programs have to update the values of all CPUs at
once when updating per-cpu maps.  This limitation prevents the update of a
single CPU's value without the risk of missing concurrent updates on other
CPUs' values.

The first patch allows userspace to update the value of a specific CPU in
per-cpu maps.  The second and third patches add test cases and bpftool
support, respectively.

Paul Chaignon (3):
  bpf: Single-cpu updates for per-cpu maps
  selftests/bpf: Tests for single-cpu updates of per-cpu maps
  bpftool: Support single-cpu updates for per-cpu maps

 include/uapi/linux/bpf.h                      |  4 ++
 kernel/bpf/arraymap.c                         | 19 +++--
 kernel/bpf/hashtab.c                          | 49 +++++++------
 kernel/bpf/local_storage.c                    | 16 +++--
 kernel/bpf/syscall.c                          | 17 +++--
 .../bpf/bpftool/Documentation/bpftool-map.rst | 13 ++--
 tools/bpf/bpftool/bash-completion/bpftool     |  2 +-
 tools/bpf/bpftool/map.c                       | 70 ++++++++++++++-----
 tools/include/uapi/linux/bpf.h                |  4 ++
 tools/testing/selftests/bpf/test_maps.c       | 34 ++++++++-
 10 files changed, 168 insertions(+), 60 deletions(-)

-- 
2.24.0


* [PATCH bpf-next 1/3] bpf: Single-cpu updates for per-cpu maps
  2019-12-18 14:22 [PATCH bpf-next 0/3] Single-cpu updates for per-cpu maps Paul Chaignon
@ 2019-12-18 14:23 ` Paul Chaignon
  2019-12-18 18:00   ` Alexei Starovoitov
  2019-12-18 19:10   ` Yonghong Song
  2019-12-18 14:23 ` [PATCH bpf-next 2/3] selftests/bpf: Tests for single-cpu updates of " Paul Chaignon
  2019-12-18 14:23 ` [PATCH bpf-next 3/3] bpftool: Support single-cpu updates for " Paul Chaignon
  2 siblings, 2 replies; 8+ messages in thread
From: Paul Chaignon @ 2019-12-18 14:23 UTC (permalink / raw)
  To: bpf
  Cc: paul.chaignon, netdev, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko

Currently, userspace programs have to update the values of all CPUs at
once when updating per-cpu maps.  This limitation prevents the update of
a single CPU's value without the risk of missing concurrent updates on
other CPUs' values.

This patch allows userspace to update the value of a specific CPU in
per-cpu maps.  The CPU whose value should be updated is encoded in the
upper 32 bits of the flags argument, as follows.  The new BPF_CPU flag
can be combined with existing flags.

  bpf_map_update_elem(..., cpuid << 32 | BPF_CPU)
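
For example, with this flag, a userspace program can update CPU 2's
value in a per-cpu array without touching the other CPUs' slots.  A
minimal sketch (map_fd, key, and val are placeholders), using libbpf's
bpf_map_update_elem() from <bpf/bpf.h>:

  __u32 key = 0;
  __u64 val = 42;
  __u64 cpu = 2;

  /* writes val into CPU 2's slot only */
  bpf_map_update_elem(map_fd, &key, &val, cpu << 32 | BPF_CPU);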

Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
---
 include/uapi/linux/bpf.h       |  4 +++
 kernel/bpf/arraymap.c          | 19 ++++++++-----
 kernel/bpf/hashtab.c           | 49 ++++++++++++++++++++--------------
 kernel/bpf/local_storage.c     | 16 +++++++----
 kernel/bpf/syscall.c           | 17 +++++++++---
 tools/include/uapi/linux/bpf.h |  4 +++
 6 files changed, 74 insertions(+), 35 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index dbbcf0b02970..2efb17d2c77a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -316,6 +316,10 @@ enum bpf_attach_type {
 #define BPF_NOEXIST	1 /* create new element if it didn't exist */
 #define BPF_EXIST	2 /* update existing element */
 #define BPF_F_LOCK	4 /* spin_lock-ed map_lookup/map_update */
+#define BPF_CPU		8 /* single-cpu update for per-cpu maps */
+
+/* CPU mask for single-cpu updates */
+#define BPF_CPU_MASK	0xFFFFFFFF00000000ULL
 
 /* flags for BPF_MAP_CREATE command */
 #define BPF_F_NO_PREALLOC	(1U << 0)
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index f0d19bbb9211..a96e94696819 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -302,7 +302,8 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 	u32 index = *(u32 *)key;
 	char *val;
 
-	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
+	if (unlikely((map_flags & ~BPF_CPU_MASK & ~BPF_F_LOCK &
+				  ~BPF_CPU) > BPF_EXIST))
 		/* unknown flags */
 		return -EINVAL;
 
@@ -341,7 +342,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
 	int cpu, off = 0;
 	u32 size;
 
-	if (unlikely(map_flags > BPF_EXIST))
+	if (unlikely((map_flags & ~BPF_CPU_MASK & ~BPF_CPU) > BPF_EXIST))
 		/* unknown flags */
 		return -EINVAL;
 
@@ -349,7 +350,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
 		/* all elements were pre-allocated, cannot insert a new one */
 		return -E2BIG;
 
-	if (unlikely(map_flags == BPF_NOEXIST))
+	if (unlikely(map_flags & BPF_NOEXIST))
 		/* all elements already exist */
 		return -EEXIST;
 
@@ -362,9 +363,15 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
 	size = round_up(map->value_size, 8);
 	rcu_read_lock();
 	pptr = array->pptrs[index & array->index_mask];
-	for_each_possible_cpu(cpu) {
-		bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value + off, size);
-		off += size;
+	if (map_flags & BPF_CPU) {
+		bpf_long_memcpy(per_cpu_ptr(pptr, map_flags >> 32), value,
+				size);
+	} else {
+		for_each_possible_cpu(cpu) {
+			bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value + off,
+					size);
+			off += size;
+		}
 	}
 	rcu_read_unlock();
 	return 0;
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 22066a62c8c9..be45c7c4509f 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -695,12 +695,12 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 }
 
 static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
-			    void *value, bool onallcpus)
+			    void *value, int cpuid)
 {
-	if (!onallcpus) {
+	if (cpuid == -1) {
 		/* copy true value_size bytes */
 		memcpy(this_cpu_ptr(pptr), value, htab->map.value_size);
-	} else {
+	} else if (cpuid == -2) {
 		u32 size = round_up(htab->map.value_size, 8);
 		int off = 0, cpu;
 
@@ -709,6 +709,10 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
 					value + off, size);
 			off += size;
 		}
+	} else {
+		u32 size = round_up(htab->map.value_size, 8);
+
+		bpf_long_memcpy(per_cpu_ptr(pptr, cpuid), value, size);
 	}
 }
 
@@ -720,7 +724,7 @@ static bool fd_htab_map_needs_adjust(const struct bpf_htab *htab)
 
 static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 					 void *value, u32 key_size, u32 hash,
-					 bool percpu, bool onallcpus,
+					 bool percpu, int cpuid,
 					 struct htab_elem *old_elem)
 {
 	u32 size = htab->map.value_size;
@@ -781,7 +785,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 			}
 		}
 
-		pcpu_copy_value(htab, pptr, value, onallcpus);
+		pcpu_copy_value(htab, pptr, value, cpuid);
 
 		if (!prealloc)
 			htab_elem_set_ptr(l_new, key_size, pptr);
@@ -804,11 +808,11 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 static int check_flags(struct bpf_htab *htab, struct htab_elem *l_old,
 		       u64 map_flags)
 {
-	if (l_old && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST)
+	if (l_old && (map_flags & BPF_NOEXIST) == BPF_NOEXIST)
 		/* elem already exists */
 		return -EEXIST;
 
-	if (!l_old && (map_flags & ~BPF_F_LOCK) == BPF_EXIST)
+	if (!l_old && (map_flags & BPF_EXIST) == BPF_EXIST)
 		/* elem doesn't exist, cannot update it */
 		return -ENOENT;
 
@@ -827,7 +831,8 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 	u32 key_size, hash;
 	int ret;
 
-	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
+	if (unlikely((map_flags & ~BPF_CPU_MASK & ~BPF_F_LOCK &
+				  ~BPF_CPU) > BPF_EXIST))
 		/* unknown flags */
 		return -EINVAL;
 
@@ -919,7 +924,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
 	u32 key_size, hash;
 	int ret;
 
-	if (unlikely(map_flags > BPF_EXIST))
+	if (unlikely((map_flags & ~BPF_CPU_MASK & ~BPF_CPU) > BPF_EXIST))
 		/* unknown flags */
 		return -EINVAL;
 
@@ -974,7 +979,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
 
 static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
 					 void *value, u64 map_flags,
-					 bool onallcpus)
+					 int cpuid)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 	struct htab_elem *l_new = NULL, *l_old;
@@ -984,7 +989,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
 	u32 key_size, hash;
 	int ret;
 
-	if (unlikely(map_flags > BPF_EXIST))
+	if (unlikely((map_flags & ~BPF_CPU_MASK & ~BPF_CPU) > BPF_EXIST))
 		/* unknown flags */
 		return -EINVAL;
 
@@ -1009,10 +1014,10 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
 	if (l_old) {
 		/* per-cpu hash map can update value in-place */
 		pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size),
-				value, onallcpus);
+				value, cpuid);
 	} else {
 		l_new = alloc_htab_elem(htab, key, value, key_size,
-					hash, true, onallcpus, NULL);
+					hash, true, cpuid, NULL);
 		if (IS_ERR(l_new)) {
 			ret = PTR_ERR(l_new);
 			goto err;
@@ -1027,7 +1032,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
 
 static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 					     void *value, u64 map_flags,
-					     bool onallcpus)
+					     int cpuid)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 	struct htab_elem *l_new = NULL, *l_old;
@@ -1075,10 +1080,10 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 
 		/* per-cpu hash map can update value in-place */
 		pcpu_copy_value(htab, htab_elem_get_ptr(l_old, key_size),
-				value, onallcpus);
+				value, cpuid);
 	} else {
 		pcpu_copy_value(htab, htab_elem_get_ptr(l_new, key_size),
-				value, onallcpus);
+				value, cpuid);
 		hlist_nulls_add_head_rcu(&l_new->hash_node, head);
 		l_new = NULL;
 	}
@@ -1093,14 +1098,14 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 static int htab_percpu_map_update_elem(struct bpf_map *map, void *key,
 				       void *value, u64 map_flags)
 {
-	return __htab_percpu_map_update_elem(map, key, value, map_flags, false);
+	return __htab_percpu_map_update_elem(map, key, value, map_flags, -1);
 }
 
 static int htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 					   void *value, u64 map_flags)
 {
 	return __htab_lru_percpu_map_update_elem(map, key, value, map_flags,
-						 false);
+						 -1);
 }
 
 /* Called from syscall or from eBPF program */
@@ -1316,15 +1321,19 @@ int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value,
 			   u64 map_flags)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+	int cpuid = -2; /* update values of all cpus */
 	int ret;
 
+	if (map_flags & BPF_CPU)
+		cpuid = map_flags >> 32;
+
 	rcu_read_lock();
 	if (htab_is_lru(htab))
 		ret = __htab_lru_percpu_map_update_elem(map, key, value,
-							map_flags, true);
+							map_flags, cpuid);
 	else
 		ret = __htab_percpu_map_update_elem(map, key, value, map_flags,
-						    true);
+						    cpuid);
 	rcu_read_unlock();
 
 	return ret;
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 2ba750725cb2..2f93d75fc74b 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -206,7 +206,7 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *_key,
 	int cpu, off = 0;
 	u32 size;
 
-	if (map_flags != BPF_ANY && map_flags != BPF_EXIST)
+	if (map_flags & ~BPF_CPU_MASK & ~BPF_CPU & ~BPF_EXIST)
 		return -EINVAL;
 
 	rcu_read_lock();
@@ -223,10 +223,16 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *_map, void *_key,
 	 * so no kernel data leaks possible
 	 */
 	size = round_up(_map->value_size, 8);
-	for_each_possible_cpu(cpu) {
-		bpf_long_memcpy(per_cpu_ptr(storage->percpu_buf, cpu),
-				value + off, size);
-		off += size;
+	if (map_flags & BPF_CPU) {
+		cpu = map_flags >> 32;
+		bpf_long_memcpy(per_cpu_ptr(storage->percpu_buf, cpu), value,
+				size);
+	} else {
+		for_each_possible_cpu(cpu) {
+			bpf_long_memcpy(per_cpu_ptr(storage->percpu_buf, cpu),
+					value + off, size);
+			off += size;
+		}
 	}
 	rcu_read_unlock();
 	return 0;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b08c362f4e02..83e190929f2e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -953,7 +953,9 @@ static int map_update_elem(union bpf_attr *attr)
 	struct bpf_map *map;
 	void *key, *value;
 	u32 value_size;
+	bool per_cpu;
 	struct fd f;
+	int cpuid;
 	int err;
 
 	if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM))
@@ -974,16 +976,23 @@ static int map_update_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
+	per_cpu = map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+		  map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
+		  map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
+		  map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE;
+	if ((attr->flags & BPF_CPU) &&
+	    (!per_cpu || (attr->flags >> 32) >= num_possible_cpus())) {
+		err = -EINVAL;
+		goto err_put;
+	}
+
 	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
 		goto err_put;
 	}
 
-	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
-	    map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
-	    map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
+	if (per_cpu && !(attr->flags & BPF_CPU))
 		value_size = round_up(map->value_size, 8) * num_possible_cpus();
 	else
 		value_size = map->value_size;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index dbbcf0b02970..2efb17d2c77a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -316,6 +316,10 @@ enum bpf_attach_type {
 #define BPF_NOEXIST	1 /* create new element if it didn't exist */
 #define BPF_EXIST	2 /* update existing element */
 #define BPF_F_LOCK	4 /* spin_lock-ed map_lookup/map_update */
+#define BPF_CPU		8 /* single-cpu update for per-cpu maps */
+
+/* CPU mask for single-cpu updates */
+#define BPF_CPU_MASK	0xFFFFFFFF00000000ULL
 
 /* flags for BPF_MAP_CREATE command */
 #define BPF_F_NO_PREALLOC	(1U << 0)
-- 
2.24.0


* [PATCH bpf-next 2/3] selftests/bpf: Tests for single-cpu updates of per-cpu maps
  2019-12-18 14:22 [PATCH bpf-next 0/3] Single-cpu updates for per-cpu maps Paul Chaignon
  2019-12-18 14:23 ` [PATCH bpf-next 1/3] bpf: " Paul Chaignon
@ 2019-12-18 14:23 ` Paul Chaignon
  2019-12-18 14:23 ` [PATCH bpf-next 3/3] bpftool: Support single-cpu updates for " Paul Chaignon
  2 siblings, 0 replies; 8+ messages in thread
From: Paul Chaignon @ 2019-12-18 14:23 UTC (permalink / raw)
  To: bpf
  Cc: paul.chaignon, netdev, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko

This patch adds test cases for single-cpu updates of per-cpu maps.  It
updates values for the first and last CPUs and checks that only their
values were updated.

Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
---
 tools/testing/selftests/bpf/test_maps.c | 34 ++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 02eae1e864c2..e3d3b15e1b91 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -150,6 +150,7 @@ static void test_hashmap_percpu(unsigned int task, void *data)
 	BPF_DECLARE_PERCPU(long, value);
 	long long key, next_key, first_key;
 	int expected_key_mask = 0;
+	long new_value = 42;
 	int fd, i;
 
 	fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_HASH, sizeof(key),
@@ -232,6 +233,23 @@ static void test_hashmap_percpu(unsigned int task, void *data)
 	key = 1;
 	assert(bpf_map_update_elem(fd, &key, value, BPF_EXIST) == 0);
 
+	/* Update value for a single CPU. */
+	assert(bpf_map_update_elem(fd, &key, &new_value,
+				   BPF_NOEXIST | BPF_CPU) == -1 &&
+	       errno == EEXIST);
+	assert(bpf_map_update_elem(fd, &key, &new_value,
+				   ((__u64)nr_cpus) << 32 | BPF_CPU) == -1 &&
+	       errno == EINVAL);
+	assert(bpf_map_update_elem(fd, &key, &new_value,
+				   BPF_EXIST | BPF_CPU) == 0);
+	assert(bpf_map_update_elem(fd, &key, &new_value,
+				   (nr_cpus - 1ULL) << 32 | BPF_CPU) == 0);
+	assert(bpf_map_lookup_elem(fd, &key, value) == 0);
+	assert(bpf_percpu(value, 0) == new_value);
+	for (i = 1; i < nr_cpus - 1; i++)
+		assert(bpf_percpu(value, i) == i + 100);
+	assert(bpf_percpu(value, nr_cpus - 1) == new_value);
+
 	/* Delete both elements. */
 	key = 1;
 	assert(bpf_map_delete_elem(fd, &key) == 0);
@@ -402,6 +420,7 @@ static void test_arraymap_percpu(unsigned int task, void *data)
 	unsigned int nr_cpus = bpf_num_possible_cpus();
 	BPF_DECLARE_PERCPU(long, values);
 	int key, next_key, fd, i;
+	long new_value = 42;
 
 	fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_ARRAY, sizeof(key),
 			    sizeof(bpf_percpu(values, 0)), 2, 0);
@@ -449,8 +468,21 @@ static void test_arraymap_percpu(unsigned int task, void *data)
 	assert(bpf_map_get_next_key(fd, &next_key, &next_key) == -1 &&
 	       errno == ENOENT);
 
-	/* Delete shouldn't succeed. */
+	/* Update value for a single CPU */
 	key = 1;
+	assert(bpf_map_update_elem(fd, &key, &new_value,
+				   ((__u64)nr_cpus) << 32 | BPF_CPU) == -1 &&
+	       errno == EINVAL);
+	assert(bpf_map_update_elem(fd, &key, &new_value, BPF_CPU) == 0);
+	assert(bpf_map_update_elem(fd, &key, &new_value,
+				   (nr_cpus - 1ULL) << 32 | BPF_CPU) == 0);
+	assert(bpf_map_lookup_elem(fd, &key, values) == 0);
+	assert(bpf_percpu(values, 0) == new_value);
+	for (i = 1; i < nr_cpus - 1; i++)
+		assert(bpf_percpu(values, i) == i + 100);
+	assert(bpf_percpu(values, nr_cpus - 1) == new_value);
+
+	/* Delete shouldn't succeed. */
 	assert(bpf_map_delete_elem(fd, &key) == -1 && errno == EINVAL);
 
 	close(fd);
-- 
2.24.0


* [PATCH bpf-next 3/3] bpftool: Support single-cpu updates for per-cpu maps
  2019-12-18 14:22 [PATCH bpf-next 0/3] Single-cpu updates for per-cpu maps Paul Chaignon
  2019-12-18 14:23 ` [PATCH bpf-next 1/3] bpf: " Paul Chaignon
  2019-12-18 14:23 ` [PATCH bpf-next 2/3] selftests/bpf: Tests for single-cpu updates of " Paul Chaignon
@ 2019-12-18 14:23 ` Paul Chaignon
  2 siblings, 0 replies; 8+ messages in thread
From: Paul Chaignon @ 2019-12-18 14:23 UTC (permalink / raw)
  To: bpf
  Cc: paul.chaignon, netdev, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko

This patch adds support for the new BPF_CPU flag in bpftool, to enable
single-cpu updates of per-cpu maps.  It can be combined with existing
flags; for example, to update the value for key 0 on CPU 9 only if it
doesn't already exist:

  bpftool map update key 0 0 0 0 value 1 0 0 0 noexist cpu 9

Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
---
 .../bpf/bpftool/Documentation/bpftool-map.rst | 13 ++--
 tools/bpf/bpftool/bash-completion/bpftool     |  2 +-
 tools/bpf/bpftool/map.c                       | 70 ++++++++++++++-----
 3 files changed, 61 insertions(+), 24 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index cdeae8ae90ba..72aa9b72f08f 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -25,7 +25,7 @@ MAP COMMANDS
 |	**bpftool** **map create**     *FILE* **type** *TYPE* **key** *KEY_SIZE* **value** *VALUE_SIZE* \
 |		**entries** *MAX_ENTRIES* **name** *NAME* [**flags** *FLAGS*] [**dev** *NAME*]
 |	**bpftool** **map dump**       *MAP*
-|	**bpftool** **map update**     *MAP* [**key** *DATA*] [**value** *VALUE*] [*UPDATE_FLAGS*]
+|	**bpftool** **map update**     *MAP* [**key** *DATA*] [**value** *VALUE*] [*UPDATE_FLAGS*]...
 |	**bpftool** **map lookup**     *MAP* [**key** *DATA*]
 |	**bpftool** **map getnext**    *MAP* [**key** *DATA*]
 |	**bpftool** **map delete**     *MAP*  **key** *DATA*
@@ -43,7 +43,7 @@ MAP COMMANDS
 |	*DATA* := { [**hex**] *BYTES* }
 |	*PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* | **name** *PROG_NAME* }
 |	*VALUE* := { *DATA* | *MAP* | *PROG* }
-|	*UPDATE_FLAGS* := { **any** | **exist** | **noexist** }
+|	*UPDATE_FLAGS* := { **any** | **exist** | **noexist** | **cpu** *CPU_ID* }
 |	*TYPE* := { **hash** | **array** | **prog_array** | **perf_event_array** | **percpu_hash**
 |		| **percpu_array** | **stack_trace** | **cgroup_array** | **lru_hash**
 |		| **lru_percpu_hash** | **lpm_trie** | **array_of_maps** | **hash_of_maps**
@@ -73,9 +73,12 @@ DESCRIPTION
 	**bpftool map update**  *MAP* [**key** *DATA*] [**value** *VALUE*] [*UPDATE_FLAGS*]
 		  Update map entry for a given *KEY*.
 
-		  *UPDATE_FLAGS* can be one of: **any** update existing entry
-		  or add if doesn't exit; **exist** update only if entry already
-		  exists; **noexist** update only if entry doesn't exist.
+		  *UPDATE_FLAGS* can be: **any** update existing entry or add
+		  if doesn't exist; **exist** update only if entry already
+		  exists; **noexist** update only if entry doesn't exist;
+		  **cpu** update only value for given *CPU_ID* in per-CPU map.
+		  Only one of **any**, **exist**, and **noexist** can be
+		  specified.
 
 		  If the **hex** keyword is provided in front of the bytes
 		  sequence, the bytes are parsed as hexadeximal values, even if
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 754d8395e451..116bf3dffb47 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -689,7 +689,7 @@ _bpftool()
                             esac
 
                             _bpftool_once_attr 'key'
-                            local UPDATE_FLAGS='any exist noexist'
+                            local UPDATE_FLAGS='any exist noexist cpu'
                             for (( idx=3; idx < ${#words[@]}-1; idx++ )); do
                                 if [[ ${words[idx]} == 'value' ]]; then
                                     # 'value' is present, but is not the last
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index c01f76fa6876..da1455f460b1 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -470,9 +470,10 @@ static void fill_per_cpu_value(struct bpf_map_info *info, void *value)
 		memcpy(value + i * step, value, info->value_size);
 }
 
-static int parse_elem(char **argv, struct bpf_map_info *info,
-		      void *key, void *value, __u32 key_size, __u32 value_size,
-		      __u32 *flags, __u32 **value_fd)
+static int
+__parse_elem(char **argv, struct bpf_map_info *info, void *key, void *value,
+	     __u32 key_size, __u32 value_size, __u64 *flags, __u32 **value_fd,
+	     bool any_flag)
 {
 	if (!*argv) {
 		if (!key && !value)
@@ -494,8 +495,8 @@ static int parse_elem(char **argv, struct bpf_map_info *info,
 		if (!argv)
 			return -1;
 
-		return parse_elem(argv, info, NULL, value, key_size, value_size,
-				  flags, value_fd);
+		return __parse_elem(argv, info, NULL, value, key_size,
+				    value_size, flags, value_fd, any_flag);
 	} else if (is_prefix(*argv, "value")) {
 		int fd;
 
@@ -556,30 +557,63 @@ static int parse_elem(char **argv, struct bpf_map_info *info,
 			fill_per_cpu_value(info, value);
 		}
 
-		return parse_elem(argv, info, key, NULL, key_size, value_size,
-				  flags, NULL);
+		return __parse_elem(argv, info, key, NULL, key_size,
+				    value_size, flags, NULL, any_flag);
 	} else if (is_prefix(*argv, "any") || is_prefix(*argv, "noexist") ||
 		   is_prefix(*argv, "exist")) {
-		if (!flags) {
+		if (any_flag || *flags & (BPF_NOEXIST | BPF_EXIST)) {
 			p_err("flags specified multiple times: %s", *argv);
 			return -1;
 		}
 
-		if (is_prefix(*argv, "any"))
-			*flags = BPF_ANY;
-		else if (is_prefix(*argv, "noexist"))
-			*flags = BPF_NOEXIST;
-		else if (is_prefix(*argv, "exist"))
-			*flags = BPF_EXIST;
+		if (is_prefix(*argv, "any")) {
+			*flags |= BPF_ANY;
+			any_flag = true;
+		} else if (is_prefix(*argv, "noexist")) {
+			*flags |= BPF_NOEXIST;
+		} else if (is_prefix(*argv, "exist")) {
+			*flags |= BPF_EXIST;
+		}
+
+		return __parse_elem(argv + 1, info, key, value, key_size,
+				    value_size, flags, value_fd, any_flag);
+	} else if (is_prefix(*argv, "cpu")) {
+		unsigned long long cpuid;
+		char *endptr;
+
+		if (*flags & BPF_CPU) {
+			p_err("flags specified multiple times: %s", *argv);
+			return -1;
+		}
 
-		return parse_elem(argv + 1, info, key, value, key_size,
-				  value_size, NULL, value_fd);
+		cpuid = strtoull(*(argv + 1), &endptr, 0);
+		if (*endptr) {
+			p_err("can't parse CPU id %s", *(argv + 1));
+			return -1;
+		}
+		if (cpuid >= get_possible_cpus()) {
+			p_err("incorrect value for CPU id");
+			return -1;
+		}
+
+		*flags |= (cpuid << 32) | BPF_CPU;
+
+		return __parse_elem(argv + 2, info, key, value, key_size,
+				    value_size, flags, value_fd, any_flag);
 	}
 
 	p_err("expected key or value, got: %s", *argv);
 	return -1;
 }
 
+static int
+parse_elem(char **argv, struct bpf_map_info *info, void *key, void *value,
+	   __u32 key_size, __u32 value_size, __u64 *flags, __u32 **value_fd)
+{
+	return __parse_elem(argv, info, key, value, key_size, value_size,
+			    flags, value_fd, false);
+}
+
 static void show_map_header_json(struct bpf_map_info *info, json_writer_t *wtr)
 {
 	jsonw_uint_field(wtr, "id", info->id);
@@ -1114,7 +1148,7 @@ static int do_update(int argc, char **argv)
 	struct bpf_map_info info = {};
 	__u32 len = sizeof(info);
 	__u32 *value_fd = NULL;
-	__u32 flags = BPF_ANY;
+	__u64 flags = BPF_ANY;
 	void *key, *value;
 	int fd, err;
 
@@ -1558,7 +1592,7 @@ static int do_help(int argc, char **argv)
 		"       DATA := { [hex] BYTES }\n"
 		"       " HELP_SPEC_PROGRAM "\n"
 		"       VALUE := { DATA | MAP | PROG }\n"
-		"       UPDATE_FLAGS := { any | exist | noexist }\n"
+		"       UPDATE_FLAGS := { any | exist | noexist | cpu CPU_ID }\n"
 		"       TYPE := { hash | array | prog_array | perf_event_array | percpu_hash |\n"
 		"                 percpu_array | stack_trace | cgroup_array | lru_hash |\n"
 		"                 lru_percpu_hash | lpm_trie | array_of_maps | hash_of_maps |\n"
-- 
2.24.0


* Re: [PATCH bpf-next 1/3] bpf: Single-cpu updates for per-cpu maps
  2019-12-18 14:23 ` [PATCH bpf-next 1/3] bpf: " Paul Chaignon
@ 2019-12-18 18:00   ` Alexei Starovoitov
  2019-12-19  9:14     ` Paul Chaignon
  2019-12-18 19:10   ` Yonghong Song
  1 sibling, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2019-12-18 18:00 UTC (permalink / raw)
  To: Paul Chaignon
  Cc: bpf, paul.chaignon, netdev, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko

On Wed, Dec 18, 2019 at 03:23:04PM +0100, Paul Chaignon wrote:
> Currently, userspace programs have to update the values of all CPUs at
> once when updating per-cpu maps.  This limitation prevents the update of
> a single CPU's value without the risk of missing concurrent updates on
> other CPUs' values.
> 
> This patch allows userspace to update the value of a specific CPU in
> per-cpu maps.  The CPU whose value should be updated is encoded in the
> upper 32 bits of the flags argument, as follows.  The new BPF_CPU flag
> can be combined with existing flags.

In general this makes sense. Could you elaborate more on the concrete issue?

>   bpf_map_update_elem(..., cpuid << 32 | BPF_CPU)
> 
> Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
> ---
>  include/uapi/linux/bpf.h       |  4 +++
>  kernel/bpf/arraymap.c          | 19 ++++++++-----
>  kernel/bpf/hashtab.c           | 49 ++++++++++++++++++++--------------
>  kernel/bpf/local_storage.c     | 16 +++++++----
>  kernel/bpf/syscall.c           | 17 +++++++++---
>  tools/include/uapi/linux/bpf.h |  4 +++
>  6 files changed, 74 insertions(+), 35 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index dbbcf0b02970..2efb17d2c77a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -316,6 +316,10 @@ enum bpf_attach_type {
>  #define BPF_NOEXIST	1 /* create new element if it didn't exist */
>  #define BPF_EXIST	2 /* update existing element */
>  #define BPF_F_LOCK	4 /* spin_lock-ed map_lookup/map_update */
> +#define BPF_CPU		8 /* single-cpu update for per-cpu maps */

BPF_F_CPU would be more consistent with the rest of flags.

Can BPF_F_CURRENT_CPU be supported as well?

And for consistency support this flag in map_lookup_elem too?

> +
> +/* CPU mask for single-cpu updates */
> +#define BPF_CPU_MASK	0xFFFFFFFF00000000ULL

what is the reason to expose this in uapi?

>  /* flags for BPF_MAP_CREATE command */
>  #define BPF_F_NO_PREALLOC	(1U << 0)
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index f0d19bbb9211..a96e94696819 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -302,7 +302,8 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
>  	u32 index = *(u32 *)key;
>  	char *val;
>  
> -	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
> +	if (unlikely((map_flags & ~BPF_CPU_MASK & ~BPF_F_LOCK &
> +				  ~BPF_CPU) > BPF_EXIST))

that reads oddly.
More traditional would be ~(A | B | C).
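
A restructured form of that check (sketch) would be:

  if (unlikely((map_flags & ~(BPF_CPU_MASK | BPF_F_LOCK | BPF_CPU)) >
               BPF_EXIST))
          /* unknown flags */
          return -EINVAL;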


* Re: [PATCH bpf-next 1/3] bpf: Single-cpu updates for per-cpu maps
  2019-12-18 14:23 ` [PATCH bpf-next 1/3] bpf: " Paul Chaignon
  2019-12-18 18:00   ` Alexei Starovoitov
@ 2019-12-18 19:10   ` Yonghong Song
  2019-12-19  0:49     ` Andrii Nakryiko
  1 sibling, 1 reply; 8+ messages in thread
From: Yonghong Song @ 2019-12-18 19:10 UTC (permalink / raw)
  To: Paul Chaignon, bpf
  Cc: paul.chaignon, netdev, Alexei Starovoitov, Daniel Borkmann,
	Martin Lau, Song Liu, Andrii Nakryiko



On 12/18/19 6:23 AM, Paul Chaignon wrote:
> Currently, userspace programs have to update the values of all CPUs at
> once when updating per-cpu maps.  This limitation prevents the update of
> a single CPU's value without the risk of missing concurrent updates on
> other CPUs' values.
> 
> This patch allows userspace to update the value of a specific CPU in
> per-cpu maps.  The CPU whose value should be updated is encoded in the
> upper 32 bits of the flags argument, as follows.  The new BPF_CPU flag
> can be combined with existing flags.
> 
>    bpf_map_update_elem(..., cpuid << 32 | BPF_CPU)

Some additional comments beyond Alexei's one.

> 
> Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
> ---
>   include/uapi/linux/bpf.h       |  4 +++
>   kernel/bpf/arraymap.c          | 19 ++++++++-----
>   kernel/bpf/hashtab.c           | 49 ++++++++++++++++++++--------------
>   kernel/bpf/local_storage.c     | 16 +++++++----
>   kernel/bpf/syscall.c           | 17 +++++++++---
>   tools/include/uapi/linux/bpf.h |  4 +++
>   6 files changed, 74 insertions(+), 35 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index dbbcf0b02970..2efb17d2c77a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -316,6 +316,10 @@ enum bpf_attach_type {
>   #define BPF_NOEXIST	1 /* create new element if it didn't exist */
>   #define BPF_EXIST	2 /* update existing element */
>   #define BPF_F_LOCK	4 /* spin_lock-ed map_lookup/map_update */
> +#define BPF_CPU		8 /* single-cpu update for per-cpu maps */
> +
> +/* CPU mask for single-cpu updates */
> +#define BPF_CPU_MASK	0xFFFFFFFF00000000ULL
BPF_F_CPU_MASK?

>   
>   /* flags for BPF_MAP_CREATE command */
>   #define BPF_F_NO_PREALLOC	(1U << 0)
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index f0d19bbb9211..a96e94696819 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -302,7 +302,8 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
>   	u32 index = *(u32 *)key;
>   	char *val;
>   
> -	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
> +	if (unlikely((map_flags & ~BPF_CPU_MASK & ~BPF_F_LOCK &
> +				  ~BPF_CPU) > BPF_EXIST))

Maybe create a macro ARRAY_UPDATE_FLAG_MASK similar to the existing
ARRAY_CREATE_FLAG_MASK?  That would make this a little easier to
follow, especially since we have three individual flags here.
There are possibly some other places below that could be handled in a
similar way.
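
A sketch of what that could look like (macro name is just a
suggestion):

  #define ARRAY_UPDATE_FLAG_MASK (BPF_CPU_MASK | BPF_F_LOCK | BPF_CPU)

  if (unlikely((map_flags & ~ARRAY_UPDATE_FLAG_MASK) > BPF_EXIST))
          /* unknown flags */
          return -EINVAL;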

>   		/* unknown flags */
>   		return -EINVAL;
>   
> @@ -341,7 +342,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>   	int cpu, off = 0;
>   	u32 size;
>   
> -	if (unlikely(map_flags > BPF_EXIST))
> +	if (unlikely((map_flags & ~BPF_CPU_MASK & ~BPF_CPU) > BPF_EXIST))

~(BPF_F_CPU_MASK | BPF_F_CPU), or create a macro like
ARRAY_UPDATE_CPU_MASK for (BPF_F_CPU_MASK | BPF_F_CPU)?

>   		/* unknown flags */
>   		return -EINVAL;
>   
> @@ -349,7 +350,7 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>   		/* all elements were pre-allocated, cannot insert a new one */
>   		return -E2BIG;
>   
> -	if (unlikely(map_flags == BPF_NOEXIST))
> +	if (unlikely(map_flags & BPF_NOEXIST))
>   		/* all elements already exist */
>   		return -EEXIST;
>   
> @@ -362,9 +363,15 @@ int bpf_percpu_array_update(struct bpf_map *map, void *key, void *value,
>   	size = round_up(map->value_size, 8);
>   	rcu_read_lock();
>   	pptr = array->pptrs[index & array->index_mask];
> -	for_each_possible_cpu(cpu) {
> -		bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value + off, size);
> -		off += size;
> +	if (map_flags & BPF_CPU) {
> +		bpf_long_memcpy(per_cpu_ptr(pptr, map_flags >> 32), value,
> +				size);
> +	} else {
> +		for_each_possible_cpu(cpu) {
> +			bpf_long_memcpy(per_cpu_ptr(pptr, cpu), value + off,
> +					size);
> +			off += size;
> +		}
>   	}
>   	rcu_read_unlock();
>   	return 0;
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 22066a62c8c9..be45c7c4509f 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -695,12 +695,12 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
>   }
>   
>   static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
> -			    void *value, bool onallcpus)
> +			    void *value, int cpuid)
>   {
> -	if (!onallcpus) {
> +	if (cpuid == -1) {

Magic numbers -1 and -2 should be macros?
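
For example (hypothetical names, for illustration only):

  #define BPF_PCPU_CURRENT_CPU  (-1)  /* this_cpu_ptr() path */
  #define BPF_PCPU_ALL_CPUS     (-2)  /* for_each_possible_cpu() path */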

>   		/* copy true value_size bytes */
>   		memcpy(this_cpu_ptr(pptr), value, htab->map.value_size);
> -	} else {
> +	} else if (cpuid == -2) {
>   		u32 size = round_up(htab->map.value_size, 8);
>   		int off = 0, cpu;
>   
> @@ -709,6 +709,10 @@ static void pcpu_copy_value(struct bpf_htab *htab, void __percpu *pptr,
[...]

* Re: [PATCH bpf-next 1/3] bpf: Single-cpu updates for per-cpu maps
  2019-12-18 19:10   ` Yonghong Song
@ 2019-12-19  0:49     ` Andrii Nakryiko
  0 siblings, 0 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2019-12-19  0:49 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Paul Chaignon, bpf, paul.chaignon, netdev, Alexei Starovoitov,
	Daniel Borkmann, Martin Lau, Song Liu, Andrii Nakryiko

On Wed, Dec 18, 2019 at 11:11 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 12/18/19 6:23 AM, Paul Chaignon wrote:
> > Currently, userspace programs have to update the values of all CPUs at
> > once when updating per-cpu maps.  This limitation prevents the update of
> > a single CPU's value without the risk of missing concurrent updates on
> > other CPUs' values.
> >
> > This patch allows userspace to update the value of a specific CPU in
> > per-cpu maps.  The CPU whose value should be updated is encoded in the
> > upper 32 bits of the flags argument, as follows.  The new BPF_CPU flag
> > can be combined with existing flags.
> >
> >    bpf_map_update_elem(..., cpuid << 32 | BPF_CPU)
>
> Some additional comments beyond Alexei's one.
>
> >
> > Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
> > ---
> >   include/uapi/linux/bpf.h       |  4 +++
> >   kernel/bpf/arraymap.c          | 19 ++++++++-----
> >   kernel/bpf/hashtab.c           | 49 ++++++++++++++++++++--------------
> >   kernel/bpf/local_storage.c     | 16 +++++++----
> >   kernel/bpf/syscall.c           | 17 +++++++++---
> >   tools/include/uapi/linux/bpf.h |  4 +++
> >   6 files changed, 74 insertions(+), 35 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index dbbcf0b02970..2efb17d2c77a 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -316,6 +316,10 @@ enum bpf_attach_type {
> >   #define BPF_NOEXIST 1 /* create new element if it didn't exist */
> >   #define BPF_EXIST   2 /* update existing element */
> >   #define BPF_F_LOCK  4 /* spin_lock-ed map_lookup/map_update */
> > +#define BPF_CPU              8 /* single-cpu update for per-cpu maps */
> > +
> > +/* CPU mask for single-cpu updates */
> > +#define BPF_CPU_MASK 0xFFFFFFFF00000000ULL
> BPF_F_CPU_MASK?

Maybe even define this as a function-like macro:

#define BPF_F_CPU_NR(cpu) ((cpu) << 32)

so that it can be easily combined with other flags in code like so:

bpf_map_update_elem(..., BPF_F_LOCK | BPF_F_CPU_NR(10))

BPF_F_CPU_NR can automatically set BPF_F_CPU flag as well, btw.
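
A variant that folds the flag in (and avoids shifting a 32-bit int)
could look like this (sketch):

  #define BPF_F_CPU_NR(cpu) ((((__u64)(cpu)) << 32) | BPF_F_CPU)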

[...]

* Re: [PATCH bpf-next 1/3] bpf: Single-cpu updates for per-cpu maps
  2019-12-18 18:00   ` Alexei Starovoitov
@ 2019-12-19  9:14     ` Paul Chaignon
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Chaignon @ 2019-12-19  9:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, paul.chaignon, netdev, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko

Thanks everyone for the reviews and suggestions!

On Wed, Dec 18, 2019 at 10:00:44AM -0800, Alexei Starovoitov wrote:
> On Wed, Dec 18, 2019 at 03:23:04PM +0100, Paul Chaignon wrote:
> > Currently, userspace programs have to update the values of all CPUs at
> > once when updating per-cpu maps.  This limitation prevents the update of
> > a single CPU's value without the risk of missing concurrent updates on
> > other CPUs' values.
> > 
> > This patch allows userspace to update the value of a specific CPU in
> > per-cpu maps.  The CPU whose value should be updated is encoded in the
> > upper 32 bits of the flags argument, as follows.  The new BPF_CPU flag
> > can be combined with existing flags.
> 
> In general this makes sense. Could you elaborate more on the concrete issue?

Sure.  I have a BPF program that matches incoming packets against
packet-filtering rules, with a NIC that steers some flows (corresponding
to different tenants) to specific CPUs.  Rules are stored in a per-cpu
hashmap with a counter (counting matching packets) and an on/off switch.
To enable a rule on a new CPU, I look up the value for that rule, flip
the on/off switch in the value corresponding to the given CPU, and
update the map.  However, the counters corresponding to other CPUs for
that same rule might have been updated between the lookup and the
update.
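
The per-cpu value in that hashmap looks roughly like this (simplified
for illustration):

  struct rule_value {
          __u64 match_count;  /* counter updated by the BPF program */
          __u8  enabled;      /* on/off switch toggled from userspace */
  };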

Other BPF users have requested the same feature before on the bcc
repository [1].  They probably have different (tracing-related?) use
cases.

1 - https://github.com/iovisor/bcc/issues/1886

> 
> >   bpf_map_update_elem(..., cpuid << 32 | BPF_CPU)
> > 
> > Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
> > ---
> >  include/uapi/linux/bpf.h       |  4 +++
> >  kernel/bpf/arraymap.c          | 19 ++++++++-----
> >  kernel/bpf/hashtab.c           | 49 ++++++++++++++++++++--------------
> >  kernel/bpf/local_storage.c     | 16 +++++++----
> >  kernel/bpf/syscall.c           | 17 +++++++++---
> >  tools/include/uapi/linux/bpf.h |  4 +++
> >  6 files changed, 74 insertions(+), 35 deletions(-)
> > 
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index dbbcf0b02970..2efb17d2c77a 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -316,6 +316,10 @@ enum bpf_attach_type {
> >  #define BPF_NOEXIST	1 /* create new element if it didn't exist */
> >  #define BPF_EXIST	2 /* update existing element */
> >  #define BPF_F_LOCK	4 /* spin_lock-ed map_lookup/map_update */
> > +#define BPF_CPU		8 /* single-cpu update for per-cpu maps */
> 
> BPF_F_CPU would be more consistent with the rest of flags.
> 
> Can BPF_F_CURRENT_CPU be supported as well?

You mean to update the value corresponding to the CPU on which the
userspace program is running?  BPF_F_CURRENT_CPU is a mask on the lower
32 bits, so it would clash with existing flags, but there's probably
another way to implement the same behavior.  Not sure I see the use case
though; userspace programs can easily update the value for the current
CPU with BPF_F_CPU.
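
For reference, a sketch of such a userspace-side current-CPU update,
using sched_getcpu() (from <sched.h>, with _GNU_SOURCE) and the
BPF_F_CPU name suggested above:

  int cpu = sched_getcpu();

  if (cpu >= 0)
          bpf_map_update_elem(fd, &key, &val, (__u64)cpu << 32 | BPF_F_CPU);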

> 
> And for consistency support this flag in map_lookup_elem too?

Sure, I'll add it to the v2.

> 
> > +
> > +/* CPU mask for single-cpu updates */
> > +#define BPF_CPU_MASK	0xFFFFFFFF00000000ULL
> 
> what is the reason to expose this in uapi?

No reason; that's a mistake.

> 
> >  /* flags for BPF_MAP_CREATE command */
> >  #define BPF_F_NO_PREALLOC	(1U << 0)
> > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > index f0d19bbb9211..a96e94696819 100644
> > --- a/kernel/bpf/arraymap.c
> > +++ b/kernel/bpf/arraymap.c
> > @@ -302,7 +302,8 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
> >  	u32 index = *(u32 *)key;
> >  	char *val;
> >  
> > -	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST))
> > +	if (unlikely((map_flags & ~BPF_CPU_MASK & ~BPF_F_LOCK &
> > +				  ~BPF_CPU) > BPF_EXIST))
> 
> that reads oddly.
> More traditional would be ~(A | B | C).
> 
