* [PATCH 0/9] bpf: support percpu ARRAY map
@ 2016-01-11 15:56 Ming Lei
  2016-01-11 15:56 ` [PATCH 1/9] bpf: prepare for moving map common stuff into one place Ming Lei
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Ming Lei @ 2016-01-11 15:56 UTC (permalink / raw)
  To: linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau

Hi,

In the case of an ARRAY map, the array index is used as the map
key, so the mapped element/value can inevitably be accessed from
more than one CPU concurrently, and expensive atomic operations
are often required in the eBPF prog. This usage can be seen in
tracex3, sockex1 and sockex3 in samples/bpf/ of the kernel tree.

This patchset introduces a percpu ARRAY map to address the issue.
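
For illustration, a minimal sketch of the difference in the eBPF prog
(modeled on the sockex1 conversion in the last patches of this series;
my_map/index are placeholders):

	/* plain ARRAY map: all CPUs hit the same slot, so an atomic add is needed */
	value = bpf_map_lookup_elem(&my_map, &index);
	if (value)
		__sync_fetch_and_add(value, skb->len);

	/* percpu ARRAY map: each CPU updates its own copy, a plain add is enough */
	value = bpf_map_lookup_elem_percpu(&my_map, &index,
					   bpf_get_smp_processor_id());
	if (value)
		*value += skb->len;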

The first two patches prepare for supporting percpu maps, and
introduce one file to hold the common map functions.

The following 3 patches introduce percpu versions of the update/
lookup element operations in bpf_map_ops, the bpf helpers and the
syscall, so that percpu values can be retrieved/updated from both
eBPF progs and the syscall.

The 6th patch implements the percpu array map.

The last 3 patches are changes in samples/bpf: they implement a
test for the percpu array and convert the sockex1 example to the
percpu array map.

 include/linux/bpf.h        |  10 ++++
 include/uapi/linux/bpf.h   |  10 ++++
 kernel/bpf/Makefile        |   2 +-
 kernel/bpf/arraymap.c      | 153 +++++++++++++++++++++++++++++++++++++++++++++++++++---------
 kernel/bpf/bpf_map.h       |  15 ++++++
 kernel/bpf/core.c          |   2 +
 kernel/bpf/hashtab.c       |   4 ++
 kernel/bpf/helpers.c       |  53 +++++++++++++++++++++
 kernel/bpf/map.c           |  43 +++++++++++++++++
 kernel/bpf/syscall.c       |  48 +++++++++++++++----
 net/core/filter.c          |   4 ++
 samples/bpf/bpf_helpers.h  |   5 ++
 samples/bpf/libbpf.c       |  42 +++++++++++++++++
 samples/bpf/libbpf.h       |   5 ++
 samples/bpf/sockex1_kern.c |   7 +--
 samples/bpf/sockex1_user.c |  20 ++++++--
 samples/bpf/test_maps.c    | 110 +++++++++++++++++++++++++++++++++++++++++++
 17 files changed, 493 insertions(+), 40 deletions(-)



thanks,
Ming


* [PATCH 1/9] bpf: prepare for moving map common stuff into one place
  2016-01-11 15:56 [PATCH 0/9] bpf: support percpu ARRAY map Ming Lei
@ 2016-01-11 15:56 ` Ming Lei
  2016-01-11 18:24   ` kbuild test robot
  2016-01-11 15:56 ` [PATCH 2/9] bpf: array map: use pre-defined nop map function Ming Lei
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-11 15:56 UTC (permalink / raw)
  To: linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau, Ming Lei

This patch introduces some nop map callback functions for use in
all map implementations.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 kernel/bpf/Makefile  |  2 +-
 kernel/bpf/bpf_map.h |  9 +++++++++
 kernel/bpf/map.c     | 26 ++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 kernel/bpf/bpf_map.h
 create mode 100644 kernel/bpf/map.c

diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 13272582..191d54b 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -1,4 +1,4 @@
 obj-y := core.o
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o
-obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o
+obj-$(CONFIG_BPF_SYSCALL) += map.o hashtab.o arraymap.o
diff --git a/kernel/bpf/bpf_map.h b/kernel/bpf/bpf_map.h
new file mode 100644
index 0000000..7e596c1
--- /dev/null
+++ b/kernel/bpf/bpf_map.h
@@ -0,0 +1,9 @@
+#ifndef INT_BPF_MAP_COMMON_H
+#define INT_BPF_MAP_COMMON_H
+
+#include <linux/bpf.h>
+
+extern void *map_lookup_elem_nop(struct bpf_map *map, void *key);
+extern int map_delete_elem_nop(struct bpf_map *map, void *key);
+
+#endif
diff --git a/kernel/bpf/map.c b/kernel/bpf/map.c
new file mode 100644
index 0000000..bf113fb
--- /dev/null
+++ b/kernel/bpf/map.c
@@ -0,0 +1,26 @@
+/*
+ * bpf map common stuff
+ *
+ *	Copyright (c) 2016 Ming Lei
+ *
+ * Authors:
+ *	Ming Lei <tom.leiming@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/bpf.h>
+
+void *map_lookup_elem_nop(struct bpf_map *map, void *key)
+{
+	return NULL;
+}
+
+int map_delete_elem_nop(struct bpf_map *map, void *key)
+{
+	return -EINVAL;
+}
+
-- 
1.9.1


* [PATCH 2/9] bpf: array map: use pre-defined nop map function
  2016-01-11 15:56 [PATCH 0/9] bpf: support percpu ARRAY map Ming Lei
  2016-01-11 15:56 ` [PATCH 1/9] bpf: prepare for moving map common stuff into one place Ming Lei
@ 2016-01-11 15:56 ` Ming Lei
  2016-01-11 19:08   ` Alexei Starovoitov
  2016-01-11 15:56 ` [PATCH 3/9] bpf: introduce percpu version of lookup/update in bpf_map_ops Ming Lei
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-11 15:56 UTC (permalink / raw)
  To: linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau, Ming Lei

This allows us to remove the per-map nop map functions.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 kernel/bpf/arraymap.c | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index b0799bc..9ad9031 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -17,6 +17,8 @@
 #include <linux/filter.h>
 #include <linux/perf_event.h>
 
+#include "bpf_map.h"
+
 /* Called from syscall */
 static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 {
@@ -115,12 +117,6 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 	return 0;
 }
 
-/* Called from syscall or from eBPF program */
-static int array_map_delete_elem(struct bpf_map *map, void *key)
-{
-	return -EINVAL;
-}
-
 /* Called when map->refcnt goes to zero, either from workqueue or from syscall */
 static void array_map_free(struct bpf_map *map)
 {
@@ -142,7 +138,7 @@ static const struct bpf_map_ops array_ops = {
 	.map_get_next_key = array_map_get_next_key,
 	.map_lookup_elem = array_map_lookup_elem,
 	.map_update_elem = array_map_update_elem,
-	.map_delete_elem = array_map_delete_elem,
+	.map_delete_elem = map_delete_elem_nop,
 };
 
 static struct bpf_map_type_list array_type __read_mostly = {
@@ -178,11 +174,6 @@ static void fd_array_map_free(struct bpf_map *map)
 	kvfree(array);
 }
 
-static void *fd_array_map_lookup_elem(struct bpf_map *map, void *key)
-{
-	return NULL;
-}
-
 /* only called from syscall */
 static int fd_array_map_update_elem(struct bpf_map *map, void *key,
 				    void *value, u64 map_flags)
@@ -262,7 +253,7 @@ static const struct bpf_map_ops prog_array_ops = {
 	.map_alloc = fd_array_map_alloc,
 	.map_free = fd_array_map_free,
 	.map_get_next_key = array_map_get_next_key,
-	.map_lookup_elem = fd_array_map_lookup_elem,
+	.map_lookup_elem = map_lookup_elem_nop,
 	.map_update_elem = fd_array_map_update_elem,
 	.map_delete_elem = fd_array_map_delete_elem,
 	.map_fd_get_ptr = prog_fd_array_get_ptr,
@@ -328,7 +319,7 @@ static const struct bpf_map_ops perf_event_array_ops = {
 	.map_alloc = fd_array_map_alloc,
 	.map_free = perf_event_array_map_free,
 	.map_get_next_key = array_map_get_next_key,
-	.map_lookup_elem = fd_array_map_lookup_elem,
+	.map_lookup_elem = map_lookup_elem_nop,
 	.map_update_elem = fd_array_map_update_elem,
 	.map_delete_elem = fd_array_map_delete_elem,
 	.map_fd_get_ptr = perf_event_fd_array_get_ptr,
-- 
1.9.1


* [PATCH 3/9] bpf: introduce percpu version of lookup/update in bpf_map_ops
  2016-01-11 15:56 [PATCH 0/9] bpf: support percpu ARRAY map Ming Lei
  2016-01-11 15:56 ` [PATCH 1/9] bpf: prepare for moving map common stuff into one place Ming Lei
  2016-01-11 15:56 ` [PATCH 2/9] bpf: array map: use pre-defined nop map function Ming Lei
@ 2016-01-11 15:56 ` Ming Lei
  2016-01-11 15:56 ` [PATCH 4/9] bpf: add percpu version of lookup/update element helpers Ming Lei
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2016-01-11 15:56 UTC (permalink / raw)
  To: linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau, Ming Lei

This patch prepares for supporting percpu maps, which will be
done in the following patches.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bpf.h   |  5 +++++
 kernel/bpf/arraymap.c |  6 ++++++
 kernel/bpf/bpf_map.h  |  4 ++++
 kernel/bpf/hashtab.c  |  4 ++++
 kernel/bpf/map.c      | 11 +++++++++++
 5 files changed, 30 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 83d1926..7fa339f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -25,6 +25,11 @@ struct bpf_map_ops {
 	int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
 	int (*map_delete_elem)(struct bpf_map *map, void *key);
 
+	/* funcs callable from userspace and from eBPF programs */
+	void *(*map_lookup_elem_percpu)(struct bpf_map *map, void *key, u32 cpu);
+	int (*map_update_elem_percpu)(struct bpf_map *map, void *key,
+			void *value, u64 flags, u32 cpu);
+
 	/* funcs called by prog_array and perf_event_array map */
 	void *(*map_fd_get_ptr) (struct bpf_map *map, int fd);
 	void (*map_fd_put_ptr) (void *ptr);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 9ad9031..20b9f2c 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -139,6 +139,8 @@ static const struct bpf_map_ops array_ops = {
 	.map_lookup_elem = array_map_lookup_elem,
 	.map_update_elem = array_map_update_elem,
 	.map_delete_elem = map_delete_elem_nop,
+	.map_lookup_elem_percpu = map_lookup_elem_percpu_nop,
+	.map_update_elem_percpu = map_update_elem_percpu_nop,
 };
 
 static struct bpf_map_type_list array_type __read_mostly = {
@@ -258,6 +260,8 @@ static const struct bpf_map_ops prog_array_ops = {
 	.map_delete_elem = fd_array_map_delete_elem,
 	.map_fd_get_ptr = prog_fd_array_get_ptr,
 	.map_fd_put_ptr = prog_fd_array_put_ptr,
+	.map_lookup_elem_percpu = map_lookup_elem_percpu_nop,
+	.map_update_elem_percpu = map_update_elem_percpu_nop,
 };
 
 static struct bpf_map_type_list prog_array_type __read_mostly = {
@@ -324,6 +328,8 @@ static const struct bpf_map_ops perf_event_array_ops = {
 	.map_delete_elem = fd_array_map_delete_elem,
 	.map_fd_get_ptr = perf_event_fd_array_get_ptr,
 	.map_fd_put_ptr = perf_event_fd_array_put_ptr,
+	.map_lookup_elem_percpu = map_lookup_elem_percpu_nop,
+	.map_update_elem_percpu = map_update_elem_percpu_nop,
 };
 
 static struct bpf_map_type_list perf_event_array_type __read_mostly = {
diff --git a/kernel/bpf/bpf_map.h b/kernel/bpf/bpf_map.h
index 7e596c1..adab4e6 100644
--- a/kernel/bpf/bpf_map.h
+++ b/kernel/bpf/bpf_map.h
@@ -5,5 +5,9 @@
 
 extern void *map_lookup_elem_nop(struct bpf_map *map, void *key);
 extern int map_delete_elem_nop(struct bpf_map *map, void *key);
+extern void *map_lookup_elem_percpu_nop(struct bpf_map *map, void *key,
+		u32 cpu);
+extern int map_update_elem_percpu_nop(struct bpf_map *map, void *key,
+		void *value, u64 flags, u32 cpu);
 
 #endif
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index c5b30fd..893e2e4 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -14,6 +14,8 @@
 #include <linux/filter.h>
 #include <linux/vmalloc.h>
 
+#include "bpf_map.h"
+
 struct bucket {
 	struct hlist_head head;
 	raw_spinlock_t lock;
@@ -384,6 +386,8 @@ static const struct bpf_map_ops htab_ops = {
 	.map_lookup_elem = htab_map_lookup_elem,
 	.map_update_elem = htab_map_update_elem,
 	.map_delete_elem = htab_map_delete_elem,
+	.map_lookup_elem_percpu = map_lookup_elem_percpu_nop,
+	.map_update_elem_percpu = map_update_elem_percpu_nop,
 };
 
 static struct bpf_map_type_list htab_type __read_mostly = {
diff --git a/kernel/bpf/map.c b/kernel/bpf/map.c
index bf113fb..b94458a 100644
--- a/kernel/bpf/map.c
+++ b/kernel/bpf/map.c
@@ -24,3 +24,14 @@ int map_delete_elem_nop(struct bpf_map *map, void *key)
 	return -EINVAL;
 }
 
+void *map_lookup_elem_percpu_nop(struct bpf_map *map, void *key, u32 cpu)
+{
+	return NULL;
+}
+
+int map_update_elem_percpu_nop(struct bpf_map *map, void *key, void *value,
+		u64 flags, u32 cpu)
+{
+	return -EINVAL;
+}
+
-- 
1.9.1


* [PATCH 4/9] bpf: add percpu version of lookup/update element helpers
  2016-01-11 15:56 [PATCH 0/9] bpf: support percpu ARRAY map Ming Lei
                   ` (2 preceding siblings ...)
  2016-01-11 15:56 ` [PATCH 3/9] bpf: introduce percpu version of lookup/update in bpf_map_ops Ming Lei
@ 2016-01-11 15:56 ` Ming Lei
  2016-01-11 15:56 ` [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem Ming Lei
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2016-01-11 15:56 UTC (permalink / raw)
  To: linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau, Ming Lei

Prepare for supporting percpu maps.

The two introduced callbacks and helpers can be used to
retrieve/update the value on one specific CPU for a percpu map.
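
A minimal prog-side sketch of the two new helpers (my_map/key are
placeholders; the percpu map type itself is only added in a later
patch):

	unsigned cpu = bpf_get_smp_processor_id();
	long *val, init = 0;

	val = bpf_map_lookup_elem_percpu(&my_map, &key, cpu);
	if (val)
		*val += 1;
	else
		bpf_map_update_elem_percpu(&my_map, &key, &init, BPF_ANY, cpu);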

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bpf.h      |  3 +++
 include/uapi/linux/bpf.h |  6 ++++++
 kernel/bpf/core.c        |  2 ++
 kernel/bpf/helpers.c     | 53 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/core/filter.c        |  4 ++++
 5 files changed, 68 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7fa339f..75d75d8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -219,6 +219,9 @@ extern const struct bpf_func_proto bpf_get_current_comm_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
 
+extern const struct bpf_func_proto bpf_map_lookup_elem_percpu_proto;
+extern const struct bpf_func_proto bpf_map_update_elem_percpu_proto;
+
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8bed7f1..2658917 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -270,6 +270,12 @@ enum bpf_func_id {
 	 */
 	BPF_FUNC_perf_event_output,
 	BPF_FUNC_skb_load_bytes,
+
+	/* void *map_lookup_elem(&map, &key, cpu) */
+	BPF_FUNC_map_lookup_elem_percpu,
+	/* int map_update_elem(&map, &key, &value, flags, cpu) */
+	BPF_FUNC_map_update_elem_percpu,
+
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 972d9a8..71a09fa 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -769,6 +769,8 @@ const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
 	return NULL;
 }
+const struct bpf_func_proto bpf_map_lookup_elem_percpu_proto __weak;
+const struct bpf_func_proto bpf_map_update_elem_percpu_proto __weak;
 
 /* Always built-in helper functions. */
 const struct bpf_func_proto bpf_tail_call_proto = {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 4504ca6..d05164a 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -54,6 +54,36 @@ const struct bpf_func_proto bpf_map_lookup_elem_proto = {
 	.arg2_type	= ARG_PTR_TO_MAP_KEY,
 };
 
+static u64 bpf_map_lookup_elem_percpu(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	/* verifier checked that R1 contains a valid pointer to bpf_map
+	 * and R2 points to a program stack and map->key_size bytes were
+	 * initialized
+	 */
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
+	void *key = (void *) (unsigned long) r2;
+	u32 cpu = (u32)(unsigned long) r3;
+	void *value;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	value = map->ops->map_lookup_elem_percpu(map, key, cpu);
+
+	/* lookup() returns either pointer to element value or NULL
+	 * which is the meaning of PTR_TO_MAP_VALUE_OR_NULL type
+	 */
+	return (unsigned long) value;
+}
+
+const struct bpf_func_proto bpf_map_lookup_elem_percpu_proto = {
+	.func		= bpf_map_lookup_elem_percpu,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_MAP_KEY,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static u64 bpf_map_update_elem(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 {
 	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
@@ -75,6 +105,29 @@ const struct bpf_func_proto bpf_map_update_elem_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
+static u64 bpf_map_update_elem_percpu(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
+	void *key = (void *) (unsigned long) r2;
+	void *value = (void *) (unsigned long) r3;
+	u32 cpu = (u32)(unsigned long) r5;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+
+	return map->ops->map_update_elem_percpu(map, key, value, r4, cpu);
+}
+
+const struct bpf_func_proto bpf_map_update_elem_percpu_proto = {
+	.func		= bpf_map_update_elem_percpu,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_MAP_KEY,
+	.arg3_type	= ARG_PTR_TO_MAP_VALUE,
+	.arg4_type	= ARG_ANYTHING,
+	.arg5_type	= ARG_ANYTHING,
+};
+
 static u64 bpf_map_delete_elem(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 {
 	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
diff --git a/net/core/filter.c b/net/core/filter.c
index 35e6fed..8c558fc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1743,8 +1743,12 @@ sk_filter_func_proto(enum bpf_func_id func_id)
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
 		return &bpf_map_lookup_elem_proto;
+	case BPF_FUNC_map_lookup_elem_percpu:
+		return &bpf_map_lookup_elem_percpu_proto;
 	case BPF_FUNC_map_update_elem:
 		return &bpf_map_update_elem_proto;
+	case BPF_FUNC_map_update_elem_percpu:
+		return &bpf_map_update_elem_percpu_proto;
 	case BPF_FUNC_map_delete_elem:
 		return &bpf_map_delete_elem_proto;
 	case BPF_FUNC_get_prandom_u32:
-- 
1.9.1


* [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-11 15:56 [PATCH 0/9] bpf: support percpu ARRAY map Ming Lei
                   ` (3 preceding siblings ...)
  2016-01-11 15:56 ` [PATCH 4/9] bpf: add percpu version of lookup/update element helpers Ming Lei
@ 2016-01-11 15:56 ` Ming Lei
  2016-01-11 19:02   ` Alexei Starovoitov
  2016-01-11 15:56 ` [PATCH 6/9] bpf: arraymap: introduce BPF_MAP_TYPE_ARRAY_PERCPU Ming Lei
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-11 15:56 UTC (permalink / raw)
  To: linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau, Ming Lei

Prepare for supporting percpu map in the following patch.

Now userspace can lookup/update mapped value in one specific
CPU in case of percpu map.
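
For illustration, a rough sketch of driving the new commands from
userspace (sample wrappers come in a later patch; fd/key/value/cpu are
placeholders):

	union bpf_attr attr = {
		.map_fd = fd,
		.key    = (__u64)(unsigned long)&key,
		.value  = (__u64)(unsigned long)&value,
		.cpu    = cpu,
	};

	err = syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM_PERCPU, &attr, sizeof(attr));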

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/uapi/linux/bpf.h |  3 +++
 kernel/bpf/syscall.c     | 48 ++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2658917..63b04c6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -73,6 +73,8 @@ enum bpf_cmd {
 	BPF_PROG_LOAD,
 	BPF_OBJ_PIN,
 	BPF_OBJ_GET,
+	BPF_MAP_LOOKUP_ELEM_PERCPU,
+	BPF_MAP_UPDATE_ELEM_PERCPU,
 };
 
 enum bpf_map_type {
@@ -114,6 +116,7 @@ union bpf_attr {
 			__aligned_u64 next_key;
 		};
 		__u64		flags;
+		__u32		cpu;
 	};
 
 	struct { /* anonymous struct used by BPF_PROG_LOAD command */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6373970..280c93b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -231,8 +231,9 @@ static void __user *u64_to_ptr(__u64 val)
 
 /* last field in 'union bpf_attr' used by this command */
 #define BPF_MAP_LOOKUP_ELEM_LAST_FIELD value
+#define BPF_MAP_LOOKUP_ELEM_PERCPU_LAST_FIELD cpu
 
-static int map_lookup_elem(union bpf_attr *attr)
+static int map_lookup_elem(union bpf_attr *attr, bool percpu)
 {
 	void __user *ukey = u64_to_ptr(attr->key);
 	void __user *uvalue = u64_to_ptr(attr->value);
@@ -242,8 +243,14 @@ static int map_lookup_elem(union bpf_attr *attr)
 	struct fd f;
 	int err;
 
-	if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
-		return -EINVAL;
+	if (!percpu) {
+		if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
+			return -EINVAL;
+	} else {
+		if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM_PERCPU) ||
+				attr->cpu >= num_possible_cpus())
+			return -EINVAL;
+	}
 
 	f = fdget(ufd);
 	map = __bpf_map_get(f);
@@ -265,7 +272,10 @@ static int map_lookup_elem(union bpf_attr *attr)
 		goto free_key;
 
 	rcu_read_lock();
-	ptr = map->ops->map_lookup_elem(map, key);
+	if (!percpu)
+		ptr = map->ops->map_lookup_elem(map, key);
+	else
+		ptr = map->ops->map_lookup_elem_percpu(map, key, attr->cpu);
 	if (ptr)
 		memcpy(value, ptr, map->value_size);
 	rcu_read_unlock();
@@ -290,8 +300,9 @@ err_put:
 }
 
 #define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags
+#define BPF_MAP_UPDATE_ELEM_PERCPU_LAST_FIELD cpu
 
-static int map_update_elem(union bpf_attr *attr)
+static int map_update_elem(union bpf_attr *attr, bool percpu)
 {
 	void __user *ukey = u64_to_ptr(attr->key);
 	void __user *uvalue = u64_to_ptr(attr->value);
@@ -301,8 +312,14 @@ static int map_update_elem(union bpf_attr *attr)
 	struct fd f;
 	int err;
 
-	if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM))
-		return -EINVAL;
+	if (!percpu) {
+		if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM))
+			return -EINVAL;
+	} else {
+		if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM_PERCPU) ||
+				attr->cpu >= num_possible_cpus())
+			return -EINVAL;
+	}
 
 	f = fdget(ufd);
 	map = __bpf_map_get(f);
@@ -331,7 +348,12 @@ static int map_update_elem(union bpf_attr *attr)
 	 * therefore all map accessors rely on this fact, so do the same here
 	 */
 	rcu_read_lock();
-	err = map->ops->map_update_elem(map, key, value, attr->flags);
+	if (!percpu)
+		err = map->ops->map_update_elem(map, key, value, attr->flags);
+	else
+		err = map->ops->map_update_elem_percpu(map, key, value,
+				attr->flags, attr->cpu);
+
 	rcu_read_unlock();
 
 free_value:
@@ -772,10 +794,16 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 		err = map_create(&attr);
 		break;
 	case BPF_MAP_LOOKUP_ELEM:
-		err = map_lookup_elem(&attr);
+		err = map_lookup_elem(&attr, false);
+		break;
+	case BPF_MAP_LOOKUP_ELEM_PERCPU:
+		err = map_lookup_elem(&attr, true);
 		break;
 	case BPF_MAP_UPDATE_ELEM:
-		err = map_update_elem(&attr);
+		err = map_update_elem(&attr, false);
+		break;
+	case BPF_MAP_UPDATE_ELEM_PERCPU:
+		err = map_update_elem(&attr, true);
 		break;
 	case BPF_MAP_DELETE_ELEM:
 		err = map_delete_elem(&attr);
-- 
1.9.1


* [PATCH 6/9] bpf: arraymap: introduce BPF_MAP_TYPE_ARRAY_PERCPU
  2016-01-11 15:56 [PATCH 0/9] bpf: support percpu ARRAY map Ming Lei
                   ` (4 preceding siblings ...)
  2016-01-11 15:56 ` [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem Ming Lei
@ 2016-01-11 15:56 ` Ming Lei
  2016-01-11 19:14   ` Alexei Starovoitov
  2016-01-11 15:56 ` [PATCH 7/9] sample/bpf: introduces helpers for percpu array example Ming Lei
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-11 15:56 UTC (permalink / raw)
  To: linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau, Ming Lei

This patch introduces a percpu array map so that expensive
atomic operations can be avoided in eBPF progs in the case of
an ARRAY map.

The percpu map uses the percpu version of the update/lookup
element helpers and callbacks to access elements in the map;
the existing non-percpu update/lookup element helpers and
callbacks do not work on it.
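
As a rough worked example of the accounting below (assuming 4 possible
CPUs and 4KB pages): with max_entries = 256 and value_size = 8, the
percpu storage accounts for round_up(256 * 8 * 4, 4096) >> 12 = 2 pages,
and the array of per-element pointers adds round_up(sizeof(*array) +
256 * 8, 4096) >> 12 = 1 page, so map.pages ends up as 3.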

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 include/linux/bpf.h      |   2 +
 include/uapi/linux/bpf.h |   1 +
 kernel/bpf/arraymap.c    | 136 ++++++++++++++++++++++++++++++++++++++++++-----
 kernel/bpf/bpf_map.h     |   2 +
 kernel/bpf/map.c         |   6 +++
 5 files changed, 135 insertions(+), 12 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 75d75d8..909dc1e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -153,9 +153,11 @@ struct bpf_array {
 	 */
 	enum bpf_prog_type owner_prog_type;
 	bool owner_jited;
+	bool percpu;
 	union {
 		char value[0] __aligned(8);
 		void *ptrs[0] __aligned(8);
+		void __percpu *pptrs[0] __aligned(8);
 	};
 };
 #define MAX_TAIL_CALL_CNT 32
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 63b04c6..70968fd 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -83,6 +83,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_ARRAY,
 	BPF_MAP_TYPE_PROG_ARRAY,
 	BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+	BPF_MAP_TYPE_ARRAY_PERCPU,
 };
 
 enum bpf_prog_type {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 20b9f2c..dbafa6a 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -19,11 +19,36 @@
 
 #include "bpf_map.h"
 
-/* Called from syscall */
-static struct bpf_map *array_map_alloc(union bpf_attr *attr)
+static void free_percpu_array(struct bpf_array *array)
+{
+	int i;
+
+	for (i = 0; i < array->map.max_entries; i++)
+		free_percpu(array->pptrs[i]);
+}
+
+static int alloc_percpu_array(struct bpf_array *array, int cnt, int elem_size)
+{
+	int i;
+
+	for (i = 0; i < cnt; i++) {
+		void __percpu *ptr = __alloc_percpu(elem_size, 8);
+
+		if (!ptr) {
+			free_percpu_array(array);
+			return -ENOMEM;
+		}
+		array->pptrs[i] = ptr;
+	}
+
+	array->percpu = true;
+	return 0;
+}
+
+static struct bpf_map *__array_map_alloc(union bpf_attr *attr, bool percpu)
 {
 	struct bpf_array *array;
-	u32 elem_size, array_size;
+	u32 elem_size, array_size, elem_alloc_size;
 
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
@@ -38,12 +63,22 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 
 	elem_size = round_up(attr->value_size, 8);
 
+	/*
+	 * In case of percpu-array, each element in the allocated array
+	 * points to one percpu element.
+	 */
+	if (percpu)
+		elem_alloc_size = sizeof(void *);
+	else
+		elem_alloc_size = elem_size;
+
 	/* check round_up into zero and u32 overflow */
-	if (elem_size == 0 ||
-	    attr->max_entries > (U32_MAX - PAGE_SIZE - sizeof(*array)) / elem_size)
+	if (elem_alloc_size == 0 ||
+	    attr->max_entries > (U32_MAX - PAGE_SIZE - sizeof(*array)) /
+	    elem_alloc_size)
 		return ERR_PTR(-ENOMEM);
 
-	array_size = sizeof(*array) + attr->max_entries * elem_size;
+	array_size = sizeof(*array) + attr->max_entries * elem_alloc_size;
 
 	/* allocate all map elements and zero-initialize them */
 	array = kzalloc(array_size, GFP_USER | __GFP_NOWARN);
@@ -53,16 +88,39 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 			return ERR_PTR(-ENOMEM);
 	}
 
+	if (percpu) {
+		if (alloc_percpu_array(array, attr->max_entries,
+				       attr->value_size)) {
+			kvfree(array);
+			return ERR_PTR(-ENOMEM);
+		}
+		array->map.pages = round_up(attr->max_entries *
+				attr->value_size * num_possible_cpus(),
+				PAGE_SIZE) >> PAGE_SHIFT;
+	}
+
 	/* copy mandatory map attributes */
 	array->map.key_size = attr->key_size;
 	array->map.value_size = attr->value_size;
 	array->map.max_entries = attr->max_entries;
-	array->map.pages = round_up(array_size, PAGE_SIZE) >> PAGE_SHIFT;
+	array->map.pages += round_up(array_size, PAGE_SIZE) >> PAGE_SHIFT;
 	array->elem_size = elem_size;
 
 	return &array->map;
 }
 
+/* Called from syscall */
+static struct bpf_map *array_map_alloc(union bpf_attr *attr)
+{
+	return __array_map_alloc(attr, false);
+}
+
+/* Called from syscall */
+static struct bpf_map *percpu_array_map_alloc(union bpf_attr *attr)
+{
+	return __array_map_alloc(attr, true);
+}
+
 /* Called from syscall or from eBPF program */
 static void *array_map_lookup_elem(struct bpf_map *map, void *key)
 {
@@ -75,6 +133,19 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
 	return array->value + array->elem_size * index;
 }
 
+/* Called from syscall or from eBPF program */
+static void *array_map_lookup_elem_percpu(struct bpf_map *map,
+		void *key, u32 cpu)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	u32 index = *(u32 *)key;
+
+	if (index >= array->map.max_entries)
+		return NULL;
+
+	return per_cpu_ptr(array->pptrs[index], cpu);
+}
+
 /* Called from syscall */
 static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 {
@@ -95,11 +166,10 @@ static int array_map_get_next_key(struct bpf_map *map, void *key, void *next_key
 }
 
 /* Called from syscall or from eBPF program */
-static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
-				 u64 map_flags)
+static inline int __array_map_update_elem(struct bpf_array *array,
+					  u32 index, void *value,
+					  u64 map_flags, void *ptr)
 {
-	struct bpf_array *array = container_of(map, struct bpf_array, map);
-	u32 index = *(u32 *)key;
 
 	if (map_flags > BPF_EXIST)
 		/* unknown flags */
@@ -113,10 +183,32 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
 		/* all elements already exist */
 		return -EEXIST;
 
-	memcpy(array->value + array->elem_size * index, value, map->value_size);
+	memcpy(ptr, value, array->map.value_size);
 	return 0;
 }
 
+/* Called from syscall or from eBPF program */
+static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
+				 u64 map_flags)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	u32 index = *(u32 *)key;
+	void *ptr = array->value + array->elem_size * index;
+
+	return __array_map_update_elem(array, index, value, map_flags, ptr);
+}
+
+/* Called from syscall or from eBPF program */
+static int array_map_update_elem_percpu(struct bpf_map *map, void *key,
+					void *value, u64 map_flags, u32 cpu)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	u32 index = *(u32 *)key;
+	void *ptr = per_cpu_ptr(array->pptrs[index], cpu);
+
+	return __array_map_update_elem(array, index, value, map_flags, ptr);
+}
+
 /* Called when map->refcnt goes to zero, either from workqueue or from syscall */
 static void array_map_free(struct bpf_map *map)
 {
@@ -129,6 +221,9 @@ static void array_map_free(struct bpf_map *map)
 	 */
 	synchronize_rcu();
 
+	if (array->percpu)
+		free_percpu_array(array);
+
 	kvfree(array);
 }
 
@@ -148,9 +243,26 @@ static struct bpf_map_type_list array_type __read_mostly = {
 	.type = BPF_MAP_TYPE_ARRAY,
 };
 
+static const struct bpf_map_ops percpu_array_ops = {
+	.map_alloc = percpu_array_map_alloc,
+	.map_free = array_map_free,
+	.map_get_next_key = array_map_get_next_key,
+	.map_lookup_elem = map_lookup_elem_nop,
+	.map_update_elem = map_update_elem_nop,
+	.map_delete_elem = map_delete_elem_nop,
+	.map_lookup_elem_percpu = array_map_lookup_elem_percpu,
+	.map_update_elem_percpu = array_map_update_elem_percpu,
+};
+
+static struct bpf_map_type_list percpu_array_type __read_mostly = {
+	.ops = &percpu_array_ops,
+	.type = BPF_MAP_TYPE_ARRAY_PERCPU,
+};
+
 static int __init register_array_map(void)
 {
 	bpf_register_map_type(&array_type);
+	bpf_register_map_type(&percpu_array_type);
 	return 0;
 }
 late_initcall(register_array_map);
diff --git a/kernel/bpf/bpf_map.h b/kernel/bpf/bpf_map.h
index adab4e6..8957a60 100644
--- a/kernel/bpf/bpf_map.h
+++ b/kernel/bpf/bpf_map.h
@@ -5,6 +5,8 @@
 
 extern void *map_lookup_elem_nop(struct bpf_map *map, void *key);
 extern int map_delete_elem_nop(struct bpf_map *map, void *key);
+extern int map_update_elem_nop(struct bpf_map *map, void *key,
+		void *value, u64 flags);
 extern void *map_lookup_elem_percpu_nop(struct bpf_map *map, void *key,
 		u32 cpu);
 extern int map_update_elem_percpu_nop(struct bpf_map *map, void *key,
diff --git a/kernel/bpf/map.c b/kernel/bpf/map.c
index b94458a..48252a6 100644
--- a/kernel/bpf/map.c
+++ b/kernel/bpf/map.c
@@ -24,6 +24,12 @@ int map_delete_elem_nop(struct bpf_map *map, void *key)
 	return -EINVAL;
 }
 
+int map_update_elem_nop(struct bpf_map *map, void *key, void *value, u64 flags)
+{
+	return -EINVAL;
+}
+
+
 void *map_lookup_elem_percpu_nop(struct bpf_map *map, void *key, u32 cpu)
 {
 	return NULL;
-- 
1.9.1


* [PATCH 7/9] sample/bpf: introduces helpers for percpu array example
  2016-01-11 15:56 [PATCH 0/9] bpf: support percpu ARRAY map Ming Lei
                   ` (5 preceding siblings ...)
  2016-01-11 15:56 ` [PATCH 6/9] bpf: arraymap: introduce BPF_MAP_TYPE_ARRAY_PERCPU Ming Lei
@ 2016-01-11 15:56 ` Ming Lei
  2016-01-11 15:57 ` [PATCH 8/9] sample/bpf: sockex1: use percpu array map Ming Lei
  2016-01-11 15:57 ` [PATCH 9/9] samples/bpf: test " Ming Lei
  8 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2016-01-11 15:56 UTC (permalink / raw)
  To: linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau, Ming Lei

This patch introduces helpers for using the percpu array map.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 samples/bpf/bpf_helpers.h |  5 +++++
 samples/bpf/libbpf.c      | 42 ++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/libbpf.h      |  5 +++++
 3 files changed, 52 insertions(+)

diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index 7ad19e1..9996f38 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -39,6 +39,11 @@ static int (*bpf_redirect)(int ifindex, int flags) =
 	(void *) BPF_FUNC_redirect;
 static int (*bpf_perf_event_output)(void *ctx, void *map, int index, void *data, int size) =
 	(void *) BPF_FUNC_perf_event_output;
+static void *(*bpf_map_lookup_elem_percpu)(void *map, void *key, unsigned cpu) =
+	(void *) BPF_FUNC_map_lookup_elem_percpu;
+static int (*bpf_map_update_elem_percpu)(void *map, void *key, void *value,
+				  unsigned long long flags, unsigned cpu) =
+	(void *) BPF_FUNC_map_update_elem_percpu;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index 65a8d48..2a240c6 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -43,6 +43,20 @@ int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags)
 	return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
 }
 
+int bpf_update_elem_percpu(int fd, void *key, void *value,
+			   unsigned long long flags, unsigned cpu)
+{
+	union bpf_attr attr = {
+		.map_fd = fd,
+		.key = ptr_to_u64(key),
+		.value = ptr_to_u64(value),
+		.flags = flags,
+		.cpu = cpu
+	};
+
+	return syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM_PERCPU, &attr, sizeof(attr));
+}
+
 int bpf_lookup_elem(int fd, void *key, void *value)
 {
 	union bpf_attr attr = {
@@ -54,6 +68,34 @@ int bpf_lookup_elem(int fd, void *key, void *value)
 	return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
 }
 
+int bpf_lookup_elem_percpu(int fd, void *key, void *value, unsigned cpu)
+{
+	union bpf_attr attr = {
+		.map_fd = fd,
+		.key = ptr_to_u64(key),
+		.value = ptr_to_u64(value),
+		.cpu = cpu,
+	};
+
+	return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM_PERCPU, &attr, sizeof(attr));
+}
+
+int bpf_lookup_elem_allcpu(int fd, void *key, void *value_percpu, void *value,
+			   int (*handle_one_cpu)(unsigned, void *, void *))
+{
+	unsigned cpu;
+	int ret;
+
+	for (cpu = 0; cpu < sysconf(_SC_NPROCESSORS_CONF); cpu++) {
+		ret = bpf_lookup_elem_percpu(fd, key, value_percpu, cpu);
+		if (!ret)
+			ret = handle_one_cpu(cpu, value_percpu, value);
+		if  (ret)
+			return ret;
+	}
+	return 0;
+}
+
 int bpf_delete_elem(int fd, void *key)
 {
 	union bpf_attr attr = {
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index 014aacf..e94484d 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -8,6 +8,11 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
 		   int max_entries);
 int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags);
 int bpf_lookup_elem(int fd, void *key, void *value);
+int bpf_update_elem_percpu(int fd, void *key, void *value,
+			   unsigned long long flags, unsigned cpu);
+int bpf_lookup_elem_percpu(int fd, void *key, void *value, unsigned cpu);
+int bpf_lookup_elem_allcpu(int fd, void *key, void *value_percpu, void *value,
+			   int (*handle_one_cpu)(unsigned, void *, void *));
 int bpf_delete_elem(int fd, void *key);
 int bpf_get_next_key(int fd, void *key, void *next_key);
 
-- 
1.9.1


* [PATCH 8/9] sample/bpf: sockex1: use percpu array map
  2016-01-11 15:56 [PATCH 0/9] bpf: support percpu ARRAY map Ming Lei
                   ` (6 preceding siblings ...)
  2016-01-11 15:56 ` [PATCH 7/9] sample/bpf: introduces helpers for percpu array example Ming Lei
@ 2016-01-11 15:57 ` Ming Lei
  2016-01-11 15:57 ` [PATCH 9/9] samples/bpf: test " Ming Lei
  8 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2016-01-11 15:57 UTC (permalink / raw)
  To: linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau, Ming Lei

This demonstrates that the expensive atomic operation can
be removed from the eBPF prog.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 samples/bpf/sockex1_kern.c |  7 ++++---
 samples/bpf/sockex1_user.c | 20 ++++++++++++++++----
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/samples/bpf/sockex1_kern.c b/samples/bpf/sockex1_kern.c
index ed18e9a..400518a 100644
--- a/samples/bpf/sockex1_kern.c
+++ b/samples/bpf/sockex1_kern.c
@@ -5,7 +5,7 @@
 #include "bpf_helpers.h"
 
 struct bpf_map_def SEC("maps") my_map = {
-	.type = BPF_MAP_TYPE_ARRAY,
+	.type = BPF_MAP_TYPE_ARRAY_PERCPU,
 	.key_size = sizeof(u32),
 	.value_size = sizeof(long),
 	.max_entries = 256,
@@ -16,13 +16,14 @@ int bpf_prog1(struct __sk_buff *skb)
 {
 	int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
 	long *value;
+	unsigned cpu = bpf_get_smp_processor_id();
 
 	if (skb->pkt_type != PACKET_OUTGOING)
 		return 0;
 
-	value = bpf_map_lookup_elem(&my_map, &index);
+	value = bpf_map_lookup_elem_percpu(&my_map, &index, cpu);
 	if (value)
-		__sync_fetch_and_add(value, skb->len);
+		*value += skb->len;
 
 	return 0;
 }
diff --git a/samples/bpf/sockex1_user.c b/samples/bpf/sockex1_user.c
index 678ce46..8572e81 100644
--- a/samples/bpf/sockex1_user.c
+++ b/samples/bpf/sockex1_user.c
@@ -6,6 +6,14 @@
 #include <unistd.h>
 #include <arpa/inet.h>
 
+static int handle_one_cpu(unsigned cpu, void *val_cpu, void *val)
+{
+	long long *cnt = val;
+
+	*cnt += *(long *)val_cpu;
+	return 0;
+}
+
 int main(int ac, char **argv)
 {
 	char filename[256];
@@ -28,17 +36,21 @@ int main(int ac, char **argv)
 	(void) f;
 
 	for (i = 0; i < 5; i++) {
-		long long tcp_cnt, udp_cnt, icmp_cnt;
+		long cnt_percpu;
+		long long tcp_cnt = 0, udp_cnt = 0, icmp_cnt = 0;
 		int key;
 
 		key = IPPROTO_TCP;
-		assert(bpf_lookup_elem(map_fd[0], &key, &tcp_cnt) == 0);
+		assert(bpf_lookup_elem_allcpu(map_fd[0], &key,
+		       &cnt_percpu, &tcp_cnt, handle_one_cpu) == 0);
 
 		key = IPPROTO_UDP;
-		assert(bpf_lookup_elem(map_fd[0], &key, &udp_cnt) == 0);
+		assert(bpf_lookup_elem_allcpu(map_fd[0], &key,
+		       &cnt_percpu, &udp_cnt, handle_one_cpu) == 0);
 
 		key = IPPROTO_ICMP;
-		assert(bpf_lookup_elem(map_fd[0], &key, &icmp_cnt) == 0);
+		assert(bpf_lookup_elem_allcpu(map_fd[0], &key,
+		       &cnt_percpu, &icmp_cnt, handle_one_cpu) == 0);
 
 		printf("TCP %lld UDP %lld ICMP %lld bytes\n",
 		       tcp_cnt, udp_cnt, icmp_cnt);
-- 
1.9.1


* [PATCH 9/9] samples/bpf: test percpu array map
  2016-01-11 15:56 [PATCH 0/9] bpf: support percpu ARRAY map Ming Lei
                   ` (7 preceding siblings ...)
  2016-01-11 15:57 ` [PATCH 8/9] sample/bpf: sockex1: use percpu array map Ming Lei
@ 2016-01-11 15:57 ` Ming Lei
  2016-01-12 15:44   ` David Laight
  8 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-11 15:57 UTC (permalink / raw)
  To: linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau, Ming Lei

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
---
 samples/bpf/test_maps.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)

diff --git a/samples/bpf/test_maps.c b/samples/bpf/test_maps.c
index 6299ee9..ff5d373 100644
--- a/samples/bpf/test_maps.c
+++ b/samples/bpf/test_maps.c
@@ -142,6 +142,106 @@ static void test_arraymap_sanity(int i, void *data)
 	close(map_fd);
 }
 
+static int handle_one_cpu(unsigned cpu, void *val_cpu, void *val)
+{
+	unsigned long *cnt = val;
+
+	*cnt += *(long *)val_cpu;
+	return 0;
+}
+
+
+static void test_percpu_arraymap_all_cpu_sanity(int i)
+{
+	int key, map_fd;
+	unsigned long value = 0;
+	unsigned long val_cpu;
+	unsigned cpu;
+	unsigned nr_keys = 2;
+	unsigned nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+
+	map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY_PERCPU, sizeof(key),
+				sizeof(value), nr_keys);
+	if (map_fd < 0) {
+		printf("failed to create arraymap '%s'\n", strerror(errno));
+		exit(1);
+	}
+
+	for (key = 0; key < nr_keys; key++) {
+		for (cpu = 0; cpu < nr_cpus; cpu++) {
+			val_cpu = key;
+			assert(bpf_update_elem_percpu(map_fd, &key, &val_cpu,
+						      BPF_ANY, cpu) == 0);
+		}
+	}
+
+	for (key = 0; key < nr_keys; key++) {
+		assert(bpf_lookup_elem_allcpu(map_fd, &key, &val_cpu,
+					      &value, handle_one_cpu) == 0);
+		assert(value == nr_cpus * key);
+	}
+
+	close(map_fd);
+}
+
+static void test_percpu_arraymap_single_cpu_sanity(int i, void *data)
+{
+	int key, next_key, map_fd;
+	long long value;
+	unsigned cpu = *(unsigned *)data;
+
+	map_fd = bpf_create_map(BPF_MAP_TYPE_ARRAY_PERCPU, sizeof(key),
+				sizeof(value), 2);
+	if (map_fd < 0) {
+		printf("failed to create arraymap '%s'\n", strerror(errno));
+		exit(1);
+	}
+
+	key = 1;
+	value = 1234;
+	/* insert key=1 element */
+	assert(bpf_update_elem_percpu(map_fd, &key, &value, BPF_ANY, cpu) == 0);
+
+	value = 0;
+	assert(bpf_update_elem_percpu(map_fd, &key, &value, BPF_NOEXIST,
+				      cpu) == -1 && errno == EEXIST);
+
+	/* check that key=1 can be found */
+	assert(bpf_lookup_elem_percpu(map_fd, &key, &value, cpu) == 0 &&
+				      value == 1234);
+
+	key = 0;
+	/* check that key=0 is also found and zero initialized */
+	assert(bpf_lookup_elem_percpu(map_fd, &key, &value, cpu) == 0 &&
+				      value == 0);
+
+
+	/* key=0 and key=1 were inserted, check that key=2 cannot be inserted
+	 * due to max_entries limit
+	 */
+	key = 2;
+	assert(bpf_update_elem_percpu(map_fd, &key, &value, BPF_EXIST,
+				      cpu) == -1 && errno == E2BIG);
+
+	/* check that key = 2 doesn't exist */
+	assert(bpf_lookup_elem_percpu(map_fd, &key, &value, cpu) == -1 &&
+				      errno == ENOENT);
+
+	/* iterate over two elements */
+	assert(bpf_get_next_key(map_fd, &key, &next_key) == 0 &&
+	       next_key == 0);
+	assert(bpf_get_next_key(map_fd, &next_key, &next_key) == 0 &&
+	       next_key == 1);
+	assert(bpf_get_next_key(map_fd, &next_key, &next_key) == -1 &&
+	       errno == ENOENT);
+
+	/* delete shouldn't succeed */
+	key = 1;
+	assert(bpf_delete_elem(map_fd, &key) == -1 && errno == EINVAL);
+
+	close(map_fd);
+}
+
 #define MAP_SIZE (32 * 1024)
 static void test_map_large(void)
 {
@@ -281,8 +381,18 @@ static void test_map_parallel(void)
 
 int main(void)
 {
+	int cpu;
+	unsigned data;
+
 	test_hashmap_sanity(0, NULL);
 	test_arraymap_sanity(0, NULL);
+
+	for (cpu = 0; cpu < sysconf(_SC_NPROCESSORS_CONF); cpu++) {
+		data = cpu;
+		test_percpu_arraymap_single_cpu_sanity(0, &data);
+	}
+	test_percpu_arraymap_all_cpu_sanity(0);
+
 	test_map_large();
 	test_map_parallel();
 	test_map_stress();
-- 
1.9.1


* Re: [PATCH 1/9] bpf: prepare for moving map common stuff into one place
  2016-01-11 15:56 ` [PATCH 1/9] bpf: prepare for moving map common stuff into one place Ming Lei
@ 2016-01-11 18:24   ` kbuild test robot
  0 siblings, 0 replies; 28+ messages in thread
From: kbuild test robot @ 2016-01-11 18:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: kbuild-all, linux-kernel, Alexei Starovoitov, David S. Miller,
	netdev, Daniel Borkmann, Martin KaFai Lau, Ming Lei

[-- Attachment #1: Type: text/plain, Size: 1405 bytes --]

Hi Ming,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.4 next-20160111]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Ming-Lei/bpf-support-percpu-ARRAY-map/20160112-000350
config: sparc64-allmodconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All error/warnings (new ones prefixed by >>):

   kernel/bpf/map.c: In function 'map_delete_elem_nop':
>> kernel/bpf/map.c:24:10: error: 'EINVAL' undeclared (first use in this function)
     return -EINVAL;
             ^
   kernel/bpf/map.c:24:10: note: each undeclared identifier is reported only once for each function it appears in
>> kernel/bpf/map.c:25:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^

vim +/EINVAL +24 kernel/bpf/map.c

    18	{
    19		return NULL;
    20	}
    21	
    22	int map_delete_elem_nop(struct bpf_map *map, void *key)
    23	{
  > 24		return -EINVAL;
  > 25	}
    26	
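
(Not part of the posted report: the undeclared EINVAL suggests that on
this config kernel/bpf/map.c needs an explicit errno header, e.g. an
additional #include <linux/errno.h> next to the existing
#include <linux/bpf.h>.)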

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 44250 bytes --]


* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-11 15:56 ` [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem Ming Lei
@ 2016-01-11 19:02   ` Alexei Starovoitov
  2016-01-12  5:00     ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Alexei Starovoitov @ 2016-01-11 19:02 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alexei Starovoitov, David S. Miller, netdev,
	Daniel Borkmann, Martin KaFai Lau

On Mon, Jan 11, 2016 at 11:56:57PM +0800, Ming Lei wrote:
> Prepare for supporting percpu map in the following patch.
> 
> Now userspace can lookup/update mapped value in one specific
> CPU in case of percpu map.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
...
> @@ -265,7 +272,10 @@ static int map_lookup_elem(union bpf_attr *attr)
>  		goto free_key;
>  
>  	rcu_read_lock();
> -	ptr = map->ops->map_lookup_elem(map, key);
> +	if (!percpu)
> +		ptr = map->ops->map_lookup_elem(map, key);
> +	else
> +		ptr = map->ops->map_lookup_elem_percpu(map, key, attr->cpu);

I think this approach is less potent than Martin's for several reasons:
- bpf program shouldn't be supplying bpf_smp_processor_id(), since
  it's error prone and a bit slower than doing it explicitly as in:
  http://patchwork.ozlabs.org/patch/564482/
  although Martin's patch also needs to use this_cpu_ptr() instead
  of per_cpu_ptr(.., smp_processor_id());

- two new bpf helpers are not necessary in Martin's approach.
  regular map_lookup_elem() will work for both per-cpu maps.

- such map_lookup_elem_percpu() from syscall is not accurate.
  Martin's approach via smp_call_function_single() returns precise value,
  whereas here memcpy() will race with other cpus.
 
Overall I think both per-cpu hash and per-cpu array maps are quite useful.
For this particular set I would suggest to rebase on top of Martin's
to reuse BPF_MAP_LOOKUP_PERCPU_ELEM command that should be applicable
to both per-cpu array and per-cpu hash maps.
and add BPF_MAP_UPDATE_PERCPU_ELEM via smp_call as another patch
that should work for both as well.
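
For reference, a minimal sketch of the this_cpu_ptr() point above
(pptr stands for an element's per-cpu storage; not code from either
series):

	val = this_cpu_ptr(pptr);                     /* preferred inside a helper */
	val = per_cpu_ptr(pptr, smp_processor_id());  /* same copy, slightly slower */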


* Re: [PATCH 2/9] bpf: array map: use pre-defined nop map function
  2016-01-11 15:56 ` [PATCH 2/9] bpf: array map: use pre-defined nop map function Ming Lei
@ 2016-01-11 19:08   ` Alexei Starovoitov
  0 siblings, 0 replies; 28+ messages in thread
From: Alexei Starovoitov @ 2016-01-11 19:08 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alexei Starovoitov, David S. Miller, netdev,
	Daniel Borkmann, Martin KaFai Lau

On Mon, Jan 11, 2016 at 11:56:54PM +0800, Ming Lei wrote:
> So that we can remove the per-map nop map fucntions.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  kernel/bpf/arraymap.c | 19 +++++--------------
>  1 file changed, 5 insertions(+), 14 deletions(-)
> 
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index b0799bc..9ad9031 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -17,6 +17,8 @@
>  #include <linux/filter.h>
>  #include <linux/perf_event.h>
>  
> +#include "bpf_map.h"
> +
>  /* Called from syscall */
>  static struct bpf_map *array_map_alloc(union bpf_attr *attr)
>  {
> @@ -115,12 +117,6 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
>  	return 0;
>  }
>  
> -/* Called from syscall or from eBPF program */
> -static int array_map_delete_elem(struct bpf_map *map, void *key)
> -{
> -	return -EINVAL;
> -}
> -
>  /* Called when map->refcnt goes to zero, either from workqueue or from syscall */
>  static void array_map_free(struct bpf_map *map)
>  {
> @@ -142,7 +138,7 @@ static const struct bpf_map_ops array_ops = {
>  	.map_get_next_key = array_map_get_next_key,
>  	.map_lookup_elem = array_map_lookup_elem,
>  	.map_update_elem = array_map_update_elem,
> -	.map_delete_elem = array_map_delete_elem,
> +	.map_delete_elem = map_delete_elem_nop,

I think moving these two callbacks into a separate file only
reduces readability. It doesn't look like we save any .text this way.


* Re: [PATCH 6/9] bpf: arraymap: introduce BPF_MAP_TYPE_ARRAY_PERCPU
  2016-01-11 15:56 ` [PATCH 6/9] bpf: arraymap: introduce BPF_MAP_TYPE_ARRAY_PERCPU Ming Lei
@ 2016-01-11 19:14   ` Alexei Starovoitov
  0 siblings, 0 replies; 28+ messages in thread
From: Alexei Starovoitov @ 2016-01-11 19:14 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Alexei Starovoitov, David S. Miller, netdev,
	Daniel Borkmann, Martin KaFai Lau

On Mon, Jan 11, 2016 at 11:56:58PM +0800, Ming Lei wrote:
> This patch introduces percpu array map so that expensive
> atomic operations can be avoided in eBPF prog in case of
> ARRAY map.
> 
> PERCPU MAP uses the percpu version of update/lookup element
> helpers and callbacks to access element in the map, and
> the previous update/lookup element helpers and callbacks
> don't work at the same time.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>

useful stuff!

> +	if (percpu) {
> +		if (alloc_percpu_array(array, attr->max_entries,
> +				       attr->value_size)) {
> +			kvfree(array);
> +			return ERR_PTR(-ENOMEM);
> +		}
> +		array->map.pages = round_up(attr->max_entries *
> +				attr->value_size * num_possible_cpus(),
> +				PAGE_SIZE) >> PAGE_SHIFT;

I think it would be more accurate to add it to array_size instead of
doing page rounding here.

> -	array->map.pages = round_up(array_size, PAGE_SIZE) >> PAGE_SHIFT;
> +	array->map.pages += round_up(array_size, PAGE_SIZE) >> PAGE_SHIFT;

and this line wouldn't need to change...
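
A rough sketch of one way to read that suggestion (not code from the
posted series; the sum has to be done in u64 to avoid overflow):

	/* account the per-cpu storage together with the pointer array,
	 * instead of rounding it to pages separately
	 */
	array->map.pages = round_up(array_size +
				    (u64) attr->max_entries * attr->value_size *
				    num_possible_cpus(),
				    PAGE_SIZE) >> PAGE_SHIFT;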


* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-11 19:02   ` Alexei Starovoitov
@ 2016-01-12  5:00     ` Ming Lei
  2016-01-12  5:49       ` Alexei Starovoitov
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-12  5:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Linux Kernel Mailing List, Alexei Starovoitov, David S. Miller,
	Network Development, Daniel Borkmann, Martin KaFai Lau

Hi Alexei,

Thanks for your review.

On Tue, Jan 12, 2016 at 3:02 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Mon, Jan 11, 2016 at 11:56:57PM +0800, Ming Lei wrote:
>> Prepare for supporting percpu map in the following patch.
>>
>> Now userspace can lookup/update mapped value in one specific
>> CPU in case of percpu map.
>>
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ...
>> @@ -265,7 +272,10 @@ static int map_lookup_elem(union bpf_attr *attr)
>>               goto free_key;
>>
>>       rcu_read_lock();
>> -     ptr = map->ops->map_lookup_elem(map, key);
>> +     if (!percpu)
>> +             ptr = map->ops->map_lookup_elem(map, key);
>> +     else
>> +             ptr = map->ops->map_lookup_elem_percpu(map, key, attr->cpu);
>
> I think this approach is less potent than Martin's for several reasons:
> - bpf program shouldn't be supplying bpf_smp_processor_id(), since
>   it's error prone and a bit slower than doing it explicitly as in:
>   http://patchwork.ozlabs.org/patch/564482/
>   although Martin's patch also needs to use this_cpu_ptr() instead
>   of per_cpu_ptr(.., smp_processor_id());

For a PERCPU map, smp_processor_id() is definitely required, and
Martin's patch needs that too, please see htab_percpu_map_lookup_elem()
in his patch.

>
> - two new bpf helpers are not necessary in Martin's approach.
>   regular map_lookup_elem() will work for both per-cpu maps.

For percpu ARRAY, they are not necessary, but it is flexible to
provide them since we should allow the prog to retrieve the percpu
value; it also makes the system call easier to implement with the
two helpers.

For percpu HASH, they are required since the eBPF prog needs to
support deleting elements, so we have to provide these helpers for
the prog to retrieve the percpu value before deleting the elem.

>
> - such map_lookup_elem_percpu() from syscall is not accurate.
>   Martin's approach via smp_call_function_single() returns precise value,

I don't understand why Martin's approach is precise and my patch isn't,
could you explain it a bit?

>   whereas here memcpy() will race with other cpus.
>
> Overall I think both pre-cpu hash and per-cpu array maps are quite useful.

percpu hash isn't a must since we can get a similar effect by using
{real_key, cpu_id} as the key with less memory consumption, but we can
introduce that.
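
(For illustration only, the composite-key alternative mentioned above
could look like the sketch below; the reply that follows explains its
drawbacks.)

	struct combined_key {
		u32 real_key;
		u32 cpu_id;
	};
	/* used as the key of a plain hash map sized roughly
	 * max_entries * num_possible_cpus()
	 */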

> For this particular set I would suggest to rebase on top of Martin's
> to reuse BPF_MAP_LOOKUP_PERCPU_ELEM command that should be applicable
> to both per-cpu array and per-cpu hash maps.

Martin's patch doesn't introduce the two helpers, which are required for
percpu hash, and they also make the syscall easier to implement.

> and add BPF_MAP_UPDATE_PERCPU_ELEM via smp_call as another patch
> that should work for both as well.




Thanks,
Ming Lei


* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-12  5:00     ` Ming Lei
@ 2016-01-12  5:49       ` Alexei Starovoitov
  2016-01-12 11:05         ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Alexei Starovoitov @ 2016-01-12  5:49 UTC (permalink / raw)
  To: Ming Lei
  Cc: Linux Kernel Mailing List, Alexei Starovoitov, David S. Miller,
	Network Development, Daniel Borkmann, Martin KaFai Lau

On Tue, Jan 12, 2016 at 01:00:00PM +0800, Ming Lei wrote:
> Hi Alexei,
> 
> Thanks for your review.
> 
> On Tue, Jan 12, 2016 at 3:02 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Mon, Jan 11, 2016 at 11:56:57PM +0800, Ming Lei wrote:
> >> Prepare for supporting percpu map in the following patch.
> >>
> >> Now userspace can lookup/update mapped value in one specific
> >> CPU in case of percpu map.
> >>
> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> > ...
> >> @@ -265,7 +272,10 @@ static int map_lookup_elem(union bpf_attr *attr)
> >>               goto free_key;
> >>
> >>       rcu_read_lock();
> >> -     ptr = map->ops->map_lookup_elem(map, key);
> >> +     if (!percpu)
> >> +             ptr = map->ops->map_lookup_elem(map, key);
> >> +     else
> >> +             ptr = map->ops->map_lookup_elem_percpu(map, key, attr->cpu);
> >
> > I think this approach is less potent than Martin's for several reasons:
> > - bpf program shouldn't be supplying bpf_smp_processor_id(), since
> >   it's error prone and a bit slower than doing it explicitly as in:
> >   http://patchwork.ozlabs.org/patch/564482/
> >   although Martin's patch also needs to use this_cpu_ptr() instead
> >   of per_cpu_ptr(.., smp_processor_id());
> 
> For PERCPU map, smp_processor_id() is definitely required, and
> Martin's patch need that too, please see htab_percpu_map_lookup_elem()
> in his patch.

hmm. it's definitely _not_ required. right?
bpf programs shouldn't be accessing other per-cpu regions,
only their own. That's what this_cpu_ptr is for.
I don't see a case where accessing another cpu's per-cpu element
wouldn't be a bug in the program.
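
(As a minimal illustration of this distinction -- a sketch only, with a
hypothetical 'pptr' per-cpu counter pointer, not code from either patchset:)

/* program context: only the current CPU's copy is touched,
 * no explicit smp_processor_id() call is needed
 */
static u64 read_own_cpu(u64 __percpu *pptr)
{
	return *this_cpu_ptr(pptr);
}

/* syscall-side aggregation walks every possible CPU instead */
static u64 sum_all_cpus(u64 __percpu *pptr)
{
	int cpu;
	u64 sum = 0;

	for_each_possible_cpu(cpu)
		sum += *per_cpu_ptr(pptr, cpu);
	return sum;
}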

> > - two new bpf helpers are not necessary in Martin's approach.
> >   regular map_lookup_elem() will work for both per-cpu maps.
> 
> For percpu ARRAY, they are not necessary, but it is flexiable to
> provide them since we should allow prog to retrieve the perpcu
> value, also it is easier to implement the system call with the two
> helpers.
> 
> For percpu HASH, they are required since eBPF prog need to support
> deleting element, so we have provide these helpers for prog to retrieve
> percpu value before deleting the elem.

bpf programs cannot have loops, so there is no valid case to access
another cpu's element, since a program cannot aggregate all-cpu values.
Therefore the programs can only update/lookup the this_cpu element and
delete such an element across all cpus.

> > - such map_lookup_elem_percpu() from syscall is not accurate.
> >   Martin's approach via smp_call_function_single() returns precise value,
> 
> I don't understand why Martin's approach is precise and my patch isn't,
> could you explain it a bit?

because a simple memcpy() called from the syscall will race with the
lookup/increment done to the this_cpu element on another cpu. To avoid this
race the smp_call is needed, so that the memcpy() happens on the cpu that
updated the element; then smp_call's memcpy and the bpf program won't be
touching that cpu's value at the same time and user space will read correct
element values. If the program updates them a lot, the value that user space
reads will become stale very quickly, but it will be valid. That's especially
important when the program has multiple counters inside a single element value.
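
(A rough sketch of the smp_call approach being described here, with
hypothetical names: the copy runs on the CPU that owns the element, so it
cannot interleave with that CPU's non-preemptible program updates:)

struct pcpu_copy_req {
	void *dst;
	void __percpu *src;
	size_t size;
};

static void copy_elem_on_cpu(void *arg)
{
	struct pcpu_copy_req *req = arg;

	memcpy(req->dst, this_cpu_ptr(req->src), req->size);
}

static int lookup_elem_on_cpu(int cpu, void __percpu *src, void *dst, size_t size)
{
	struct pcpu_copy_req req = { .dst = dst, .src = src, .size = size };

	/* wait == 1: block until the target CPU has performed the copy */
	return smp_call_function_single(cpu, copy_elem_on_cpu, &req, 1);
}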

> >   whereas here memcpy() will race with other cpus.
> >
> > Overall I think both pre-cpu hash and per-cpu array maps are quite useful.
> 
> percpu hash isn't a must since we can get similar effect by making real_key
> and cpu_id as key with less memory consumption, but we can introduce that.

I don't think so. bpf programs shouldn't be dealing with smp_processor_id().
It was a poor man's per-cpu hack and it had too many disadvantages.
Like get_next_key() doesn't work properly when the key is {key+processor_id},
so walking over the hash map to aggregate fake per-cpu elements requires
user space to create another map just for walking.
The map->max_entries limit becomes bogus.
this_cpu_ptr(..) is typically faster than per_cpu_ptr(.., smp_proc_id())
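
(For reference, the key layout such a poor man's per-cpu hash would use --
a hypothetical sketch, not from any posted patch:)

struct fake_percpu_key {
	__u32 real_key;
	__u32 cpu_id;	/* filled in by the prog via bpf_get_smp_processor_id() */
};

/* max_entries must then be sized as real_entries * nr_cpus, and walking the
 * map with get_next_key returns these composite keys, which is why user
 * space needs extra bookkeeping just to aggregate per real_key.
 */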

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-12  5:49       ` Alexei Starovoitov
@ 2016-01-12 11:05         ` Ming Lei
  2016-01-12 19:10           ` Martin KaFai Lau
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-12 11:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Linux Kernel Mailing List, Alexei Starovoitov, David S. Miller,
	Network Development, Daniel Borkmann, Martin KaFai Lau

On Tue, Jan 12, 2016 at 1:49 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Jan 12, 2016 at 01:00:00PM +0800, Ming Lei wrote:
>> Hi Alexei,
>>
>> Thanks for your review.
>>
>> On Tue, Jan 12, 2016 at 3:02 AM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> > On Mon, Jan 11, 2016 at 11:56:57PM +0800, Ming Lei wrote:
>> >> Prepare for supporting percpu map in the following patch.
>> >>
>> >> Now userspace can lookup/update mapped value in one specific
>> >> CPU in case of percpu map.
>> >>
>> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> > ...
>> >> @@ -265,7 +272,10 @@ static int map_lookup_elem(union bpf_attr *attr)
>> >>               goto free_key;
>> >>
>> >>       rcu_read_lock();
>> >> -     ptr = map->ops->map_lookup_elem(map, key);
>> >> +     if (!percpu)
>> >> +             ptr = map->ops->map_lookup_elem(map, key);
>> >> +     else
>> >> +             ptr = map->ops->map_lookup_elem_percpu(map, key, attr->cpu);
>> >
>> > I think this approach is less potent than Martin's for several reasons:
>> > - bpf program shouldn't be supplying bpf_smp_processor_id(), since
>> >   it's error prone and a bit slower than doing it explicitly as in:
>> >   http://patchwork.ozlabs.org/patch/564482/
>> >   although Martin's patch also needs to use this_cpu_ptr() instead
>> >   of per_cpu_ptr(.., smp_processor_id());
>>
>> For PERCPU map, smp_processor_id() is definitely required, and
>> Martin's patch need that too, please see htab_percpu_map_lookup_elem()
>> in his patch.
>
> hmm. it's definitely _not_ required. right?
> bpf programs shouldn't be accessing other per-cpu regions
> only their own. That's what this_cpu_ptr is for.
> I don't see a case where accessing other cpu per-cpu element
> wouldn't be a bug in the program.
>
>> > - two new bpf helpers are not necessary in Martin's approach.
>> >   regular map_lookup_elem() will work for both per-cpu maps.
>>
>> For percpu ARRAY, they are not necessary, but it is flexiable to
>> provide them since we should allow prog to retrieve the perpcu
>> value, also it is easier to implement the system call with the two
>> helpers.
>>
>> For percpu HASH, they are required since eBPF prog need to support
>> deleting element, so we have provide these helpers for prog to retrieve
>> percpu value before deleting the elem.
>
> bpf programs cannot have loops, so there is no valid case to access
> other cpu element, since program cannot aggregate all-cpu values.
> Therefore the programs can only update/lookup this_cpu element and
> delete such element across all cpus.

Looks like I missed the point of the looping constraint; then basically a
delete-element helper doesn't make sense in a percpu hash.

>
>> > - such map_lookup_elem_percpu() from syscall is not accurate.
>> >   Martin's approach via smp_call_function_single() returns precise value,
>>
>> I don't understand why Martin's approach is precise and my patch isn't,
>> could you explain it a bit?
>
> because simple mempcy() called from syscall will race with lookup/increment
> done to this_cpu element on another cpu. To avoid this race the smp_call
> is needed, so that memcpy() happens on the cpu that updated the element,
> so smp_call's memcpy and bpf program won't be touch that cpu value
> at the same time and user space will read the correct element values.
> If program updates them a lot, the value that user space reads will become
> stale very quickly, but it will be valid. That's especially important
> when program have multiple counters inside single element value.

But smp_call is often very slow because of the IPI, so the final accumulated
value easily becomes stale even though the value from the requested cpu
is 'precise' at that exact moment, especially when there are lots of CPUs, so I
think using smp_call is really a bad idea. And smp_call is worse than
simply iterating over the CPUs.

>
>> >   whereas here memcpy() will race with other cpus.
>> >
>> > Overall I think both pre-cpu hash and per-cpu array maps are quite useful.
>>
>> percpu hash isn't a must since we can get similar effect by making real_key
>> and cpu_id as key with less memory consumption, but we can introduce that.
>
> I don't think so. bpf programs shouldn't be dealing with smp_processor_id()
> It was poor man's per-cpu hack and it had too many disadvantages.
> Like get_next_key() doesn't work properly when key is {key+processor_id},
> so walking over hash map to aggregate fake per-cpu elements requires
> user space to create another map just for walking.
> map->max_entries limit becomes bogus.
> this_cpu_ptr(..) is typically faster than per_cpu_ptr(.., smp_proc_id())

OK, then this_cpu_ptr() is better since we don't need to access the values
of other CPUs.

-- 
Ming Lei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 9/9] samples/bpf: test percpu array map
  2016-01-11 15:57 ` [PATCH 9/9] samples/bpf: test " Ming Lei
@ 2016-01-12 15:44   ` David Laight
  0 siblings, 0 replies; 28+ messages in thread
From: David Laight @ 2016-01-12 15:44 UTC (permalink / raw)
  To: 'Ming Lei', linux-kernel, Alexei Starovoitov
  Cc: David S. Miller, netdev, Daniel Borkmann, Martin KaFai Lau

From: Ming Lei
> Sent: 11 January 2016 15:57
...
> ---
>  samples/bpf/test_maps.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 110 insertions(+)
> 
> diff --git a/samples/bpf/test_maps.c b/samples/bpf/test_maps.c
> index 6299ee9..ff5d373 100644
> --- a/samples/bpf/test_maps.c
> +++ b/samples/bpf/test_maps.c
> @@ -142,6 +142,106 @@ static void test_arraymap_sanity(int i, void *data)
>  	close(map_fd);
>  }
> 
> +static int handle_one_cpu(unsigned cpu, void *val_cpu, void *val)
> +{
> +	unsigned long *cnt = val;
> +
> +	*cnt += *(long *)val_cpu;

Integer pointer casts ring big alarm bells - they are accidents
waiting to happen.

If the pointers are 'pointer to long' then define them as such.
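
(One way the suggestion could look -- a sketch only, assuming the iterator
callback can be given typed parameters rather than void pointers:)

static int handle_one_cpu(unsigned cpu, unsigned long *val_cpu,
			  unsigned long *cnt)
{
	/* no casts needed once both sides are 'pointer to unsigned long' */
	*cnt += *val_cpu;
	return 0;
}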

	David

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-12 11:05         ` Ming Lei
@ 2016-01-12 19:10           ` Martin KaFai Lau
  2016-01-13  0:38             ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Martin KaFai Lau @ 2016-01-12 19:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: Alexei Starovoitov, Linux Kernel Mailing List,
	Alexei Starovoitov, David S. Miller, Network Development,
	Daniel Borkmann

On Tue, Jan 12, 2016 at 07:05:47PM +0800, Ming Lei wrote:
> On Tue, Jan 12, 2016 at 1:49 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Tue, Jan 12, 2016 at 01:00:00PM +0800, Ming Lei wrote:
> >> Hi Alexei,
> >>
> >> Thanks for your review.
> >>
> >> On Tue, Jan 12, 2016 at 3:02 AM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >> > On Mon, Jan 11, 2016 at 11:56:57PM +0800, Ming Lei wrote:
> >> >> Prepare for supporting percpu map in the following patch.
> >> >>
> >> >> Now userspace can lookup/update mapped value in one specific
> >> >> CPU in case of percpu map.
> >> >>
> >> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >> > ...
> >> >> @@ -265,7 +272,10 @@ static int map_lookup_elem(union bpf_attr *attr)
> >> >>               goto free_key;
> >> >>
> >> >>       rcu_read_lock();
> >> >> -     ptr = map->ops->map_lookup_elem(map, key);
> >> >> +     if (!percpu)
> >> >> +             ptr = map->ops->map_lookup_elem(map, key);
> >> >> +     else
> >> >> +             ptr = map->ops->map_lookup_elem_percpu(map, key, attr->cpu);
> >> >
> >> > I think this approach is less potent than Martin's for several reasons:
> >> > - bpf program shouldn't be supplying bpf_smp_processor_id(), since
> >> >   it's error prone and a bit slower than doing it explicitly as in:
> >> >   http://patchwork.ozlabs.org/patch/564482/
> >> >   although Martin's patch also needs to use this_cpu_ptr() instead
> >> >   of per_cpu_ptr(.., smp_processor_id());
> >>
> >> For PERCPU map, smp_processor_id() is definitely required, and
> >> Martin's patch need that too, please see htab_percpu_map_lookup_elem()
> >> in his patch.
> >
> > hmm. it's definitely _not_ required. right?
> > bpf programs shouldn't be accessing other per-cpu regions
> > only their own. That's what this_cpu_ptr is for.
> > I don't see a case where accessing other cpu per-cpu element
> > wouldn't be a bug in the program.
> >
> >> > - two new bpf helpers are not necessary in Martin's approach.
> >> >   regular map_lookup_elem() will work for both per-cpu maps.
> >>
> >> For percpu ARRAY, they are not necessary, but it is flexiable to
> >> provide them since we should allow prog to retrieve the perpcu
> >> value, also it is easier to implement the system call with the two
> >> helpers.
> >>
> >> For percpu HASH, they are required since eBPF prog need to support
> >> deleting element, so we have provide these helpers for prog to retrieve
> >> percpu value before deleting the elem.
> >
> > bpf programs cannot have loops, so there is no valid case to access
> > other cpu element, since program cannot aggregate all-cpu values.
> > Therefore the programs can only update/lookup this_cpu element and
> > delete such element across all cpus.
>
> Looks I missed the point of looping constraint, then basically delete element
> helper doesn't make sense in percpu hash.
>
> >
> >> > - such map_lookup_elem_percpu() from syscall is not accurate.
> >> >   Martin's approach via smp_call_function_single() returns precise value,
> >>
> >> I don't understand why Martin's approach is precise and my patch isn't,
> >> could you explain it a bit?
> >
> > because simple mempcy() called from syscall will race with lookup/increment
> > done to this_cpu element on another cpu. To avoid this race the smp_call
> > is needed, so that memcpy() happens on the cpu that updated the element,
> > so smp_call's memcpy and bpf program won't be touch that cpu value
> > at the same time and user space will read the correct element values.
> > If program updates them a lot, the value that user space reads will become
> > stale very quickly, but it will be valid. That's especially important
> > when program have multiple counters inside single element value.
>
> But smp_call is often very slow because of IPI, so the value acculated
> finally becomes stale easily even though the value from the requested cpu
> is 'precise' at the exact time, especially when there are lots of CPUs, so I
> think using smp_call is really a bad idea. And smp_call is worse than
> iterating from CPUs simply.
The userspace usually only aggregates the value across all cpus every X seconds.
I hardly consider data that is some number of microseconds old to be stale.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-12 19:10           ` Martin KaFai Lau
@ 2016-01-13  0:38             ` Ming Lei
  2016-01-13  2:22               ` Martin KaFai Lau
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-13  0:38 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Alexei Starovoitov, Linux Kernel Mailing List,
	Alexei Starovoitov, David S. Miller, Network Development,
	Daniel Borkmann

On Wed, Jan 13, 2016 at 3:10 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> On Tue, Jan 12, 2016 at 07:05:47PM +0800, Ming Lei wrote:
>> On Tue, Jan 12, 2016 at 1:49 PM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> > On Tue, Jan 12, 2016 at 01:00:00PM +0800, Ming Lei wrote:
>> >> Hi Alexei,
>> >>
>> >> Thanks for your review.
>> >>
>> >> On Tue, Jan 12, 2016 at 3:02 AM, Alexei Starovoitov
>> >> <alexei.starovoitov@gmail.com> wrote:
>> >> > On Mon, Jan 11, 2016 at 11:56:57PM +0800, Ming Lei wrote:
>> >> >> Prepare for supporting percpu map in the following patch.
>> >> >>
>> >> >> Now userspace can lookup/update mapped value in one specific
>> >> >> CPU in case of percpu map.
>> >> >>
>> >> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> >> > ...
>> >> >> @@ -265,7 +272,10 @@ static int map_lookup_elem(union bpf_attr *attr)
>> >> >>               goto free_key;
>> >> >>
>> >> >>       rcu_read_lock();
>> >> >> -     ptr = map->ops->map_lookup_elem(map, key);
>> >> >> +     if (!percpu)
>> >> >> +             ptr = map->ops->map_lookup_elem(map, key);
>> >> >> +     else
>> >> >> +             ptr = map->ops->map_lookup_elem_percpu(map, key, attr->cpu);
>> >> >
>> >> > I think this approach is less potent than Martin's for several reasons:
>> >> > - bpf program shouldn't be supplying bpf_smp_processor_id(), since
>> >> >   it's error prone and a bit slower than doing it explicitly as in:
> >> >> >   http://patchwork.ozlabs.org/patch/564482/
>> >> >   although Martin's patch also needs to use this_cpu_ptr() instead
>> >> >   of per_cpu_ptr(.., smp_processor_id());
>> >>
>> >> For PERCPU map, smp_processor_id() is definitely required, and
>> >> Martin's patch need that too, please see htab_percpu_map_lookup_elem()
>> >> in his patch.
>> >
>> > hmm. it's definitely _not_ required. right?
>> > bpf programs shouldn't be accessing other per-cpu regions
>> > only their own. That's what this_cpu_ptr is for.
>> > I don't see a case where accessing other cpu per-cpu element
>> > wouldn't be a bug in the program.
>> >
>> >> > - two new bpf helpers are not necessary in Martin's approach.
>> >> >   regular map_lookup_elem() will work for both per-cpu maps.
>> >>
>> >> For percpu ARRAY, they are not necessary, but it is flexiable to
>> >> provide them since we should allow prog to retrieve the perpcu
>> >> value, also it is easier to implement the system call with the two
>> >> helpers.
>> >>
>> >> For percpu HASH, they are required since eBPF prog need to support
>> >> deleting element, so we have provide these helpers for prog to retrieve
>> >> percpu value before deleting the elem.
>> >
>> > bpf programs cannot have loops, so there is no valid case to access
>> > other cpu element, since program cannot aggregate all-cpu values.
>> > Therefore the programs can only update/lookup this_cpu element and
>> > delete such element across all cpus.
>>
>> Looks I missed the point of looping constraint, then basically delete element
>> helper doesn't make sense in percpu hash.
>>
>> >
>> >> > - such map_lookup_elem_percpu() from syscall is not accurate.
>> >> >   Martin's approach via smp_call_function_single() returns precise value,
>> >>
>> >> I don't understand why Martin's approach is precise and my patch isn't,
>> >> could you explain it a bit?
>> >
>> > because simple mempcy() called from syscall will race with lookup/increment
>> > done to this_cpu element on another cpu. To avoid this race the smp_call
>> > is needed, so that memcpy() happens on the cpu that updated the element,
>> > so smp_call's memcpy and bpf program won't be touch that cpu value
>> > at the same time and user space will read the correct element values.
>> > If program updates them a lot, the value that user space reads will become
>> > stale very quickly, but it will be valid. That's especially important
>> > when program have multiple counters inside single element value.
>>
>> But smp_call is often very slow because of IPI, so the value acculated
>> finally becomes stale easily even though the value from the requested cpu
>> is 'precise' at the exact time, especially when there are lots of CPUs, so I
>> think using smp_call is really a bad idea. And smp_call is worse than
>> iterating from CPUs simply.
> The userspace usually only aggregates value across all cpu every X seconds.

That is just your use case, and Alexei worried about the issue of stale data.

> I hardly consider some number of micro-seconds old data is stale.

Firstly, a CPU can do a huge amount of work in microseconds; for example, an
irq may just arrive during that period.

Secondly, the time can become longer (maybe dozens of us, or even milliseconds)
if the CPU count is very large.

So why not do it the quick way?


-- 
Ming Lei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-13  0:38             ` Ming Lei
@ 2016-01-13  2:22               ` Martin KaFai Lau
  2016-01-13  3:17                 ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Martin KaFai Lau @ 2016-01-13  2:22 UTC (permalink / raw)
  To: Ming Lei
  Cc: Alexei Starovoitov, Linux Kernel Mailing List,
	Alexei Starovoitov, David S. Miller, Network Development,
	Daniel Borkmann

On Wed, Jan 13, 2016 at 08:38:18AM +0800, Ming Lei wrote:
> On Wed, Jan 13, 2016 at 3:10 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> > On Tue, Jan 12, 2016 at 07:05:47PM +0800, Ming Lei wrote:
> >> On Tue, Jan 12, 2016 at 1:49 PM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >> > On Tue, Jan 12, 2016 at 01:00:00PM +0800, Ming Lei wrote:
> >> >> Hi Alexei,
> >> >>
> >> >> Thanks for your review.
> >> >>
> >> >> On Tue, Jan 12, 2016 at 3:02 AM, Alexei Starovoitov
> >> >> <alexei.starovoitov@gmail.com> wrote:
> >> >> > On Mon, Jan 11, 2016 at 11:56:57PM +0800, Ming Lei wrote:
> >> >> >> Prepare for supporting percpu map in the following patch.
> >> >> >>
> >> >> >> Now userspace can lookup/update mapped value in one specific
> >> >> >> CPU in case of percpu map.
> >> >> >>
> >> >> >> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> >> >> > ...
> >> >> >> @@ -265,7 +272,10 @@ static int map_lookup_elem(union bpf_attr *attr)
> >> >> >>               goto free_key;
> >> >> >>
> >> >> >>       rcu_read_lock();
> >> >> >> -     ptr = map->ops->map_lookup_elem(map, key);
> >> >> >> +     if (!percpu)
> >> >> >> +             ptr = map->ops->map_lookup_elem(map, key);
> >> >> >> +     else
> >> >> >> +             ptr = map->ops->map_lookup_elem_percpu(map, key, attr->cpu);
> >> >> >
> >> >> > I think this approach is less potent than Martin's for several reasons:
> >> >> > - bpf program shouldn't be supplying bpf_smp_processor_id(), since
> >> >> >   it's error prone and a bit slower than doing it explicitly as in:
> >> >> >   http://patchwork.ozlabs.org/patch/564482/
> >> >> >   although Martin's patch also needs to use this_cpu_ptr() instead
> >> >> >   of per_cpu_ptr(.., smp_processor_id());
> >> >>
> >> >> For PERCPU map, smp_processor_id() is definitely required, and
> >> >> Martin's patch need that too, please see htab_percpu_map_lookup_elem()
> >> >> in his patch.
> >> >
> >> > hmm. it's definitely _not_ required. right?
> >> > bpf programs shouldn't be accessing other per-cpu regions
> >> > only their own. That's what this_cpu_ptr is for.
> >> > I don't see a case where accessing other cpu per-cpu element
> >> > wouldn't be a bug in the program.
> >> >
> >> >> > - two new bpf helpers are not necessary in Martin's approach.
> >> >> >   regular map_lookup_elem() will work for both per-cpu maps.
> >> >>
> >> >> For percpu ARRAY, they are not necessary, but it is flexiable to
> >> >> provide them since we should allow prog to retrieve the perpcu
> >> >> value, also it is easier to implement the system call with the two
> >> >> helpers.
> >> >>
> >> >> For percpu HASH, they are required since eBPF prog need to support
> >> >> deleting element, so we have provide these helpers for prog to retrieve
> >> >> percpu value before deleting the elem.
> >> >
> >> > bpf programs cannot have loops, so there is no valid case to access
> >> > other cpu element, since program cannot aggregate all-cpu values.
> >> > Therefore the programs can only update/lookup this_cpu element and
> >> > delete such element across all cpus.
> >>
> >> Looks I missed the point of looping constraint, then basically delete element
> >> helper doesn't make sense in percpu hash.
> >>
> >> >
> >> >> > - such map_lookup_elem_percpu() from syscall is not accurate.
> >> >> >   Martin's approach via smp_call_function_single() returns precise value,
> >> >>
> >> >> I don't understand why Martin's approach is precise and my patch isn't,
> >> >> could you explain it a bit?
> >> >
> >> > because simple mempcy() called from syscall will race with lookup/increment
> >> > done to this_cpu element on another cpu. To avoid this race the smp_call
> >> > is needed, so that memcpy() happens on the cpu that updated the element,
> >> > so smp_call's memcpy and bpf program won't be touch that cpu value
> >> > at the same time and user space will read the correct element values.
> >> > If program updates them a lot, the value that user space reads will become
> >> > stale very quickly, but it will be valid. That's especially important
> >> > when program have multiple counters inside single element value.
> >>
> >> But smp_call is often very slow because of IPI, so the value acculated
> >> finally becomes stale easily even though the value from the requested cpu
> >> is 'precise' at the exact time, especially when there are lots of CPUs, so I
> >> think using smp_call is really a bad idea. And smp_call is worse than
> >> iterating from CPUs simply.
> > The userspace usually only aggregates value across all cpu every X seconds.
>
> That is just in your case, and Alexei worried the issue of data stale.
I believe we are talking about the validity of a value.  How can we
make use of less-stale but invalid data?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-13  2:22               ` Martin KaFai Lau
@ 2016-01-13  3:17                 ` Ming Lei
  2016-01-13  5:30                   ` Alexei Starovoitov
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-13  3:17 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Alexei Starovoitov, Linux Kernel Mailing List,
	Alexei Starovoitov, David S. Miller, Network Development,
	Daniel Borkmann

On Wed, Jan 13, 2016 at 10:22 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> On Wed, Jan 13, 2016 at 08:38:18AM +0800, Ming Lei wrote:
>> > The userspace usually only aggregates value across all cpu every X seconds.
>>
>> That is just in your case, and Alexei worried the issue of data stale.
> I believe we are talking about validity of a value.  How to
> make use of a less-stale but invalid data?

About the 'invalidity' thing, it should be the same between using
smp_call (run in the IPI irq handler) and a simple memcpy().

When smp_call_function_single() is used to request a lookup of an element
on a specific CPU, the element's value may be in the middle of an update
on that CPU, not yet completed by the eBPF prog; then the IPI arrives and
half-updated data is still returned to the syscall.


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-13  3:17                 ` Ming Lei
@ 2016-01-13  5:30                   ` Alexei Starovoitov
  2016-01-13 14:56                     ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Alexei Starovoitov @ 2016-01-13  5:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin KaFai Lau, Linux Kernel Mailing List, Alexei Starovoitov,
	David S. Miller, Network Development, Daniel Borkmann

On Wed, Jan 13, 2016 at 11:17:23AM +0800, Ming Lei wrote:
> On Wed, Jan 13, 2016 at 10:22 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> > On Wed, Jan 13, 2016 at 08:38:18AM +0800, Ming Lei wrote:
> >> > The userspace usually only aggregates value across all cpu every X seconds.
> >>
> >> That is just in your case, and Alexei worried the issue of data stale.
> > I believe we are talking about validity of a value.  How to
> > make use of a less-stale but invalid data?
> 
> About the 'invalidity' thing, it should be same between using
> smp_call(run in IPI irq handler) and simple memcpy().
> 
> When smp_call_function_single() is used to request to lookup element in
> the specific CPU, the value of the element may be in updating in that CPU
> and not completed yet in eBPF prog, then IPI comes and half updated
> data is still returned to syscall.

hmm. I'm not following. bpf programs are executing with preempt disabled,
so smp_call_function_single is supposed to execute when bpf is not running.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-13  5:30                   ` Alexei Starovoitov
@ 2016-01-13 14:56                     ` Ming Lei
  2016-01-14  1:19                       ` Alexei Starovoitov
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-13 14:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Linux Kernel Mailing List, Alexei Starovoitov,
	David S. Miller, Network Development, Daniel Borkmann

On Wed, Jan 13, 2016 at 1:30 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Wed, Jan 13, 2016 at 11:17:23AM +0800, Ming Lei wrote:
>> On Wed, Jan 13, 2016 at 10:22 AM, Martin KaFai Lau <kafai@fb.com> wrote:
>> > On Wed, Jan 13, 2016 at 08:38:18AM +0800, Ming Lei wrote:
>> >> > The userspace usually only aggregates value across all cpu every X seconds.
>> >>
>> >> That is just in your case, and Alexei worried the issue of data stale.
>> > I believe we are talking about validity of a value.  How to
>> > make use of a less-stale but invalid data?
>>
>> About the 'invalidity' thing, it should be same between using
>> smp_call(run in IPI irq handler) and simple memcpy().
>>
>> When smp_call_function_single() is used to request to lookup element in
>> the specific CPU, the value of the element may be in updating in that CPU
>> and not completed yet in eBPF prog, then IPI comes and half updated
>> data is still returned to syscall.
>
> hmm. I'm not following. bpf programs are executing with preempt disabled,
> so smp_call_function_single suppose to execute when bpf is not running.

Preempt disabled doesn't mean irq disabled, does it?  So while the bpf prog is
running, the IPI irq for smp_call may still arrive on that CPU.

Also, with the current non-percpu hash the same situation exists between the
lookup-elem syscall and the bpf prog updating the element's value on SMP.

-- 
Ming Lei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-13 14:56                     ` Ming Lei
@ 2016-01-14  1:19                       ` Alexei Starovoitov
  2016-01-14  2:42                         ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Alexei Starovoitov @ 2016-01-14  1:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin KaFai Lau, Linux Kernel Mailing List, Alexei Starovoitov,
	David S. Miller, Network Development, Daniel Borkmann

On Wed, Jan 13, 2016 at 10:56:38PM +0800, Ming Lei wrote:
> On Wed, Jan 13, 2016 at 1:30 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Wed, Jan 13, 2016 at 11:17:23AM +0800, Ming Lei wrote:
> >> On Wed, Jan 13, 2016 at 10:22 AM, Martin KaFai Lau <kafai@fb.com> wrote:
> >> > On Wed, Jan 13, 2016 at 08:38:18AM +0800, Ming Lei wrote:
> >> >> > The userspace usually only aggregates value across all cpu every X seconds.
> >> >>
> >> >> That is just in your case, and Alexei worried the issue of data stale.
> >> > I believe we are talking about validity of a value.  How to
> >> > make use of a less-stale but invalid data?
> >>
> >> About the 'invalidity' thing, it should be same between using
> >> smp_call(run in IPI irq handler) and simple memcpy().
> >>
> >> When smp_call_function_single() is used to request to lookup element in
> >> the specific CPU, the value of the element may be in updating in that CPU
> >> and not completed yet in eBPF prog, then IPI comes and half updated
> >> data is still returned to syscall.
> >
> > hmm. I'm not following. bpf programs are executing with preempt disabled,
> > so smp_call_function_single suppose to execute when bpf is not running.
> 
> Preempt disabled doesn't mean irq disabled, does it?  So when bpf prog is
> running, the IPI irq for smp_call still may come on that CPU.

In the case of kprobes irqs are disabled, but yeah, for sockets smp_call won't help.
We could probably use schedule_work_on(), but that's too heavy.
I guess we need a bpf_map_lookup_and_delete_elem() syscall command, so we can
delete a single pointer out of the per-cpu hash map and copy precise
counters in call_rcu().

> Also in current non-percpu hash, the situation exists too between
> lookup elem syscall and updating value of element from bpf prog in
> SMP.

looks like the regular bpf_map_lookup_elem() syscall will return inaccurate data
even for a per-cpu hash. hmm. we need to brainstorm more on it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-14  1:19                       ` Alexei Starovoitov
@ 2016-01-14  2:42                         ` Ming Lei
  2016-01-14  5:08                           ` Alexei Starovoitov
  0 siblings, 1 reply; 28+ messages in thread
From: Ming Lei @ 2016-01-14  2:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Linux Kernel Mailing List, Alexei Starovoitov,
	David S. Miller, Network Development, Daniel Borkmann

On Thu, Jan 14, 2016 at 9:19 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Wed, Jan 13, 2016 at 10:56:38PM +0800, Ming Lei wrote:
>> On Wed, Jan 13, 2016 at 1:30 PM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> > On Wed, Jan 13, 2016 at 11:17:23AM +0800, Ming Lei wrote:
>> >> On Wed, Jan 13, 2016 at 10:22 AM, Martin KaFai Lau <kafai@fb.com> wrote:
>> >> > On Wed, Jan 13, 2016 at 08:38:18AM +0800, Ming Lei wrote:
>> >> >> > The userspace usually only aggregates value across all cpu every X seconds.
>> >> >>
>> >> >> That is just in your case, and Alexei worried the issue of data stale.
>> >> > I believe we are talking about validity of a value.  How to
>> >> > make use of a less-stale but invalid data?
>> >>
>> >> About the 'invalidity' thing, it should be same between using
>> >> smp_call(run in IPI irq handler) and simple memcpy().
>> >>
>> >> When smp_call_function_single() is used to request to lookup element in
>> >> the specific CPU, the value of the element may be in updating in that CPU
>> >> and not completed yet in eBPF prog, then IPI comes and half updated
>> >> data is still returned to syscall.
>> >
>> > hmm. I'm not following. bpf programs are executing with preempt disabled,
>> > so smp_call_function_single suppose to execute when bpf is not running.
>>
>> Preempt disabled doesn't mean irq disabled, does it?  So when bpf prog is
>> running, the IPI irq for smp_call still may come on that CPU.
>
> In case of kprobes irqs are disabled, but yeah for sockets smp_call won't help.

From 'Documentation/kprobes.txt', it looks like irqs aren't always disabled; see below:

    Probe handlers are run with preemption disabled.  Depending on the
    architecture and optimization state, handlers may also run with
    interrupts disabled (e.g., kretprobe handlers and optimized kprobe
    handlers run without interrupt disabled on x86/x86-64).

> Can probably use schedule_work_on(), but that's too heavy.
> I guess we need bpf_map_lookup_and_delete_elem() syscall command, so we can
> delete single pointer out of per-cpu hash map and in call_rcu() copy precise
> counters.

The partial update is a generic issue, not only for percpu maps.

>
>> Also in current non-percpu hash, the situation exists too between
>> lookup elem syscall and updating value of element from bpf prog in
>> SMP.
>
> looks like regular bpf_map_lookup_elem() syscall will return inaccurate data
> even for per-cpu hash. hmm. we need to brain storm more on it.

That is the reason I don't like smp_call now: the issue is generic
and not only about percpu maps.

But any generic protection might introduce some cost in the update path
from the eBPF prog, which we don't like either.

The partial update only exists when one element holds more than one
counter, or when one element holds a 64-bit counter on a 32-bit machine (which
can be thought of as a double counter too).

1) single counter case

- if the counter in the element may be updated concurrently, the counter
has to be updated with an atomic operation in the prog, and avoiding that
atomic operation is exactly the point of the percpu map's value (see the
sketch after this list)

- then no extra protection is needed since the update of the element is atomic

2) multiple counter case

- lots of protections can be used, such as a per-element rw-spinlock, a percpu
lock, srcu, ..., but each one may introduce cost in the prog's update path.

- the prog code can choose whether it wants precise counting at that extra cost.

- the lock mechanism can be provided by bpf helpers
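
(The sketch referenced above, in samples/bpf-style BPF C with hypothetical map
names; map definitions and includes are omitted. It only illustrates the
single-counter point: a shared ARRAY map needs an atomic add, which LLVM emits
as BPF_XADD, while a per-cpu slot can use a plain add:)

SEC("socket_shared")
int count_shared(struct __sk_buff *skb)
{
	int key = 0;
	long *value = bpf_map_lookup_elem(&shared_map, &key);

	if (value)
		/* slot shared across CPUs -> must be atomic */
		__sync_fetch_and_add(value, skb->len);
	return 0;
}

SEC("socket_percpu")
int count_percpu(struct __sk_buff *skb)
{
	int key = 0;
	long *value = bpf_map_lookup_elem(&percpu_map, &key);

	if (value)
		/* this CPU's private slot -> plain add is enough */
		*value += skb->len;
	return 0;
}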


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-14  2:42                         ` Ming Lei
@ 2016-01-14  5:08                           ` Alexei Starovoitov
  2016-01-14  7:16                             ` Ming Lei
  0 siblings, 1 reply; 28+ messages in thread
From: Alexei Starovoitov @ 2016-01-14  5:08 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin KaFai Lau, Linux Kernel Mailing List, Alexei Starovoitov,
	David S. Miller, Network Development, Daniel Borkmann

On Thu, Jan 14, 2016 at 10:42:44AM +0800, Ming Lei wrote:
> >
> > In case of kprobes irqs are disabled, but yeah for sockets smp_call won't help.
> 
> From 'Documentation/kprobes.txt', looks irqs aren't disabled always, see blow:
> 
>     Probe handlers are run with preemption disabled.  Depending on the
>     architecture and optimization state, handlers may also run with
>     interrupts disabled (e.g., kretprobe handlers and optimized kprobe
>     handlers run without interrupt disabled on x86/x86-64).

bpf tracing progs go through ftrace, which disables irqs even for
optimized kprobes on x64.
but yeah, there could be an arch that doesn't do it,
and long term we probably want to do something about it on x64 as well.
tracepoints+bpf will run with irqs on as well.

> 2) multiple counter case
> 
> - lots of protection can be used, such per-element rw-spin, percpu lock,
> srcu, ..., but each each one may introduce cost in update path of prog.
> - the lock mechanism can be provided by bpf helpers

The above techniques cannot easily be used with bpf progs, since they would
require very significant additions to the verifier.
Say we introduce a helper that takes some hidden lock and increments
a counter which is part of the map element value. What will you pass into it?
An address of the counter? How can the verifier statically check it?
Theoretically it's doable, but it's quite complex, and run-time performance
would be bad if we have to do lock,++,unlock for every counter.
The existing bpf_xadd insn is likely going to be faster, despite cache line
bouncing, compared to per-cpu lock,++,unlock.

from your other email:
> 3) if we use syscall to implement Ri(i=1...3), the period between T(i)
> and T(i+1)
> can become quite big, for example dozens of seconds, so the accumulated value
> in A4 can't represent the actual/correct value(counter) at any time between T0
> and T4, and the value is wrong actually, and all events in above diagram
> (E0(0)~E0(2M), E1(0)~E1(1M),  E2(0) .... E2(10K), ...) aren't counted at all,
> and the missed number can be quite huge.
> So does the value got by A4 make sense for user?

yes it does. In your example it's the number of packets received.
Regardless of how slow or fast the per-cpu loop is, the aggregate value is still valid.
The kernel is full of loops like:
 for_each_possible_cpu(cpu) {
   struct stats *pcpu = per_cpu_ptr(stats, cpu);
   sum1 += pcpu->cnt1;
   sum2 += pcpu->cnt2;
 }
and they compute valid values.
It doesn't matter how slow or fast that loop is.
Obviously the faster it is, the more accurate the aggregates will be,
but one could add an mdelay() after each iteration and they would still be valid.

Anyway, me and Martin had a discussion offline about this. To summarize:
. smp_call() is not a good approach, since it works only for kprobe+bpf
. disable irqs for socket-style bpf programs is not an option either, since
  pushf/popf adds unnecessary overhead and having irqs off for the life of
  the program is bad
. optional irq off for bpf progs that use per-cpu maps is just as bad
. we can do bpf_map_lookup_and_delete() technique for per-cpu hash maps:
  delete elem and do for_each_possible_cpu() { copy values into buffer }
  from call_rcu() callback, but it needs extra sync wait logic in syscall,
  so complexity is probably not worth the gain, though nice that
  it's generic and works on all archs
. we can do for_each_possible_cpu() {atomic_long_memcpy of values} in
  bpf_map_lookup() syscall. since we know that hash map values are always
  8 byte aligned, atomic_long_memcpy() will be a loop of explicit
  4-byte or 8-byte copies on 32-bit and 64-bit archs respectively.
  User space would need to provide value_size*max_cpus buffer, which will
  be partially filled by kernel due to holes in possible_cpus mask.
  For #1 'counter' use case the userspace can bzero() the buffer
  and aggregate all slots ignoring possible holes, since they're zero.
  Doing syscall for each cpu is slower, since for 40+ cpus the cost adds up.
. bpf_map_update() becomes similar, using atomic_long_memcpy.
  The most appealing part to me is that no new helpers are needed and no new
  syscall commands. For per-cpu maps bpf_map_lookup/update() from the kernel
  operates only on this_cpu() and bpf_map_lookup/update() from the syscall
  uses a value_size*num_cpus buffer.
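
(A userspace sketch of the value_size*num_cpus lookup described above, using
the samples/bpf bpf_lookup_elem() wrapper; it assumes 8-byte counter values,
that sysconf() covers all possible CPUs, and that the kernel leaves holes in
the buffer zeroed -- an illustration only, not part of either patchset:)

static unsigned long sum_percpu_counter(int map_fd, int key)
{
	long ncpus = sysconf(_SC_NPROCESSORS_CONF);
	unsigned long values[ncpus];
	unsigned long sum = 0;
	int i;

	memset(values, 0, sizeof(values));
	if (bpf_lookup_elem(map_fd, &key, values))
		return 0;

	for (i = 0; i < ncpus; i++)
		sum += values[i];	/* zeroed holes add nothing */
	return sum;
}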

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem
  2016-01-14  5:08                           ` Alexei Starovoitov
@ 2016-01-14  7:16                             ` Ming Lei
  0 siblings, 0 replies; 28+ messages in thread
From: Ming Lei @ 2016-01-14  7:16 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Linux Kernel Mailing List, Alexei Starovoitov,
	David S. Miller, Network Development, Daniel Borkmann

On Thu, Jan 14, 2016 at 1:08 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Thu, Jan 14, 2016 at 10:42:44AM +0800, Ming Lei wrote:
>> >
>> > In case of kprobes irqs are disabled, but yeah for sockets smp_call won't help.
>>
>> From 'Documentation/kprobes.txt', looks irqs aren't disabled always, see blow:
>>
>>     Probe handlers are run with preemption disabled.  Depending on the
>>     architecture and optimization state, handlers may also run with
>>     interrupts disabled (e.g., kretprobe handlers and optimized kprobe
>>     handlers run without interrupt disabled on x86/x86-64).
>
> bpf tracing progs go through ftrace that disables irqs even for
> optimized kprobes on x64.
> but yeah, there could be an arch that doesn't do it
> and long term we probably want to do something about it on x64 as well.
> tracepoints+bpf will be with irqs on as well.
>
>> 2) multiple counter case
>>
>> - lots of protection can be used, such per-element rw-spin, percpu lock,
>> srcu, ..., but each each one may introduce cost in update path of prog.
>> - the lock mechanism can be provided by bpf helpers
>
> The above techniques cannot be easily used with bpf progs, since it would
> require very significant additions to verifier.
> Say we introduce a helper that takes some hidden lock and increments
> the counter which is part of map element value. What will you pass into it?
> An address of the counter? How verifier can statically check it?
> Theoretically it's doable, it's quite complex and run-time performance
> would be bad if we have to do lock,++,unlock for every counter.
> Existing bpf_xadd insn is likely going to be faster despite cache line
> bouncing comparing to per-cpu lock,++,unlock

There are two simple approaches I thought of:

1) introduce two helpers, lookup_and_lock_element(map, key) and
unlock_element(map, key) (sketched below)
- the disadvantage is that unlock_element() needs one extra lookup
- the verifier doesn't need any change

2) embed one lock at the head of the returned value
- looks a bit ugly
- it is tricky to obtain the lock in the kernel (syscall path)
- still doesn't need any verifier change

Or other ideas?
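
(A purely hypothetical sketch of option 1) above -- these helpers, the
BPF_FUNC_* values, struct pair and my_map do not exist anywhere; it is shown
only to make the proposed calling convention concrete, in the samples/bpf
helper-declaration style:)

struct pair {
	long packets;
	long bytes;
};

/* hypothetical declarations, as they might appear in bpf_helpers.h */
static void *(*bpf_map_lookup_and_lock_elem)(void *map, void *key) =
	(void *) BPF_FUNC_map_lookup_and_lock_elem;
static int (*bpf_map_unlock_elem)(void *map, void *key) =
	(void *) BPF_FUNC_map_unlock_elem;

/* hypothetical usage: update two related counters consistently */
SEC("socket_locked")
int count_locked(struct __sk_buff *skb)
{
	int key = 0;
	struct pair *val = bpf_map_lookup_and_lock_elem(&my_map, &key);

	if (val) {
		val->packets++;
		val->bytes += skb->len;
		bpf_map_unlock_elem(&my_map, &key);	/* the extra lookup */
	}
	return 0;
}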

>
> from your other email:
>> 3) if we use syscall to implement Ri(i=1...3), the period between T(i)
>> and T(i+1)
>> can become quite big, for example dozens of seconds, so the accumulated value
>> in A4 can't represent the actual/correct value(counter) at any time between T0
>> and T4, and the value is wrong actually, and all events in above diagram
>> (E0(0)~E0(2M), E1(0)~E1(1M),  E2(0) .... E2(10K), ...) aren't counted at all,
>> and the missed number can be quite huge.
>> So does the value got by A4 make sense for user?
>
> yes it does. In your example it's number of packets received.
> Regardless how slow or fast the per-cpu loop is the aggreate value is still valid.
> The kernel is full of loops like:
>  for_each_possible_cpu(cpu) {
>    struct stats *pcpu = per_cpu_ptr(stats, cpu);
>    sum1 += pcpu->cnt1;
>    sum2 += pcpu->cnt2;
>  }

Yes, that is the way I suggest using, instead of issuing nr_cpu syscalls to
aggregate the percpu value, which is a kind of usage I have never seen before.

> and they compute valid values.
> It doesn't matter how slow or fast that loop is.

I don't think so; quantity breeds quality. In the syscall path there are
lots of possible long delays (schedule-out, memory allocation,
page-in, ...), especially when the system is under heavy load. In the above
loop the kernel uses, none of these delays are possible.

> Obviously the faster it is the more accurate the aggragtes will be,
> but one can add mdelay() after each iteration and it's still valid.

It might be valid, but the aggregates can deviate too much from the
correct value and become useless; then the validity doesn't matter,
does it?

If you don't object, I will try to figure out a patch to support
implementing the aggregate function from a bpf prog, so we can use
the kernel way to aggregate the percpu value; then one single syscall
is enough. Looks like one new prog type is needed, but it is very similar
to the sock filter usage.

> Anyway, me and Martin had a discussion offline about this. To summarize:
> . smp_call() is not a good approach, since it works only kprobe+bpf

Yes.

> . disable irqs for socket-style bpf programs is not an options either, since
>   pushf/popf adds unnecessary overhead and having irqs off for the life of
>   the program is bad

Agree.

> . optional irq off for bpf progs that use per-cpu maps is just as bad

Yes.

> . we can do bpf_map_lookup_and_delete() technique for per-cpu hash maps:
>   delete elem and do for_each_possible_cpu() { copy values into buffer }
>   from call_rcu() callback, but it needs extra sync wait logic in syscall,

The call_rcu() callback often takes a long time to run.

>   so complexity is probably not worth the gain, though nice that
>   it's generic and works on all archs

I suggest not considering this for percpu only, since it is a generic issue,
and the approach should cover the current array/hash maps.

> . we can do for_each_possible_cpu() {atomic_long_memcpy of values} in
>   bpf_map_lookup() syscall. since we know that hash map values are always
>   8 byte aligned, atomic_long_memcpy() will be a loop of explicit
>   4-byte or 8-byte copies on 32-bit and 64-bit archs respectively.
>   User space would need to provide value_size*max_cpus buffer, which will
>   be partially filled by kernel due to holes in possible_cpus mask.
>   For #1 'counter' use case the userspace can bzero() the buffer
>   and aggregate all slots ignoring possible holes, since they're zero.
>   Doing syscall for each cpu is slower, since for 40+ cpus the cost adds up.

Exactly, that is why I think it is good to define the aggregate function in
bpf prog code; then we can get the total value in a single syscall.

> . bpf_map_update() becomes similar with atomic_long_memcpy
>   The most appealing to me is that no new helpers needed and no new

I agree with you about no new helpers.

>   syscall commands. For per-cpu maps bpf_map_lookup/update() from kernel

No new syscall commands means we have to aggregate the value in the kernel
via a bpf prog.

>   operates only on this_cpu() and bpf_map_lookup/update() from syscall
>   use value_size*num_cpus buffer.


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-01-14  7:16 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-11 15:56 [PATCH 0/9] bpf: support percpu ARRAY map Ming Lei
2016-01-11 15:56 ` [PATCH 1/9] bpf: prepare for moving map common stuff into one place Ming Lei
2016-01-11 18:24   ` kbuild test robot
2016-01-11 15:56 ` [PATCH 2/9] bpf: array map: use pre-defined nop map function Ming Lei
2016-01-11 19:08   ` Alexei Starovoitov
2016-01-11 15:56 ` [PATCH 3/9] bpf: introduce percpu verion of lookup/update in bpf_map_ops Ming Lei
2016-01-11 15:56 ` [PATCH 4/9] bpf: add percpu version of lookup/update element helpers Ming Lei
2016-01-11 15:56 ` [PATCH 5/9] bpf: syscall: add percpu version of lookup/update elem Ming Lei
2016-01-11 19:02   ` Alexei Starovoitov
2016-01-12  5:00     ` Ming Lei
2016-01-12  5:49       ` Alexei Starovoitov
2016-01-12 11:05         ` Ming Lei
2016-01-12 19:10           ` Martin KaFai Lau
2016-01-13  0:38             ` Ming Lei
2016-01-13  2:22               ` Martin KaFai Lau
2016-01-13  3:17                 ` Ming Lei
2016-01-13  5:30                   ` Alexei Starovoitov
2016-01-13 14:56                     ` Ming Lei
2016-01-14  1:19                       ` Alexei Starovoitov
2016-01-14  2:42                         ` Ming Lei
2016-01-14  5:08                           ` Alexei Starovoitov
2016-01-14  7:16                             ` Ming Lei
2016-01-11 15:56 ` [PATCH 6/9] bpf: arraymap: introduce BPF_MAP_TYPE_ARRAY_PERCPU Ming Lei
2016-01-11 19:14   ` Alexei Starovoitov
2016-01-11 15:56 ` [PATCH 7/9] sample/bpf: introduces helpers for percpu array example Ming Lei
2016-01-11 15:57 ` [PATCH 8/9] sample/bpf: sockex1: user percpu array map Ming Lei
2016-01-11 15:57 ` [PATCH 9/9] samples/bpf: test " Ming Lei
2016-01-12 15:44   ` David Laight
