* [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr
@ 2023-08-27 15:27 Yonghong Song
  2023-08-27 15:27 ` [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation Yonghong Song
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Patch set [1] implemented cgroup local storage BPF_MAP_TYPE_CGRP_STORAGE,
similar to sk/task/inode local storage, and marked the old
BPF_MAP_TYPE_CGROUP_STORAGE map as deprecated since the old map can
only work with the current cgroup.

Similarly, the existing BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map
is a percpu version of BPF_MAP_TYPE_CGROUP_STORAGE and only works
with the current cgroup. But there is no replacement that can work
with an arbitrary cgroup.

This patch set solves this problem by adding support for local
percpu kptr. The map value can have a percpu kptr field which holds
a bpf prog allocated percpu data object. Below is an example:

  struct percpu_val_t {
    ... fields ...
  };

  struct map_value_t {
    struct percpu_val_t __percpu_kptr *percpu_data_ptr;
  };

In the above, 'map_value_t' is the map value type for a
BPF_MAP_TYPE_CGRP_STORAGE map. Users can access 'percpu_data_ptr'
and then read/write the percpu data. This covers
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE and more, so the
BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map type is marked as deprecated.
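
As an illustration (a rough sketch, not taken from the patches; the
map name 'cgrp_map' and the 'cgrp' pointer are hypothetical, while
the kfuncs and convenience macros are the ones added later in this
series):

  struct {
    __uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
    __uint(map_flags, BPF_F_NO_PREALLOC);
    __type(key, int);
    __type(value, struct map_value_t);
  } cgrp_map SEC(".maps");

  /* inside a bpf program, with a valid 'cgrp' pointer in hand */
  struct percpu_val_t __percpu_kptr *p;
  struct map_value_t *mv;
  struct percpu_val_t *v;

  mv = bpf_cgrp_storage_get(&cgrp_map, cgrp, NULL,
                            BPF_LOCAL_STORAGE_GET_F_CREATE);
  if (!mv)
    return 0;

  /* allocate the percpu object and publish it in the map value */
  p = bpf_percpu_obj_new(struct percpu_val_t);
  if (!p)
    return 0;
  p = bpf_kptr_xchg(&mv->percpu_data_ptr, p);
  if (p)
    /* another prog won the race; drop our copy */
    bpf_percpu_obj_drop(p);

  /* read/write this cpu's copy under rcu */
  bpf_rcu_read_lock();
  p = mv->percpu_data_ptr;
  if (p) {
    v = bpf_this_cpu_ptr(p);
    /* ... read/write v's fields ... */
  }
  bpf_rcu_read_unlock();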

In addition, local percpu kptr supports the same map types
as other kptrs, including hash, lru_hash, array, and sk/inode/task/cgrp
local storage. Currently, the percpu data structure does not support
non-scalar or special fields (e.g., bpf_spin_lock, bpf_rb_root, etc.).
They can be supported in the future if use cases arise.

Please see individual patches for details.

  [1] https://lore.kernel.org/all/20221026042835.672317-1-yhs@fb.com/

Changelog:
  v2 -> v3:
    - fix libbpf_str test failure.
  v1 -> v2:
    - drop support for special fields in the percpu data structure.
    - rename __percpu attr to __percpu_kptr attr.
    - rename BPF_KPTR_PERCPU_REF to BPF_KPTR_PERCPU.
    - better code to handle bpf_{this,per}_cpu_ptr() helpers.
    - add more negative tests.
    - fix a bpftool related test failure.

Yonghong Song (13):
  bpf: Add support for non-fix-size percpu mem allocation
  bpf: Add BPF_KPTR_PERCPU as a field type
  bpf: Add alloc/xchg/direct_access support for local percpu kptr
  bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu
    obj
  selftests/bpf: Update error message in negative linked_list test
  libbpf: Add __percpu_kptr macro definition
  selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in
    bpf_experimental.h
  selftests/bpf: Add tests for array map with local percpu kptr
  bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible
  selftests/bpf: Remove unnecessary direct read of local percpu kptr
  selftests/bpf: Add tests for cgrp_local_storage with local percpu kptr
  selftests/bpf: Add some negative tests
  bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated

 include/linux/bpf.h                           |  22 +-
 include/linux/bpf_verifier.h                  |   1 +
 include/uapi/linux/bpf.h                      |   9 +-
 kernel/bpf/btf.c                              |   5 +
 kernel/bpf/core.c                             |   8 +-
 kernel/bpf/helpers.c                          |  16 ++
 kernel/bpf/memalloc.c                         |  14 +-
 kernel/bpf/syscall.c                          |   4 +
 kernel/bpf/verifier.c                         | 191 +++++++++++++++---
 tools/include/uapi/linux/bpf.h                |   9 +-
 tools/lib/bpf/bpf_helpers.h                   |   1 +
 .../testing/selftests/bpf/bpf_experimental.h  |  31 +++
 .../selftests/bpf/prog_tests/libbpf_str.c     |   6 +-
 .../selftests/bpf/prog_tests/linked_list.c    |   4 +-
 .../selftests/bpf/prog_tests/percpu_alloc.c   | 125 ++++++++++++
 .../selftests/bpf/progs/percpu_alloc_array.c  | 183 +++++++++++++++++
 .../progs/percpu_alloc_cgrp_local_storage.c   | 105 ++++++++++
 .../selftests/bpf/progs/percpu_alloc_fail.c   | 164 +++++++++++++++
 .../selftests/bpf/test_bpftool_synctypes.py   |   9 +
 19 files changed, 853 insertions(+), 54 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
 create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_array.c
 create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c
 create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_fail.c

-- 
2.34.1



* [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
@ 2023-08-27 15:27 ` Yonghong Song
  2023-09-13 14:10   ` Hou Tao
  2023-11-15 15:31   ` Heiko Carstens
  2023-08-27 15:27 ` [PATCH bpf-next v3 02/13] bpf: Add BPF_KPTR_PERCPU as a field type Yonghong Song
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

This is needed for later percpu memory allocation where the
allocation is done by a bpf program. For such cases, a global
bpf_global_percpu_ma is added, for which a flexible (non-fixed)
allocation size is needed.
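
For context, a sketch of how this is meant to be used (the allocation
call below matches a later patch in this series): initializing the
allocator with size 0 sets up the whole range of unit-size caches, and
callers then pass the actual object size at allocation time:

  /* boot time, as in bpf_global_ma_init() below */
  bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);

  /* later, when a bpf program allocates a percpu object */
  p = bpf_mem_alloc(&bpf_global_percpu_ma, size);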

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/linux/bpf.h   |  4 ++--
 kernel/bpf/core.c     |  8 +++++---
 kernel/bpf/memalloc.c | 14 ++++++--------
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 12596af59c00..144dbddf53bd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -55,8 +55,8 @@ struct cgroup;
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
 extern struct kobject *btf_kobj;
-extern struct bpf_mem_alloc bpf_global_ma;
-extern bool bpf_global_ma_set;
+extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
+extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;
 
 typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
 typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 0f8f036d8bd1..95599df82ee4 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -64,8 +64,8 @@
 #define OFF	insn->off
 #define IMM	insn->imm
 
-struct bpf_mem_alloc bpf_global_ma;
-bool bpf_global_ma_set;
+struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
+bool bpf_global_ma_set, bpf_global_percpu_ma_set;
 
 /* No hurry in this branch
  *
@@ -2921,7 +2921,9 @@ static int __init bpf_global_ma_init(void)
 
 	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
 	bpf_global_ma_set = !ret;
-	return ret;
+	ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
+	bpf_global_percpu_ma_set = !ret;
+	return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
 }
 late_initcall(bpf_global_ma_init);
 #endif
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 9c49ae53deaf..cb60445de98a 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -499,15 +499,16 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 	struct obj_cgroup *objcg = NULL;
 	int cpu, i, unit_size, percpu_size = 0;
 
+	/* room for llist_node and per-cpu pointer */
+	if (percpu)
+		percpu_size = LLIST_NODE_SZ + sizeof(void *);
+
 	if (size) {
 		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
 		if (!pc)
 			return -ENOMEM;
 
-		if (percpu)
-			/* room for llist_node and per-cpu pointer */
-			percpu_size = LLIST_NODE_SZ + sizeof(void *);
-		else
+		if (!percpu)
 			size += LLIST_NODE_SZ; /* room for llist_node */
 		unit_size = size;
 
@@ -527,10 +528,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 		return 0;
 	}
 
-	/* size == 0 && percpu is an invalid combination */
-	if (WARN_ON_ONCE(percpu))
-		return -EINVAL;
-
 	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
 	if (!pcc)
 		return -ENOMEM;
@@ -543,6 +540,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 			c = &cc->cache[i];
 			c->unit_size = sizes[i];
 			c->objcg = objcg;
+			c->percpu_size = percpu_size;
 			c->tgt = c;
 			prefill_mem_cache(c, cpu);
 		}
-- 
2.34.1



* [PATCH bpf-next v3 02/13] bpf: Add BPF_KPTR_PERCPU as a field type
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
  2023-08-27 15:27 ` [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation Yonghong Song
@ 2023-08-27 15:27 ` Yonghong Song
  2023-08-27 15:27 ` [PATCH bpf-next v3 03/13] bpf: Add alloc/xchg/direct_access support for local percpu kptr Yonghong Song
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

BPF_KPTR_PERCPU represents a percpu field type like below

  struct val_t {
    ... fields ...
  };
  struct t {
    ...
    struct val_t __percpu_kptr *percpu_data_ptr;
    ...
  };

where
  #define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr")))

While BPF_KPTR_REF points to a trusted kernel object or a trusted
local object, BPF_KPTR_PERCPU points to a trusted local
percpu object.

This patch adds basic support for BPF_KPTR_PERCPU:
percpu_kptr field parsing, recording and free operations.
BPF_KPTR_PERCPU also supports the same map types
as BPF_KPTR_REF does.

Note that, unlike a local kptr, a BPF_KPTR_PERCPU struct
may not contain any special fields such as other kptrs,
bpf_spin_lock, bpf_list_head, etc.
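
For example (an illustrative sketch; struct and map names are
hypothetical), a hash map whose value carries a percpu kptr field can
be declared in a bpf program as:

  struct val_t {
    long b, c, d;
  };

  struct map_val_t {
    struct val_t __percpu_kptr *pc;
  };

  struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 16);
    __type(key, int);
    __type(value, struct map_val_t);
  } htab SEC(".maps");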

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/linux/bpf.h  | 18 ++++++++++++------
 kernel/bpf/btf.c     |  5 +++++
 kernel/bpf/syscall.c |  4 ++++
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 144dbddf53bd..d7ef65846903 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -180,14 +180,15 @@ enum btf_field_type {
 	BPF_TIMER      = (1 << 1),
 	BPF_KPTR_UNREF = (1 << 2),
 	BPF_KPTR_REF   = (1 << 3),
-	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF,
-	BPF_LIST_HEAD  = (1 << 4),
-	BPF_LIST_NODE  = (1 << 5),
-	BPF_RB_ROOT    = (1 << 6),
-	BPF_RB_NODE    = (1 << 7),
+	BPF_KPTR_PERCPU = (1 << 4),
+	BPF_KPTR       = BPF_KPTR_UNREF | BPF_KPTR_REF | BPF_KPTR_PERCPU,
+	BPF_LIST_HEAD  = (1 << 5),
+	BPF_LIST_NODE  = (1 << 6),
+	BPF_RB_ROOT    = (1 << 7),
+	BPF_RB_NODE    = (1 << 8),
 	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
 				 BPF_RB_NODE | BPF_RB_ROOT,
-	BPF_REFCOUNT   = (1 << 8),
+	BPF_REFCOUNT   = (1 << 9),
 };
 
 typedef void (*btf_dtor_kfunc_t)(void *);
@@ -300,6 +301,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
 		return "kptr";
+	case BPF_KPTR_PERCPU:
+		return "percpu_kptr";
 	case BPF_LIST_HEAD:
 		return "bpf_list_head";
 	case BPF_LIST_NODE:
@@ -325,6 +328,7 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(struct bpf_timer);
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
+	case BPF_KPTR_PERCPU:
 		return sizeof(u64);
 	case BPF_LIST_HEAD:
 		return sizeof(struct bpf_list_head);
@@ -351,6 +355,7 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(struct bpf_timer);
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
+	case BPF_KPTR_PERCPU:
 		return __alignof__(u64);
 	case BPF_LIST_HEAD:
 		return __alignof__(struct bpf_list_head);
@@ -389,6 +394,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
 	case BPF_TIMER:
 	case BPF_KPTR_UNREF:
 	case BPF_KPTR_REF:
+	case BPF_KPTR_PERCPU:
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 249657c466dd..d98b76c128db 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3293,6 +3293,8 @@ static int btf_find_kptr(const struct btf *btf, const struct btf_type *t,
 		type = BPF_KPTR_UNREF;
 	else if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
 		type = BPF_KPTR_REF;
+	else if (!strcmp("percpu_kptr", __btf_name_by_offset(btf, t->name_off)))
+		type = BPF_KPTR_PERCPU;
 	else
 		return -EINVAL;
 
@@ -3457,6 +3459,7 @@ static int btf_find_struct_field(const struct btf *btf,
 			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU:
 			ret = btf_find_kptr(btf, member_type, off, sz,
 					    idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3523,6 +3526,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
 			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU:
 			ret = btf_find_kptr(btf, var_type, off, sz,
 					    idx < info_cnt ? &info[idx] : &tmp);
 			if (ret < 0)
@@ -3783,6 +3787,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 			break;
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU:
 			ret = btf_parse_kptr(btf, &rec->fields[i], &info_arr[i]);
 			if (ret < 0)
 				goto end;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ebeb0695305a..e25e7e059d7d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -514,6 +514,7 @@ void btf_record_free(struct btf_record *rec)
 		switch (rec->fields[i].type) {
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU:
 			if (rec->fields[i].kptr.module)
 				module_put(rec->fields[i].kptr.module);
 			btf_put(rec->fields[i].kptr.btf);
@@ -560,6 +561,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
 		switch (fields[i].type) {
 		case BPF_KPTR_UNREF:
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU:
 			btf_get(fields[i].kptr.btf);
 			if (fields[i].kptr.module && !try_module_get(fields[i].kptr.module)) {
 				ret = -ENXIO;
@@ -650,6 +652,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 			WRITE_ONCE(*(u64 *)field_ptr, 0);
 			break;
 		case BPF_KPTR_REF:
+		case BPF_KPTR_PERCPU:
 			xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
 			if (!xchgd_field)
 				break;
@@ -1045,6 +1048,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 				break;
 			case BPF_KPTR_UNREF:
 			case BPF_KPTR_REF:
+			case BPF_KPTR_PERCPU:
 			case BPF_REFCOUNT:
 				if (map->map_type != BPF_MAP_TYPE_HASH &&
 				    map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
-- 
2.34.1



* [PATCH bpf-next v3 03/13] bpf: Add alloc/xchg/direct_access support for local percpu kptr
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
  2023-08-27 15:27 ` [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation Yonghong Song
  2023-08-27 15:27 ` [PATCH bpf-next v3 02/13] bpf: Add BPF_KPTR_PERCPU as a field type Yonghong Song
@ 2023-08-27 15:27 ` Yonghong Song
  2023-09-06  0:40   ` Alexei Starovoitov
  2023-08-27 15:27 ` [PATCH bpf-next v3 04/13] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj Yonghong Song
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Add two new kfuncs, bpf_percpu_obj_new_impl() and
bpf_percpu_obj_drop_impl(), to allocate and free a percpu obj.
The two functions are very similar to bpf_obj_new_impl()
and bpf_obj_drop_impl(). The major difference is
the percpu handling.

    bpf_rcu_read_lock();
    struct val_t __percpu_kptr *v = map_val->percpu_data;
    ...
    bpf_rcu_read_unlock();

For a percpu kptr 'v' loaded from a map value as above, the reg->type
is set to
	PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU
if inside an rcu critical section.

The MEM_RCU marking here is similar to NON_OWN_REF, as 'v'
is not an owning reference. But NON_OWN_REF is
trusted and typically used inside a spinlock, while
MEM_RCU is under an rcu read lock. RCU is preferred here
since percpu data structures imply potential concurrent
access to their contents.

Also, bpf_percpu_obj_new_impl() is restricted such that
no pointers or special fields are allowed. Therefore,
bpf_list_head and bpf_rb_root will not be supported
in this patch set, to avoid a potential memory leak
due to a race between bpf_obj_free_fields() and another
bpf_kptr_xchg() moving an allocated object into a
bpf_list_head or bpf_rb_root.
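
As an illustration (a sketch, not taken from this patch;
bpf_percpu_obj_new() is the convenience macro added later in this
series):

  struct scalars_only {          /* accepted: scalars only */
    int a;
    long b[4];
  };

  struct with_special {          /* rejected: has a special field */
    struct bpf_spin_lock lock;
    int cnt;
  };

  p1 = bpf_percpu_obj_new(struct scalars_only);  /* ok */
  p2 = bpf_percpu_obj_new(struct with_special);
  /* fails verification: "bpf_percpu_obj_new type ID argument must not
   * contain special fields"
   */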

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/helpers.c  |  16 ++++++
 kernel/bpf/verifier.c | 112 +++++++++++++++++++++++++++++++++---------
 2 files changed, 106 insertions(+), 22 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8bd3812fb8df..b0a9834f1051 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1902,6 +1902,14 @@ __bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
 	return p;
 }
 
+__bpf_kfunc void *bpf_percpu_obj_new_impl(u64 local_type_id__k, void *meta__ign)
+{
+	u64 size = local_type_id__k;
+
+	/* The verifier has ensured that meta__ign must be NULL */
+	return bpf_mem_alloc(&bpf_global_percpu_ma, size);
+}
+
 /* Must be called under migrate_disable(), as required by bpf_mem_free */
 void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
 {
@@ -1930,6 +1938,12 @@ __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
 	__bpf_obj_drop_impl(p, meta ? meta->record : NULL);
 }
 
+__bpf_kfunc void bpf_percpu_obj_drop_impl(void *p__alloc, void *meta__ign)
+{
+	/* The verifier has ensured that meta__ign must be NULL */
+	bpf_mem_free_rcu(&bpf_global_percpu_ma, p__alloc);
+}
+
 __bpf_kfunc void *bpf_refcount_acquire_impl(void *p__refcounted_kptr, void *meta__ign)
 {
 	struct btf_struct_meta *meta = meta__ign;
@@ -2442,7 +2456,9 @@ BTF_SET8_START(generic_btf_ids)
 BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
 #endif
 BTF_ID_FLAGS(func, bpf_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_percpu_obj_new_impl, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_obj_drop_impl, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_percpu_obj_drop_impl, KF_RELEASE)
 BTF_ID_FLAGS(func, bpf_refcount_acquire_impl, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_push_front_impl)
 BTF_ID_FLAGS(func, bpf_list_push_back_impl)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bb78212fa5b2..6c886ead18f6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -304,7 +304,7 @@ struct bpf_kfunc_call_arg_meta {
 	/* arg_{btf,btf_id,owning_ref} are used by kfunc-specific handling,
 	 * generally to pass info about user-defined local kptr types to later
 	 * verification logic
-	 *   bpf_obj_drop
+	 *   bpf_obj_drop/bpf_percpu_obj_drop
 	 *     Record the local kptr type to be drop'd
 	 *   bpf_refcount_acquire (via KF_ARG_PTR_TO_REFCOUNTED_KPTR arg type)
 	 *     Record the local kptr type to be refcount_incr'd and use
@@ -5001,6 +5001,8 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 			perm_flags |= PTR_UNTRUSTED;
 	} else {
 		perm_flags = PTR_MAYBE_NULL | MEM_ALLOC;
+		if (kptr_field->type == BPF_KPTR_PERCPU)
+			perm_flags |= MEM_PERCPU;
 	}
 
 	if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags))
@@ -5044,7 +5046,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
 	 */
 	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
 				  kptr_field->kptr.btf, kptr_field->kptr.btf_id,
-				  kptr_field->type == BPF_KPTR_REF))
+				  kptr_field->type != BPF_KPTR_UNREF))
 		goto bad_type;
 	return 0;
 bad_type:
@@ -5088,7 +5090,18 @@ static bool rcu_safe_kptr(const struct btf_field *field)
 {
 	const struct btf_field_kptr *kptr = &field->kptr;
 
-	return field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id);
+	return field->type == BPF_KPTR_PERCPU ||
+	       (field->type == BPF_KPTR_REF && rcu_protected_object(kptr->btf, kptr->btf_id));
+}
+
+static u32 btf_ld_kptr_type(struct bpf_verifier_env *env, struct btf_field *kptr_field)
+{
+	if (rcu_safe_kptr(kptr_field) && in_rcu_cs(env)) {
+		if (kptr_field->type != BPF_KPTR_PERCPU)
+			return PTR_MAYBE_NULL | MEM_RCU;
+		return PTR_MAYBE_NULL | MEM_RCU | MEM_PERCPU;
+	}
+	return PTR_MAYBE_NULL | PTR_UNTRUSTED;
 }
 
 static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
@@ -5114,7 +5127,8 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 	/* We only allow loading referenced kptr, since it will be marked as
 	 * untrusted, similar to unreferenced kptr.
 	 */
-	if (class != BPF_LDX && kptr_field->type == BPF_KPTR_REF) {
+	if (class != BPF_LDX &&
+	    (kptr_field->type == BPF_KPTR_REF || kptr_field->type == BPF_KPTR_PERCPU)) {
 		verbose(env, "store to referenced kptr disallowed\n");
 		return -EACCES;
 	}
@@ -5125,10 +5139,7 @@ static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 		 * value from map as PTR_TO_BTF_ID, with the correct type.
 		 */
 		mark_btf_ld_reg(env, cur_regs(env), value_regno, PTR_TO_BTF_ID, kptr_field->kptr.btf,
-				kptr_field->kptr.btf_id,
-				rcu_safe_kptr(kptr_field) && in_rcu_cs(env) ?
-				PTR_MAYBE_NULL | MEM_RCU :
-				PTR_MAYBE_NULL | PTR_UNTRUSTED);
+				kptr_field->kptr.btf_id, btf_ld_kptr_type(env, kptr_field));
 		/* For mark_ptr_or_null_reg */
 		val_reg->id = ++env->id_gen;
 	} else if (class == BPF_STX) {
@@ -5182,6 +5193,7 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
 			switch (field->type) {
 			case BPF_KPTR_UNREF:
 			case BPF_KPTR_REF:
+			case BPF_KPTR_PERCPU:
 				if (src != ACCESS_DIRECT) {
 					verbose(env, "kptr cannot be accessed indirectly by helper\n");
 					return -EACCES;
@@ -7320,7 +7332,7 @@ static int process_kptr_func(struct bpf_verifier_env *env, int regno,
 		verbose(env, "off=%d doesn't point to kptr\n", kptr_off);
 		return -EACCES;
 	}
-	if (kptr_field->type != BPF_KPTR_REF) {
+	if (kptr_field->type != BPF_KPTR_REF && kptr_field->type != BPF_KPTR_PERCPU) {
 		verbose(env, "off=%d kptr isn't referenced kptr\n", kptr_off);
 		return -EACCES;
 	}
@@ -7831,8 +7843,10 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	if (base_type(arg_type) == ARG_PTR_TO_MEM)
 		type &= ~DYNPTR_TYPE_FLAG_MASK;
 
-	if (meta->func_id == BPF_FUNC_kptr_xchg && type_is_alloc(type))
+	if (meta->func_id == BPF_FUNC_kptr_xchg && type_is_alloc(type)) {
 		type &= ~MEM_ALLOC;
+		type &= ~MEM_PERCPU;
+	}
 
 	for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
 		expected = compatible->types[i];
@@ -7915,6 +7929,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 		break;
 	}
 	case PTR_TO_BTF_ID | MEM_ALLOC:
+	case PTR_TO_BTF_ID | MEM_PERCPU | MEM_ALLOC:
 		if (meta->func_id != BPF_FUNC_spin_lock && meta->func_id != BPF_FUNC_spin_unlock &&
 		    meta->func_id != BPF_FUNC_kptr_xchg) {
 			verbose(env, "verifier internal error: unimplemented handling of MEM_ALLOC\n");
@@ -9882,8 +9897,11 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		if (func_id == BPF_FUNC_kptr_xchg) {
 			ret_btf = meta.kptr_field->kptr.btf;
 			ret_btf_id = meta.kptr_field->kptr.btf_id;
-			if (!btf_is_kernel(ret_btf))
+			if (!btf_is_kernel(ret_btf)) {
 				regs[BPF_REG_0].type |= MEM_ALLOC;
+				if (meta.kptr_field->type == BPF_KPTR_PERCPU)
+					regs[BPF_REG_0].type |= MEM_PERCPU;
+			}
 		} else {
 			if (fn->ret_btf_id == BPF_PTR_POISON) {
 				verbose(env, "verifier internal error:");
@@ -10268,6 +10286,8 @@ enum special_kfunc_type {
 	KF_bpf_dynptr_slice,
 	KF_bpf_dynptr_slice_rdwr,
 	KF_bpf_dynptr_clone,
+	KF_bpf_percpu_obj_new_impl,
+	KF_bpf_percpu_obj_drop_impl,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -10288,6 +10308,8 @@ BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
+BTF_ID(func, bpf_percpu_obj_new_impl)
+BTF_ID(func, bpf_percpu_obj_drop_impl)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -10310,6 +10332,8 @@ BTF_ID(func, bpf_dynptr_from_xdp)
 BTF_ID(func, bpf_dynptr_slice)
 BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
+BTF_ID(func, bpf_percpu_obj_new_impl)
+BTF_ID(func, bpf_percpu_obj_drop_impl)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -11004,7 +11028,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			}
 			break;
 		case KF_ARG_PTR_TO_ALLOC_BTF_ID:
-			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
+			if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC)) {
+				if (meta->func_id != special_kfunc_list[KF_bpf_obj_drop_impl]) {
+					verbose(env, "arg#%d expected for bpf_obj_drop_impl()\n", i);
+					return -EINVAL;
+				}
+			} else if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC | MEM_PERCPU)) {
+				if (meta->func_id != special_kfunc_list[KF_bpf_percpu_obj_drop_impl]) {
+					verbose(env, "arg#%d expected for bpf_percpu_obj_drop_impl()\n", i);
+					return -EINVAL;
+				}
+			} else {
 				verbose(env, "arg#%d expected pointer to allocated object\n", i);
 				return -EINVAL;
 			}
@@ -11012,8 +11046,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "allocated object must be referenced\n");
 				return -EINVAL;
 			}
-			if (meta->btf == btf_vmlinux &&
-			    meta->func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) {
+			if (meta->btf == btf_vmlinux) {
 				meta->arg_btf = reg->btf;
 				meta->arg_btf_id = reg->btf_id;
 			}
@@ -11413,6 +11446,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		/* Only exception is bpf_obj_new_impl */
 		if (meta.btf != btf_vmlinux ||
 		    (meta.func_id != special_kfunc_list[KF_bpf_obj_new_impl] &&
+		     meta.func_id != special_kfunc_list[KF_bpf_percpu_obj_new_impl] &&
 		     meta.func_id != special_kfunc_list[KF_bpf_refcount_acquire_impl])) {
 			verbose(env, "acquire kernel function does not return PTR_TO_BTF_ID\n");
 			return -EINVAL;
@@ -11426,11 +11460,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		ptr_type = btf_type_skip_modifiers(desc_btf, t->type, &ptr_type_id);
 
 		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
-			if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
+			if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] ||
+			    meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
+				struct btf_struct_meta *struct_meta;
 				struct btf *ret_btf;
 				u32 ret_btf_id;
 
-				if (unlikely(!bpf_global_ma_set))
+				if (meta.func_id == special_kfunc_list[KF_bpf_obj_new_impl] && !bpf_global_ma_set)
+					return -ENOMEM;
+
+				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl] && !bpf_global_percpu_ma_set)
 					return -ENOMEM;
 
 				if (((u64)(u32)meta.arg_constant.value) != meta.arg_constant.value) {
@@ -11443,24 +11482,38 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 
 				/* This may be NULL due to user not supplying a BTF */
 				if (!ret_btf) {
-					verbose(env, "bpf_obj_new requires prog BTF\n");
+					verbose(env, "bpf_obj_new/bpf_percpu_obj_new requires prog BTF\n");
 					return -EINVAL;
 				}
 
 				ret_t = btf_type_by_id(ret_btf, ret_btf_id);
 				if (!ret_t || !__btf_type_is_struct(ret_t)) {
-					verbose(env, "bpf_obj_new type ID argument must be of a struct\n");
+					verbose(env, "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct\n");
 					return -EINVAL;
 				}
 
+				struct_meta = btf_find_struct_meta(ret_btf, ret_btf_id);
+				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
+					if (!__btf_type_is_scalar_struct(env, ret_btf, ret_t, 0)) {
+						verbose(env, "bpf_percpu_obj_new type ID argument must be of a struct of scalars\n");
+						return -EINVAL;
+					}
+
+					if (struct_meta) {
+						verbose(env, "bpf_percpu_obj_new type ID argument must not contain special fields\n");
+						return -EINVAL;
+					}
+				}
+
 				mark_reg_known_zero(env, regs, BPF_REG_0);
 				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
 				regs[BPF_REG_0].btf = ret_btf;
 				regs[BPF_REG_0].btf_id = ret_btf_id;
+				if (meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl])
+					regs[BPF_REG_0].type |= MEM_PERCPU;
 
 				insn_aux->obj_new_size = ret_t->size;
-				insn_aux->kptr_struct_meta =
-					btf_find_struct_meta(ret_btf, ret_btf_id);
+				insn_aux->kptr_struct_meta = struct_meta;
 			} else if (meta.func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]) {
 				mark_reg_known_zero(env, regs, BPF_REG_0);
 				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC;
@@ -11597,7 +11650,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			regs[BPF_REG_0].id = ++env->id_gen;
 	} else if (btf_type_is_void(t)) {
 		if (meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id)) {
-			if (meta.func_id == special_kfunc_list[KF_bpf_obj_drop_impl]) {
+			if (meta.func_id == special_kfunc_list[KF_bpf_obj_drop_impl] ||
+			    meta.func_id == special_kfunc_list[KF_bpf_percpu_obj_drop_impl]) {
 				insn_aux->kptr_struct_meta =
 					btf_find_struct_meta(meta.arg_btf,
 							     meta.arg_btf_id);
@@ -18266,21 +18320,35 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		insn->imm = BPF_CALL_IMM(desc->addr);
 	if (insn->off)
 		return 0;
-	if (desc->func_id == special_kfunc_list[KF_bpf_obj_new_impl]) {
+	if (desc->func_id == special_kfunc_list[KF_bpf_obj_new_impl] ||
+	    desc->func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
 		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
 		u64 obj_new_size = env->insn_aux_data[insn_idx].obj_new_size;
 
+		if (desc->func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl] && kptr_struct_meta) {
+			verbose(env, "verifier internal error: NULL kptr_struct_meta expected at insn_idx %d\n",
+				insn_idx);
+			return -EFAULT;
+		}
+
 		insn_buf[0] = BPF_MOV64_IMM(BPF_REG_1, obj_new_size);
 		insn_buf[1] = addr[0];
 		insn_buf[2] = addr[1];
 		insn_buf[3] = *insn;
 		*cnt = 4;
 	} else if (desc->func_id == special_kfunc_list[KF_bpf_obj_drop_impl] ||
+		   desc->func_id == special_kfunc_list[KF_bpf_percpu_obj_drop_impl] ||
 		   desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]) {
 		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
 
+		if (desc->func_id == special_kfunc_list[KF_bpf_percpu_obj_drop_impl] && kptr_struct_meta) {
+			verbose(env, "verifier internal error: NULL kptr_struct_meta expected at insn_idx %d\n",
+				insn_idx);
+			return -EFAULT;
+		}
+
 		if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
 		    !kptr_struct_meta) {
 			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
-- 
2.34.1



* [PATCH bpf-next v3 04/13] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (2 preceding siblings ...)
  2023-08-27 15:27 ` [PATCH bpf-next v3 03/13] bpf: Add alloc/xchg/direct_access support for local percpu kptr Yonghong Song
@ 2023-08-27 15:27 ` Yonghong Song
  2023-08-27 15:27 ` [PATCH bpf-next v3 05/13] selftests/bpf: Update error message in negative linked_list test Yonghong Song
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

The bpf helpers bpf_this_cpu_ptr() and bpf_per_cpu_ptr() are re-purposed
for allocated percpu objects. For an allocated percpu obj,
the reg type is 'PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU'.

The return type for these two re-purposed helpers is
'PTR_TO_BTF_ID | MEM_RCU | MEM_ALLOC'.
MEM_ALLOC allows the per-cpu data to be read and written.

Since the memory allocator bpf_mem_alloc() returns
a ptr to the percpu ptr for percpu data, the first argument
of bpf_this_cpu_ptr() and bpf_per_cpu_ptr() is patched
with a dereference before being passed to the helper.
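
Roughly, the effect (a sketch mirroring the do_misc_fixups() change
below): a call like

  v = bpf_this_cpu_ptr(p);

on an allocated percpu obj 'p' is patched into

  r1 = *(u64 *)(r1 + 0)   /* load the actual percpu ptr */
  call bpf_this_cpu_ptr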

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/linux/bpf_verifier.h |  1 +
 kernel/bpf/verifier.c        | 59 +++++++++++++++++++++++++++++++-----
 2 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index b6e58dab8e27..a3236651ec64 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -480,6 +480,7 @@ struct bpf_insn_aux_data {
 	bool zext_dst; /* this insn zero extends dst reg */
 	bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
 	bool is_iter_next; /* bpf_iter_<type>_next() kfunc call */
+	bool call_with_percpu_alloc_ptr; /* {this,per}_cpu_ptr() with prog percpu alloc */
 	u8 alu_state; /* used in combination with alu_limit */
 
 	/* below fields are initialized once */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6c886ead18f6..6b7e7ca611f3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6221,7 +6221,7 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		}
 
 		if (type_is_alloc(reg->type) && !type_is_non_owning_ref(reg->type) &&
-		    !reg->ref_obj_id) {
+		    !(reg->type & MEM_RCU) && !reg->ref_obj_id) {
 			verbose(env, "verifier internal error: ref_obj_id for allocated object must be non-zero\n");
 			return -EFAULT;
 		}
@@ -7765,6 +7765,7 @@ static const struct bpf_reg_types btf_ptr_types = {
 static const struct bpf_reg_types percpu_btf_ptr_types = {
 	.types = {
 		PTR_TO_BTF_ID | MEM_PERCPU,
+		PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU,
 		PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED,
 	}
 };
@@ -7941,6 +7942,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 		}
 		break;
 	case PTR_TO_BTF_ID | MEM_PERCPU:
+	case PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU:
 	case PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED:
 		/* Handled by helper specific checks */
 		break;
@@ -9547,6 +9549,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			     int *insn_idx_p)
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
+	bool returns_cpu_specific_alloc_ptr = false;
 	const struct bpf_func_proto *fn = NULL;
 	enum bpf_return_type ret_type;
 	enum bpf_type_flag ret_flag;
@@ -9785,6 +9788,23 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 
 		break;
 	}
+	case BPF_FUNC_per_cpu_ptr:
+	case BPF_FUNC_this_cpu_ptr:
+	{
+		struct bpf_reg_state *reg = &regs[BPF_REG_1];
+		const struct btf_type *type;
+
+		if (reg->type & MEM_RCU) {
+			type = btf_type_by_id(reg->btf, reg->btf_id);
+			if (!type || !btf_type_is_struct(type)) {
+				verbose(env, "Helper has invalid btf/btf_id in R1\n");
+				return -EFAULT;
+			}
+			returns_cpu_specific_alloc_ptr = true;
+			env->insn_aux_data[insn_idx].call_with_percpu_alloc_ptr = true;
+		}
+		break;
+	}
 	case BPF_FUNC_user_ringbuf_drain:
 		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
 					set_user_ringbuf_callback_state);
@@ -9874,14 +9894,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
 			regs[BPF_REG_0].mem_size = tsize;
 		} else {
-			/* MEM_RDONLY may be carried from ret_flag, but it
-			 * doesn't apply on PTR_TO_BTF_ID. Fold it, otherwise
-			 * it will confuse the check of PTR_TO_BTF_ID in
-			 * check_mem_access().
-			 */
-			ret_flag &= ~MEM_RDONLY;
+			if (returns_cpu_specific_alloc_ptr) {
+				regs[BPF_REG_0].type = PTR_TO_BTF_ID | MEM_ALLOC | MEM_RCU;
+			} else {
+				/* MEM_RDONLY may be carried from ret_flag, but it
+				 * doesn't apply on PTR_TO_BTF_ID. Fold it, otherwise
+				 * it will confuse the check of PTR_TO_BTF_ID in
+				 * check_mem_access().
+				 */
+				ret_flag &= ~MEM_RDONLY;
+				regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
+			}
 
-			regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
 			regs[BPF_REG_0].btf = meta.ret_btf;
 			regs[BPF_REG_0].btf_id = meta.ret_btf_id;
 		}
@@ -18676,6 +18700,25 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			goto patch_call_imm;
 		}
 
+		/* bpf_per_cpu_ptr() and bpf_this_cpu_ptr() */
+		if (env->insn_aux_data[i + delta].call_with_percpu_alloc_ptr) {
+			/* patch with 'r1 = *(u64 *)(r1 + 0)' since for percpu data,
+			 * bpf_mem_alloc() returns a ptr to the percpu data ptr.
+			 */
+			insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0);
+			insn_buf[1] = *insn;
+			cnt = 2;
+
+			new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
+			if (!new_prog)
+				return -ENOMEM;
+
+			delta += cnt - 1;
+			env->prog = prog = new_prog;
+			insn = new_prog->insnsi + i + delta;
+			goto patch_call_imm;
+		}
+
 		/* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
 		 * and other inlining handlers are currently limited to 64 bit
 		 * only.
-- 
2.34.1



* [PATCH bpf-next v3 05/13] selftests/bpf: Update error message in negative linked_list test
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (3 preceding siblings ...)
  2023-08-27 15:27 ` [PATCH bpf-next v3 04/13] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj Yonghong Song
@ 2023-08-27 15:27 ` Yonghong Song
  2023-08-27 15:28 ` [PATCH bpf-next v3 06/13] libbpf: Add __percpu_kptr macro definition Yonghong Song
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:27 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Some error messages have changed due to the addition of
percpu kptr support. Update the linked_list test to match
the changed error messages.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 tools/testing/selftests/bpf/prog_tests/linked_list.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/linked_list.c b/tools/testing/selftests/bpf/prog_tests/linked_list.c
index 18cf7b17463d..db3bf6bbe01a 100644
--- a/tools/testing/selftests/bpf/prog_tests/linked_list.c
+++ b/tools/testing/selftests/bpf/prog_tests/linked_list.c
@@ -65,8 +65,8 @@ static struct {
 	{ "map_compat_raw_tp", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
 	{ "map_compat_raw_tp_w", "tracing progs cannot use bpf_{list_head,rb_root} yet" },
 	{ "obj_type_id_oor", "local type ID argument must be in range [0, U32_MAX]" },
-	{ "obj_new_no_composite", "bpf_obj_new type ID argument must be of a struct" },
-	{ "obj_new_no_struct", "bpf_obj_new type ID argument must be of a struct" },
+	{ "obj_new_no_composite", "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct" },
+	{ "obj_new_no_struct", "bpf_obj_new/bpf_percpu_obj_new type ID argument must be of a struct" },
 	{ "obj_drop_non_zero_off", "R1 must have zero offset when passed to release func" },
 	{ "new_null_ret", "R0 invalid mem access 'ptr_or_null_'" },
 	{ "obj_new_acq", "Unreleased reference id=" },
-- 
2.34.1



* [PATCH bpf-next v3 06/13] libbpf: Add __percpu_kptr macro definition
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (4 preceding siblings ...)
  2023-08-27 15:27 ` [PATCH bpf-next v3 05/13] selftests/bpf: Update error message in negative linked_list test Yonghong Song
@ 2023-08-27 15:28 ` Yonghong Song
  2023-08-27 15:28 ` [PATCH bpf-next v3 07/13] selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in bpf_experimental.h Yonghong Song
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Add __percpu_kptr macro definition in bpf_helpers.h.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 tools/lib/bpf/bpf_helpers.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index bbab9ad9dc5a..77ceea575dc7 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -181,6 +181,7 @@ enum libbpf_tristate {
 #define __ksym __attribute__((section(".ksyms")))
 #define __kptr_untrusted __attribute__((btf_type_tag("kptr_untrusted")))
 #define __kptr __attribute__((btf_type_tag("kptr")))
+#define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr")))
 
 #define bpf_ksym_exists(sym) ({									\
 	_Static_assert(!__builtin_constant_p(!!sym), #sym " should be marked as __weak");	\
-- 
2.34.1



* [PATCH bpf-next v3 07/13] selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in bpf_experimental.h
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (5 preceding siblings ...)
  2023-08-27 15:28 ` [PATCH bpf-next v3 06/13] libbpf: Add __percpu_kptr macro definition Yonghong Song
@ 2023-08-27 15:28 ` Yonghong Song
  2023-08-27 15:28 ` [PATCH bpf-next v3 08/13] selftests/bpf: Add tests for array map with local percpu kptr Yonghong Song
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

The new macros bpf_percpu_obj_{new,drop}() are very similar to bpf_obj_{new,drop}()
as they both take a type as the argument.
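
A minimal usage sketch (assuming a scalars-only struct, as the
verifier requires):

  struct val_t {
    long a, b;
  };

  struct val_t __percpu_kptr *p;

  p = bpf_percpu_obj_new(struct val_t);
  if (p)
    bpf_percpu_obj_drop(p);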

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../testing/selftests/bpf/bpf_experimental.h  | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 209811b1993a..4494eaa9937e 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -131,4 +131,35 @@ extern int bpf_rbtree_add_impl(struct bpf_rb_root *root, struct bpf_rb_node *nod
  */
 extern struct bpf_rb_node *bpf_rbtree_first(struct bpf_rb_root *root) __ksym;
 
+/* Description
+ *	Allocates a percpu object of the type represented by 'local_type_id' in
+ *	program BTF. User may use the bpf_core_type_id_local macro to pass the
+ *	type ID of a struct in program BTF.
+ *
+ *	The 'local_type_id' parameter must be a known constant.
+ *	The 'meta' parameter is rewritten by the verifier, no need for BPF
+ *	program to set it.
+ * Returns
+ *	A pointer to a percpu object of the type corresponding to the passed in
+ *	'local_type_id', or NULL on failure.
+ */
+extern void *bpf_percpu_obj_new_impl(__u64 local_type_id, void *meta) __ksym;
+
+/* Convenience macro to wrap over bpf_percpu_obj_new_impl */
+#define bpf_percpu_obj_new(type) ((type __percpu_kptr *)bpf_percpu_obj_new_impl(bpf_core_type_id_local(type), NULL))
+
+/* Description
+ *	Free an allocated percpu object. All fields of the object that require
+ *	destruction will be destructed before the storage is freed.
+ *
+ *	The 'meta' parameter is rewritten by the verifier, no need for BPF
+ *	program to set it.
+ * Returns
+ *	Void.
+ */
+extern void bpf_percpu_obj_drop_impl(void *kptr, void *meta) __ksym;
+
+/* Convenience macro to wrap over bpf_percpu_obj_drop_impl */
+#define bpf_percpu_obj_drop(kptr) bpf_percpu_obj_drop_impl(kptr, NULL)
+
 #endif
-- 
2.34.1



* [PATCH bpf-next v3 08/13] selftests/bpf: Add tests for array map with local percpu kptr
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (6 preceding siblings ...)
  2023-08-27 15:28 ` [PATCH bpf-next v3 07/13] selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in bpf_experimental.h Yonghong Song
@ 2023-08-27 15:28 ` Yonghong Song
  2023-08-27 15:28 ` [PATCH bpf-next v3 09/13] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible Yonghong Song
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Add non-sleepable and sleepable tests with percpu kptr. For the
non-sleepable test, four programs are executed in the order of:
  1. allocate percpu data.
  2. assign values to percpu data.
  3. retrieve percpu data.
  4. de-allocate percpu data.

The sleepable prog tries to exercise all of the above four steps in a
single prog. Also, for the sleepable prog, rcu_read_lock is needed
to protect direct percpu ptr access (from the map value) and the
subsequent bpf_this_cpu_ptr() and bpf_per_cpu_ptr() helpers.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/prog_tests/percpu_alloc.c   |  78 ++++++++
 .../selftests/bpf/progs/percpu_alloc_array.c  | 187 ++++++++++++++++++
 2 files changed, 265 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
 create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_array.c

diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
new file mode 100644
index 000000000000..0fb536822f14
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "percpu_alloc_array.skel.h"
+
+static void test_array(void)
+{
+	struct percpu_alloc_array *skel;
+	int err, prog_fd;
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+	skel = percpu_alloc_array__open();
+	if (!ASSERT_OK_PTR(skel, "percpu_alloc_array__open"))
+		return;
+
+	bpf_program__set_autoload(skel->progs.test_array_map_1, true);
+	bpf_program__set_autoload(skel->progs.test_array_map_2, true);
+	bpf_program__set_autoload(skel->progs.test_array_map_3, true);
+	bpf_program__set_autoload(skel->progs.test_array_map_4, true);
+
+	skel->rodata->nr_cpus = libbpf_num_possible_cpus();
+
+	err = percpu_alloc_array__load(skel);
+	if (!ASSERT_OK(err, "percpu_alloc_array__load"))
+		goto out;
+
+	err = percpu_alloc_array__attach(skel);
+	if (!ASSERT_OK(err, "percpu_alloc_array__attach"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.test_array_map_1);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run array_map 1-4");
+	ASSERT_EQ(topts.retval, 0, "test_run array_map 1-4");
+	ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d");
+	ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c");
+out:
+	percpu_alloc_array__destroy(skel);
+}
+
+static void test_array_sleepable(void)
+{
+	struct percpu_alloc_array *skel;
+	int err, prog_fd;
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+	skel = percpu_alloc_array__open();
+	if (!ASSERT_OK_PTR(skel, "percpu_alloc__open"))
+		return;
+
+	bpf_program__set_autoload(skel->progs.test_array_map_10, true);
+
+	skel->rodata->nr_cpus = libbpf_num_possible_cpus();
+
+	err = percpu_alloc_array__load(skel);
+	if (!ASSERT_OK(err, "percpu_alloc_array__load"))
+		goto out;
+
+	err = percpu_alloc_array__attach(skel);
+	if (!ASSERT_OK(err, "percpu_alloc_array__attach"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.test_array_map_10);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run array_map_10");
+	ASSERT_EQ(topts.retval, 0, "test_run array_map_10");
+	ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d");
+	ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c");
+out:
+	percpu_alloc_array__destroy(skel);
+}
+
+void test_percpu_alloc(void)
+{
+	if (test__start_subtest("array"))
+		test_array();
+	if (test__start_subtest("array_sleepable"))
+		test_array_sleepable();
+}
diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
new file mode 100644
index 000000000000..3bd7d47870a9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
@@ -0,0 +1,187 @@
+#include "bpf_experimental.h"
+
+struct val_t {
+	long b, c, d;
+};
+
+struct elem {
+	long sum;
+	struct val_t __percpu_kptr *pc;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, struct elem);
+} array SEC(".maps");
+
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+
+const volatile int nr_cpus;
+
+/* Initialize the percpu object */
+SEC("?fentry/bpf_fentry_test1")
+int BPF_PROG(test_array_map_1)
+{
+	struct val_t __percpu_kptr *p;
+	struct elem *e;
+	int index = 0;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	p = bpf_percpu_obj_new(struct val_t);
+	if (!p)
+		return 0;
+
+	p = bpf_kptr_xchg(&e->pc, p);
+	if (p)
+		bpf_percpu_obj_drop(p);
+
+	return 0;
+}
+
+/* Update percpu data */
+SEC("?fentry/bpf_fentry_test2")
+int BPF_PROG(test_array_map_2)
+{
+	struct val_t __percpu_kptr *p;
+	struct val_t *v;
+	struct elem *e;
+	int index = 0;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	p = e->pc;
+	if (!p)
+		return 0;
+
+	v = bpf_per_cpu_ptr(p, 0);
+	if (!v)
+		return 0;
+	v->c = 1;
+	v->d = 2;
+
+	return 0;
+}
+
+int cpu0_field_d, sum_field_c;
+
+/* Summarize percpu data */
+SEC("?fentry/bpf_fentry_test3")
+int BPF_PROG(test_array_map_3)
+{
+	struct val_t __percpu_kptr *p;
+	int i, index = 0;
+	struct val_t *v;
+	struct elem *e;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	p = e->pc;
+	if (!p)
+		return 0;
+
+	bpf_for(i, 0, nr_cpus) {
+		v = bpf_per_cpu_ptr(p, i);
+		if (v) {
+			if (i == 0)
+				cpu0_field_d = v->d;
+			sum_field_c += v->c;
+		}
+	}
+
+	return 0;
+}
+
+/* Explicitly free allocated percpu data */
+SEC("?fentry/bpf_fentry_test4")
+int BPF_PROG(test_array_map_4)
+{
+	struct val_t __percpu_kptr *p;
+	struct elem *e;
+	int index = 0;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	/* delete */
+	p = bpf_kptr_xchg(&e->pc, NULL);
+	if (p) {
+		bpf_percpu_obj_drop(p);
+	}
+
+	return 0;
+}
+
+SEC("?fentry.s/bpf_fentry_test1")
+int BPF_PROG(test_array_map_10)
+{
+	struct val_t __percpu_kptr *p, *p1;
+	int i, index = 0;
+	struct val_t *v;
+	struct elem *e;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	bpf_rcu_read_lock();
+	p = e->pc;
+	if (!p) {
+		p = bpf_percpu_obj_new(struct val_t);
+		if (!p)
+			goto out;
+
+		p1 = bpf_kptr_xchg(&e->pc, p);
+		if (p1) {
+			/* race condition */
+			bpf_percpu_obj_drop(p1);
+		}
+
+		p = e->pc;
+		if (!p)
+			goto out;
+	}
+
+	v = bpf_this_cpu_ptr(p);
+	v->c = 3;
+	v = bpf_this_cpu_ptr(p);
+	v->c = 0;
+
+	v = bpf_per_cpu_ptr(p, 0);
+	if (!v)
+		goto out;
+	v->c = 1;
+	v->d = 2;
+
+	/* delete */
+	p1 = bpf_kptr_xchg(&e->pc, NULL);
+	if (!p1)
+		goto out;
+
+	bpf_for(i, 0, nr_cpus) {
+		v = bpf_per_cpu_ptr(p, i);
+		if (v) {
+			if (i == 0)
+				cpu0_field_d = v->d;
+			sum_field_c += v->c;
+		}
+	}
+
+	/* finally release p */
+	bpf_percpu_obj_drop(p1);
+out:
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1



* [PATCH bpf-next v3 09/13] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (7 preceding siblings ...)
  2023-08-27 15:28 ` [PATCH bpf-next v3 08/13] selftests/bpf: Add tests for array map with local percpu kptr Yonghong Song
@ 2023-08-27 15:28 ` Yonghong Song
  2023-09-06  0:37   ` Alexei Starovoitov
  2023-08-27 15:28 ` [PATCH bpf-next v3 10/13] selftests/bpf: Remove unnecessary direct read of local percpu kptr Yonghong Song
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

In the previous selftests/bpf patch, we have
  p = bpf_percpu_obj_new(struct val_t);
  if (!p)
          goto out;

  p1 = bpf_kptr_xchg(&e->pc, p);
  if (p1) {
          /* race condition */
          bpf_percpu_obj_drop(p1);
  }

  p = e->pc;
  if (!p)
          goto out;

After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
This is because the second argument of bpf_kptr_xchg() is marked
OBJ_RELEASE and will be marked as invalid after the call.
So after bpf_kptr_xchg(), 'p' is an unknown scalar,
and the bpf program needs to re-read it from the map value.

This patch checks if 'p' has type MEM_ALLOC and MEM_PERCPU,
and if 'p' is RCU protected. If this is the case, 'p' can be marked
as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is no longer
an owning reference. Such a change makes re-reading
from the map value unnecessary.

Note that re-reading 'e->pc' after bpf_kptr_xchg() might get
a value different from 'p': immediately before 'p = e->pc',
another cpu may do another bpf_kptr_xchg() and swap another value
into 'e->pc'. In that case, 'p = e->pc' may get either 'p' or the
other value, so a race condition exists regardless. Removing the
direct re-read therefore seems fine too.
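
With this change, the selftest pattern above can be simplified roughly
to (a sketch; the actual selftest change is in the next patch):

  p = bpf_percpu_obj_new(struct val_t);
  if (!p)
          goto out;

  p1 = bpf_kptr_xchg(&e->pc, p);
  if (p1) {
          /* race condition */
          bpf_percpu_obj_drop(p1);
  }

  /* no re-read of e->pc: under bpf_rcu_read_lock(), 'p' is now
   * MEM_RCU | MEM_PERCPU and can be passed to bpf_this_cpu_ptr()
   * or bpf_per_cpu_ptr() directly
   */
  v = bpf_this_cpu_ptr(p);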

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 kernel/bpf/verifier.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6b7e7ca611f3..dbba2b806017 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9660,6 +9660,26 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 				return -EFAULT;
 			}
 			err = unmark_stack_slots_dynptr(env, &regs[meta.release_regno]);
+		} else if (func_id == BPF_FUNC_kptr_xchg && meta.ref_obj_id) {
+			u32 ref_obj_id = meta.ref_obj_id;
+			bool in_rcu = in_rcu_cs(env);
+			struct bpf_func_state *state;
+			struct bpf_reg_state *reg;
+
+			err = release_reference_state(cur_func(env), ref_obj_id);
+			if (!err) {
+				bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+					if (reg->ref_obj_id == ref_obj_id) {
+						if (in_rcu && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
+							reg->ref_obj_id = 0;
+							reg->type &= ~MEM_ALLOC;
+							reg->type |= MEM_RCU;
+						} else {
+							mark_reg_invalid(env, reg);
+						}
+					}
+				}));
+			}
 		} else if (meta.ref_obj_id) {
 			err = release_reference(env, meta.ref_obj_id);
 		} else if (register_is_null(&regs[meta.release_regno])) {
-- 
2.34.1



* [PATCH bpf-next v3 10/13] selftests/bpf: Remove unnecessary direct read of local percpu kptr
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (8 preceding siblings ...)
  2023-08-27 15:28 ` [PATCH bpf-next v3 09/13] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible Yonghong Song
@ 2023-08-27 15:28 ` Yonghong Song
  2023-08-27 15:28 ` [PATCH bpf-next v3 11/13] selftests/bpf: Add tests for cgrp_local_storage with " Yonghong Song
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

For the second argument of bpf_kptr_xchg(), if the reg type contains
MEM_ALLOC and MEM_PERCPU, which means a percpu allocation, then
after bpf_kptr_xchg() the argument is marked as MEM_RCU and MEM_PERCPU
if in an rcu critical section. This way, re-reading from the map value
is not needed. Remove the re-read from the percpu_alloc_array.c selftest.

Without the previous kernel change, the test fails like below:

  0: R1=ctx(off=0,imm=0) R10=fp0
  ; int BPF_PROG(test_array_map_10, int a)
  0: (b4) w1 = 0                        ; R1_w=0
  ; int i, index = 0;
  1: (63) *(u32 *)(r10 -4) = r1         ; R1_w=0 R10=fp0 fp-8=0000????
  2: (bf) r2 = r10                      ; R2_w=fp0 R10=fp0
  ;
  3: (07) r2 += -4                      ; R2_w=fp-4
  ; e = bpf_map_lookup_elem(&array, &index);
  4: (18) r1 = 0xffff88810e771800       ; R1_w=map_ptr(off=0,ks=4,vs=16,imm=0)
  6: (85) call bpf_map_lookup_elem#1    ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0)
  7: (bf) r6 = r0                       ; R0_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0) R6_w=map_value_or_null(id=1,off=0,ks=4,vs=16,imm=0)
  ; if (!e)
  8: (15) if r6 == 0x0 goto pc+81       ; R6_w=map_value(off=0,ks=4,vs=16,imm=0)
  ; bpf_rcu_read_lock();
  9: (85) call bpf_rcu_read_lock#87892          ;
  ; p = e->pc;
  10: (bf) r7 = r6                      ; R6=map_value(off=0,ks=4,vs=16,imm=0) R7_w=map_value(off=0,ks=4,vs=16,imm=0)
  11: (07) r7 += 8                      ; R7_w=map_value(off=8,ks=4,vs=16,imm=0)
  12: (79) r6 = *(u64 *)(r6 +8)         ; R6_w=percpu_rcu_ptr_or_null_val_t(id=2,off=0,imm=0)
  ; if (!p) {
  13: (55) if r6 != 0x0 goto pc+13      ; R6_w=0
  ; p = bpf_percpu_obj_new(struct val_t);
  14: (18) r1 = 0x12                    ; R1_w=18
  16: (b7) r2 = 0                       ; R2_w=0
  17: (85) call bpf_percpu_obj_new_impl#87883   ; R0_w=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) refs=4
  18: (bf) r6 = r0                      ; R0=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) R6=percpu_ptr_or_null_val_t(id=4,ref_obj_id=4,off=0,imm=0) refs=4
  ; if (!p)
  19: (15) if r6 == 0x0 goto pc+69      ; R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) refs=4
  ; p1 = bpf_kptr_xchg(&e->pc, p);
  20: (bf) r1 = r7                      ; R1_w=map_value(off=8,ks=4,vs=16,imm=0) R7=map_value(off=8,ks=4,vs=16,imm=0) refs=4
  21: (bf) r2 = r6                      ; R2_w=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0) refs=4
  22: (85) call bpf_kptr_xchg#194       ; R0_w=percpu_ptr_or_null_val_t(id=6,ref_obj_id=6,off=0,imm=0) refs=6
  ; if (p1) {
  23: (15) if r0 == 0x0 goto pc+3       ; R0_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) refs=6
  ; bpf_percpu_obj_drop(p1);
  24: (bf) r1 = r0                      ; R0_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) R1_w=percpu_ptr_val_t(ref_obj_id=6,off=0,imm=0) refs=6
  25: (b7) r2 = 0                       ; R2_w=0 refs=6
  26: (85) call bpf_percpu_obj_drop_impl#87882          ;
  ; v = bpf_this_cpu_ptr(p);
  27: (bf) r1 = r6                      ; R1_w=scalar(id=7) R6=scalar(id=7)
  28: (85) call bpf_this_cpu_ptr#154
  R1 type=scalar expected=percpu_ptr_, percpu_rcu_ptr_, percpu_trusted_ptr_

R1, which gets its value from R6, is a scalar. But before insn 22, R6 is
  R6=percpu_ptr_val_t(ref_obj_id=4,off=0,imm=0)
Without the previous patch, its type is changed to a scalar at insn 22.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 tools/testing/selftests/bpf/progs/percpu_alloc_array.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
index 3bd7d47870a9..bbc45346e006 100644
--- a/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_array.c
@@ -146,10 +146,6 @@ int BPF_PROG(test_array_map_10)
 			/* race condition */
 			bpf_percpu_obj_drop(p1);
 		}
-
-		p = e->pc;
-		if (!p)
-			goto out;
 	}
 
 	v = bpf_this_cpu_ptr(p);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next v3 11/13] selftests/bpf: Add tests for cgrp_local_storage with local percpu kptr
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (9 preceding siblings ...)
  2023-08-27 15:28 ` [PATCH bpf-next v3 10/13] selftests/bpf: Remove unnecessary direct read of local percpu kptr Yonghong Song
@ 2023-08-27 15:28 ` Yonghong Song
  2023-08-27 15:28 ` [PATCH bpf-next v3 12/13] selftests/bpf: Add some negative tests Yonghong Song
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Add a non-sleepable cgrp_local_storage test with a percpu kptr. The
test allocates percpu data, assigns values to the percpu data and
retrieves the percpu data. The percpu data is de-allocated when the
map is freed.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/prog_tests/percpu_alloc.c   |  40 +++++++
 .../progs/percpu_alloc_cgrp_local_storage.c   | 105 ++++++++++++++++++
 2 files changed, 145 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c

diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
index 0fb536822f14..41bf784a4bb3 100644
--- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 #include "percpu_alloc_array.skel.h"
+#include "percpu_alloc_cgrp_local_storage.skel.h"
 
 static void test_array(void)
 {
@@ -69,10 +70,49 @@ static void test_array_sleepable(void)
 	percpu_alloc_array__destroy(skel);
 }
 
+static void test_cgrp_local_storage(void)
+{
+	struct percpu_alloc_cgrp_local_storage *skel;
+	int err, cgroup_fd, prog_fd;
+	LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+	cgroup_fd = test__join_cgroup("/percpu_alloc");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /percpu_alloc"))
+		return;
+
+	skel = percpu_alloc_cgrp_local_storage__open();
+	if (!ASSERT_OK_PTR(skel, "percpu_alloc_cgrp_local_storage__open"))
+		goto close_fd;
+
+	skel->rodata->nr_cpus = libbpf_num_possible_cpus();
+
+	err = percpu_alloc_cgrp_local_storage__load(skel);
+	if (!ASSERT_OK(err, "percpu_alloc_cgrp_local_storage__load"))
+		goto destroy_skel;
+
+	err = percpu_alloc_cgrp_local_storage__attach(skel);
+	if (!ASSERT_OK(err, "percpu_alloc_cgrp_local_storage__attach"))
+		goto destroy_skel;
+
+	prog_fd = bpf_program__fd(skel->progs.test_cgrp_local_storage_1);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run cgrp_local_storage 1-3");
+	ASSERT_EQ(topts.retval, 0, "test_run cgrp_local_storage 1-3");
+	ASSERT_EQ(skel->bss->cpu0_field_d, 2, "cpu0_field_d");
+	ASSERT_EQ(skel->bss->sum_field_c, 1, "sum_field_c");
+
+destroy_skel:
+	percpu_alloc_cgrp_local_storage__destroy(skel);
+close_fd:
+	close(cgroup_fd);
+}
+
 void test_percpu_alloc(void)
 {
 	if (test__start_subtest("array"))
 		test_array();
 	if (test__start_subtest("array_sleepable"))
 		test_array_sleepable();
+	if (test__start_subtest("cgrp_local_storage"))
+		test_cgrp_local_storage();
 }
diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c b/tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c
new file mode 100644
index 000000000000..1c36a241852c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_cgrp_local_storage.c
@@ -0,0 +1,105 @@
+#include "bpf_experimental.h"
+
+struct val_t {
+	long b, c, d;
+};
+
+struct elem {
+	long sum;
+	struct val_t __percpu_kptr *pc;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, struct elem);
+} cgrp SEC(".maps");
+
+const volatile int nr_cpus;
+
+/* Initialize the percpu object */
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(test_cgrp_local_storage_1)
+{
+	struct task_struct *task;
+	struct val_t __percpu_kptr *p;
+	struct elem *e;
+
+	task = bpf_get_current_task_btf();
+	e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0,
+				 BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!e)
+		return 0;
+
+	p = bpf_percpu_obj_new(struct val_t);
+	if (!p)
+		return 0;
+
+	p = bpf_kptr_xchg(&e->pc, p);
+	if (p)
+		bpf_percpu_obj_drop(p);
+
+	return 0;
+}
+
+/* Percpu data collection */
+SEC("fentry/bpf_fentry_test2")
+int BPF_PROG(test_cgrp_local_storage_2)
+{
+	struct task_struct *task;
+	struct val_t __percpu_kptr *p;
+	struct val_t *v;
+	struct elem *e;
+
+	task = bpf_get_current_task_btf();
+	e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, 0);
+	if (!e)
+		return 0;
+
+	p = e->pc;
+	if (!p)
+		return 0;
+
+	v = bpf_per_cpu_ptr(p, 0);
+	if (!v)
+		return 0;
+	v->c = 1;
+	v->d = 2;
+	return 0;
+}
+
+int cpu0_field_d, sum_field_c;
+
+/* Summarize percpu data collection */
+SEC("fentry/bpf_fentry_test3")
+int BPF_PROG(test_cgrp_local_storage_3)
+{
+	struct task_struct *task;
+	struct val_t __percpu_kptr *p;
+	struct val_t *v;
+	struct elem *e;
+	int i;
+
+	task = bpf_get_current_task_btf();
+	e = bpf_cgrp_storage_get(&cgrp, task->cgroups->dfl_cgrp, 0, 0);
+	if (!e)
+		return 0;
+
+	p = e->pc;
+	if (!p)
+		return 0;
+
+	bpf_for(i, 0, nr_cpus) {
+		v = bpf_per_cpu_ptr(p, i);
+		if (v) {
+			if (i == 0)
+				cpu0_field_d = v->d;
+			sum_field_c += v->c;
+		}
+	}
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next v3 12/13] selftests/bpf: Add some negative tests
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (10 preceding siblings ...)
  2023-08-27 15:28 ` [PATCH bpf-next v3 11/13] selftests/bpf: Add tests for cgrp_local_storage with " Yonghong Song
@ 2023-08-27 15:28 ` Yonghong Song
  2023-08-27 15:28 ` [PATCH bpf-next v3 13/13] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated Yonghong Song
  2023-09-06  0:50 ` [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr patchwork-bot+netdevbpf
  13 siblings, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Add a few negative tests for common mistakes with using percpu kptr,
including:
  - store to percpu kptr.
  - type mismatch in bpf_kptr_xchg arguments.
  - sleepable prog with untrusted arg for bpf_this_cpu_ptr().
  - bpf_percpu_obj_new && bpf_obj_drop, and bpf_obj_new && bpf_percpu_obj_drop.
  - struct with ptr for bpf_percpu_obj_new.
  - struct with special field (e.g., bpf_spin_lock) for bpf_percpu_obj_new.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 .../selftests/bpf/prog_tests/percpu_alloc.c   |   7 +
 .../selftests/bpf/progs/percpu_alloc_fail.c   | 164 ++++++++++++++++++
 2 files changed, 171 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/percpu_alloc_fail.c

diff --git a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
index 41bf784a4bb3..9541e9b3a034 100644
--- a/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/percpu_alloc.c
@@ -2,6 +2,7 @@
 #include <test_progs.h>
 #include "percpu_alloc_array.skel.h"
 #include "percpu_alloc_cgrp_local_storage.skel.h"
+#include "percpu_alloc_fail.skel.h"
 
 static void test_array(void)
 {
@@ -107,6 +108,10 @@ static void test_cgrp_local_storage(void)
 	close(cgroup_fd);
 }
 
+static void test_failure(void) {
+	RUN_TESTS(percpu_alloc_fail);
+}
+
 void test_percpu_alloc(void)
 {
 	if (test__start_subtest("array"))
@@ -115,4 +120,6 @@ void test_percpu_alloc(void)
 		test_array_sleepable();
 	if (test__start_subtest("cgrp_local_storage"))
 		test_cgrp_local_storage();
+	if (test__start_subtest("failure_tests"))
+		test_failure();
 }
diff --git a/tools/testing/selftests/bpf/progs/percpu_alloc_fail.c b/tools/testing/selftests/bpf/progs/percpu_alloc_fail.c
new file mode 100644
index 000000000000..1a891d30f1fe
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/percpu_alloc_fail.c
@@ -0,0 +1,164 @@
+#include "bpf_experimental.h"
+#include "bpf_misc.h"
+
+struct val_t {
+	long b, c, d;
+};
+
+struct val2_t {
+	long b;
+};
+
+struct val_with_ptr_t {
+	char *p;
+};
+
+struct val_with_rb_root_t {
+	struct bpf_spin_lock lock;
+};
+
+struct elem {
+	long sum;
+	struct val_t __percpu_kptr *pc;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, struct elem);
+} array SEC(".maps");
+
+long ret;
+
+SEC("?fentry/bpf_fentry_test1")
+__failure __msg("store to referenced kptr disallowed")
+int BPF_PROG(test_array_map_1)
+{
+	struct val_t __percpu_kptr *p;
+	struct elem *e;
+	int index = 0;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	p = bpf_percpu_obj_new(struct val_t);
+	if (!p)
+		return 0;
+
+	p = bpf_kptr_xchg(&e->pc, p);
+	if (p)
+		bpf_percpu_obj_drop(p);
+
+	e->pc = (struct val_t __percpu_kptr *)ret;
+	return 0;
+}
+
+SEC("?fentry/bpf_fentry_test1")
+__failure __msg("invalid kptr access, R2 type=percpu_ptr_val2_t expected=ptr_val_t")
+int BPF_PROG(test_array_map_2)
+{
+	struct val2_t __percpu_kptr *p2;
+	struct val_t __percpu_kptr *p;
+	struct elem *e;
+	int index = 0;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	p2 = bpf_percpu_obj_new(struct val2_t);
+	if (!p2)
+		return 0;
+
+	p = bpf_kptr_xchg(&e->pc, p2);
+	if (p)
+		bpf_percpu_obj_drop(p);
+
+	return 0;
+}
+
+SEC("?fentry.s/bpf_fentry_test1")
+__failure __msg("R1 type=scalar expected=percpu_ptr_, percpu_rcu_ptr_, percpu_trusted_ptr_")
+int BPF_PROG(test_array_map_3)
+{
+	struct val_t __percpu_kptr *p, *p1;
+	struct val_t *v;
+	struct elem *e;
+	int index = 0;
+
+	e = bpf_map_lookup_elem(&array, &index);
+	if (!e)
+		return 0;
+
+	p = bpf_percpu_obj_new(struct val_t);
+	if (!p)
+		return 0;
+
+	p1 = bpf_kptr_xchg(&e->pc, p);
+	if (p1)
+		bpf_percpu_obj_drop(p1);
+
+	v = bpf_this_cpu_ptr(p);
+	ret = v->b;
+	return 0;
+}
+
+SEC("?fentry.s/bpf_fentry_test1")
+__failure __msg("arg#0 expected for bpf_percpu_obj_drop_impl()")
+int BPF_PROG(test_array_map_4)
+{
+	struct val_t __percpu_kptr *p;
+
+	p = bpf_percpu_obj_new(struct val_t);
+	if (!p)
+		return 0;
+
+	bpf_obj_drop(p);
+	return 0;
+}
+
+SEC("?fentry.s/bpf_fentry_test1")
+__failure __msg("arg#0 expected for bpf_obj_drop_impl()")
+int BPF_PROG(test_array_map_5)
+{
+	struct val_t *p;
+
+	p = bpf_obj_new(struct val_t);
+	if (!p)
+		return 0;
+
+	bpf_percpu_obj_drop(p);
+	return 0;
+}
+
+SEC("?fentry.s/bpf_fentry_test1")
+__failure __msg("bpf_percpu_obj_new type ID argument must be of a struct of scalars")
+int BPF_PROG(test_array_map_6)
+{
+	struct val_with_ptr_t __percpu_kptr *p;
+
+	p = bpf_percpu_obj_new(struct val_with_ptr_t);
+	if (!p)
+		return 0;
+
+	bpf_percpu_obj_drop(p);
+	return 0;
+}
+
+SEC("?fentry.s/bpf_fentry_test1")
+__failure __msg("bpf_percpu_obj_new type ID argument must not contain special fields")
+int BPF_PROG(test_array_map_7)
+{
+	struct val_with_rb_root_t __percpu_kptr *p;
+
+	p = bpf_percpu_obj_new(struct val_with_rb_root_t);
+	if (!p)
+		return 0;
+
+	bpf_percpu_obj_drop(p);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH bpf-next v3 13/13] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (11 preceding siblings ...)
  2023-08-27 15:28 ` [PATCH bpf-next v3 12/13] selftests/bpf: Add some negative tests Yonghong Song
@ 2023-08-27 15:28 ` Yonghong Song
  2023-09-06  0:50 ` [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr patchwork-bot+netdevbpf
  13 siblings, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-08-27 15:28 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu kptr'
can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality
and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated.
Also adjust selftests/bpf/test_bpftool_synctypes.py and the
libbpf_str selftest to fix test errors that would otherwise occur.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
---
 include/uapi/linux/bpf.h                              | 9 ++++++++-
 tools/include/uapi/linux/bpf.h                        | 9 ++++++++-
 tools/testing/selftests/bpf/prog_tests/libbpf_str.c   | 6 +++++-
 tools/testing/selftests/bpf/test_bpftool_synctypes.py | 9 +++++++++
 4 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8790b3962e4b..73b155e52204 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -932,7 +932,14 @@ enum bpf_map_type {
 	 */
 	BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
 	BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
-	BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
+	BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
+	/* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
+	 * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE +
+	 * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
+	 * functionality and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
+	 * deprecated.
+	 */
+	BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
 	BPF_MAP_TYPE_QUEUE,
 	BPF_MAP_TYPE_STACK,
 	BPF_MAP_TYPE_SK_STORAGE,
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 8790b3962e4b..73b155e52204 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -932,7 +932,14 @@ enum bpf_map_type {
 	 */
 	BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
 	BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
-	BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
+	BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
+	/* BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is available to bpf programs
+	 * attaching to a cgroup. The new mechanism (BPF_MAP_TYPE_CGRP_STORAGE +
+	 * local percpu kptr) supports all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
+	 * functionality and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE
+	 * deprecated.
+	 */
+	BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED,
 	BPF_MAP_TYPE_QUEUE,
 	BPF_MAP_TYPE_STACK,
 	BPF_MAP_TYPE_SK_STORAGE,
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
index efb8bd43653c..c440ea3311ed 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
@@ -142,10 +142,14 @@ static void test_libbpf_bpf_map_type_str(void)
 		/* Special case for map_type_name BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED
 		 * where it and BPF_MAP_TYPE_CGROUP_STORAGE have the same enum value
 		 * (map_type). For this enum value, libbpf_bpf_map_type_str() picks
-		 * BPF_MAP_TYPE_CGROUP_STORAGE.
+		 * BPF_MAP_TYPE_CGROUP_STORAGE. The same for
+		 * BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED and
+		 * BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE.
 		 */
 		if (strcmp(map_type_name, "BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED") == 0)
 			continue;
+		if (strcmp(map_type_name, "BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED") == 0)
+			continue;
 
 		ASSERT_STREQ(buf, map_type_name, "exp_str_value");
 	}
diff --git a/tools/testing/selftests/bpf/test_bpftool_synctypes.py b/tools/testing/selftests/bpf/test_bpftool_synctypes.py
index 0cfece7ff4f8..0ed67b6b31dd 100755
--- a/tools/testing/selftests/bpf/test_bpftool_synctypes.py
+++ b/tools/testing/selftests/bpf/test_bpftool_synctypes.py
@@ -509,6 +509,15 @@ def main():
     source_map_types.remove('cgroup_storage_deprecated')
     source_map_types.add('cgroup_storage')
 
+    # The same applies to BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED and
+    # BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, which share the same enum value;
+    # source_map_types picks
+    # BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE_DEPRECATED/percpu_cgroup_storage_deprecated.
+    # Replace 'percpu_cgroup_storage_deprecated' with 'percpu_cgroup_storage'
+    # so it aligns with what `bpftool map help` shows.
+    source_map_types.remove('percpu_cgroup_storage_deprecated')
+    source_map_types.add('percpu_cgroup_storage')
+
     help_map_types = map_info.get_map_help()
     help_map_options = map_info.get_options()
     map_info.close()
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next v3 09/13] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible
  2023-08-27 15:28 ` [PATCH bpf-next v3 09/13] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible Yonghong Song
@ 2023-09-06  0:37   ` Alexei Starovoitov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexei Starovoitov @ 2023-09-06  0:37 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Sun, Aug 27, 2023 at 08:28:16AM -0700, Yonghong Song wrote:
> In previous selftests/bpf patch, we have
>   p = bpf_percpu_obj_new(struct val_t);
>   if (!p)
>           goto out;
> 
>   p1 = bpf_kptr_xchg(&e->pc, p);
>   if (p1) {
>           /* race condition */
>           bpf_percpu_obj_drop(p1);
>   }
> 
>   p = e->pc;
>   if (!p)
>           goto out;
> 
> After bpf_kptr_xchg(), we need to re-read e->pc into 'p'.
> This is because the second argument of bpf_kptr_xchg() is marked
> OBJ_RELEASE and the register is marked invalid after the call.
> So after bpf_kptr_xchg(), 'p' is an unknown scalar,
> and the bpf program needs to re-read it from the map value.
> 
> This patch checks whether 'p' has type MEM_ALLOC and MEM_PERCPU,
> and whether 'p' is RCU protected. If so, 'p' can be marked
> as MEM_RCU. MEM_ALLOC needs to be removed since 'p' is no longer
> an owning reference. Such a change makes re-reading
> from the map value unnecessary.
> 
> Note that re-reading 'e->pc' after bpf_kptr_xchg() might return
> a value different from 'p': immediately before 'p = e->pc',
> another cpu may do another bpf_kptr_xchg() and swap a different value
> into 'e->pc'. In that case, 'p = e->pc' may observe either 'p' or the
> other value, i.e. the race condition already exists, so removing the
> direct re-read is fine as well.
...
> +		} else if (func_id == BPF_FUNC_kptr_xchg && meta.ref_obj_id) {
> +			u32 ref_obj_id = meta.ref_obj_id;
> +			bool in_rcu = in_rcu_cs(env);
> +			struct bpf_func_state *state;
> +			struct bpf_reg_state *reg;
> +
> +			err = release_reference_state(cur_func(env), ref_obj_id);
> +			if (!err) {
> +				bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
> +					if (reg->ref_obj_id == ref_obj_id) {
> +						if (in_rcu && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
> +							reg->ref_obj_id = 0;
> +							reg->type &= ~MEM_ALLOC;
> +							reg->type |= MEM_RCU;
> +						} else {
> +							mark_reg_invalid(env, reg);
> +						}
> +					}
> +				}));
> +			}
>  		} else if (meta.ref_obj_id) {
>  			err = release_reference(env, meta.ref_obj_id);

I think this open-coded version of release_reference() can be safely folded
into release_reference(). If it's safe to do for kptr_xchg(), then it's safe
to do for all KF_RELEASE kfuncs that call release_reference() too.
bpf_percpu_obj_drop() is the only such kfunc, and converting its arg1
from MEM_ALLOC | MEM_PERCPU to MEM_RCU | MEM_PERCPU should be equally valid,
since bpf_percpu_obj_drop() is doing bpf_mem_free_rcu().

I'm planning to apply the whole set. The above nit can be a follow-up.
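
For reference, a folded version could look roughly like below. This is
a sketch only, reusing the helpers from the patch above; it is not the
actual follow-up:

	static int release_reference(struct bpf_verifier_env *env, int ref_obj_id)
	{
		bool in_rcu = in_rcu_cs(env);
		struct bpf_func_state *state;
		struct bpf_reg_state *reg;
		int err;

		err = release_reference_state(cur_func(env), ref_obj_id);
		if (err)
			return err;

		bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
			if (reg->ref_obj_id == ref_obj_id) {
				if (in_rcu && (reg->type & MEM_ALLOC) && (reg->type & MEM_PERCPU)) {
					/* keep as a non-owning, RCU protected percpu ptr */
					reg->ref_obj_id = 0;
					reg->type &= ~MEM_ALLOC;
					reg->type |= MEM_RCU;
				} else {
					mark_reg_invalid(env, reg);
				}
			}
		}));
		return 0;
	}

That way the kptr_xchg branch and the KF_RELEASE path (e.g.
bpf_percpu_obj_drop(), which frees via bpf_mem_free_rcu()) would share
the same logic.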

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next v3 03/13] bpf: Add alloc/xchg/direct_access support for local percpu kptr
  2023-08-27 15:27 ` [PATCH bpf-next v3 03/13] bpf: Add alloc/xchg/direct_access support for local percpu kptr Yonghong Song
@ 2023-09-06  0:40   ` Alexei Starovoitov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexei Starovoitov @ 2023-09-06  0:40 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Sun, Aug 27, 2023 at 08:27:44AM -0700, Yonghong Song wrote:
>  		case KF_ARG_PTR_TO_ALLOC_BTF_ID:
> -			if (reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
> +			if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC)) {
> +				if (meta->func_id != special_kfunc_list[KF_bpf_obj_drop_impl]) {
> +					verbose(env, "arg#%d expected for bpf_obj_drop_impl()\n", i);
> +					return -EINVAL;
> +				}
> +			} else if (reg->type == (PTR_TO_BTF_ID | MEM_ALLOC | MEM_PERCPU)) {
> +				if (meta->func_id != special_kfunc_list[KF_bpf_percpu_obj_drop_impl]) {
> +					verbose(env, "arg#%d expected for bpf_percpu_obj_drop_impl()\n", i);
> +					return -EINVAL;
> +				}
> +			} else {
>  				verbose(env, "arg#%d expected pointer to allocated object\n", i);
>  				return -EINVAL;

This is a bit too many hard-coded checks for the "supposed to be generic"
__alloc arg suffix in kfuncs. We only have two such kfuncs:
bpf_percpu_obj_drop_impl(void *p__alloc
bpf_obj_drop_impl(void *p__alloc
so the above works and I don't have a good suggestion for how it can be
improved, but it would be great to follow up and clean this up somehow.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr
  2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
                   ` (12 preceding siblings ...)
  2023-08-27 15:28 ` [PATCH bpf-next v3 13/13] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated Yonghong Song
@ 2023-09-06  0:50 ` patchwork-bot+netdevbpf
  13 siblings, 0 replies; 23+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-09-06  0:50 UTC (permalink / raw)
  To: Yonghong Song; +Cc: bpf, ast, andrii, daniel, kernel-team, martin.lau

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Sun, 27 Aug 2023 08:27:29 -0700 you wrote:
> Patch set [1] implemented cgroup local storage BPF_MAP_TYPE_CGRP_STORAGE
> similar to sk/task/inode local storage and old BPF_MAP_TYPE_CGROUP_STORAGE
> map is marked as deprecated since old BPF_MAP_TYPE_CGROUP_STORAGE map can
> only work with current cgroup.
> 
> Similarly, the existing BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE map
> is a percpu version of BPF_MAP_TYPE_CGROUP_STORAGE and only works
> with current cgroup. But there is no replacement which can work
> with arbitrary cgroup.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v3,01/13] bpf: Add support for non-fix-size percpu mem allocation
    https://git.kernel.org/bpf/bpf-next/c/baa9555b6c1c
  - [bpf-next,v3,02/13] bpf: Add BPF_KPTR_PERCPU as a field type
    https://git.kernel.org/bpf/bpf-next/c/b24d13fa6081
  - [bpf-next,v3,03/13] bpf: Add alloc/xchg/direct_access support for local percpu kptr
    https://git.kernel.org/bpf/bpf-next/c/b3122748321a
  - [bpf-next,v3,04/13] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj
    https://git.kernel.org/bpf/bpf-next/c/119f4dfb73c5
  - [bpf-next,v3,05/13] selftests/bpf: Update error message in negative linked_list test
    https://git.kernel.org/bpf/bpf-next/c/dc5ed69ef3c4
  - [bpf-next,v3,06/13] libbpf: Add __percpu_kptr macro definition
    https://git.kernel.org/bpf/bpf-next/c/9ae302fe598b
  - [bpf-next,v3,07/13] selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in bpf_experimental.h
    https://git.kernel.org/bpf/bpf-next/c/7a03199abbc7
  - [bpf-next,v3,08/13] selftests/bpf: Add tests for array map with local percpu kptr
    https://git.kernel.org/bpf/bpf-next/c/ef570811b2e8
  - [bpf-next,v3,09/13] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible
    https://git.kernel.org/bpf/bpf-next/c/616334e5036b
  - [bpf-next,v3,10/13] selftests/bpf: Remove unnecessary direct read of local percpu kptr
    https://git.kernel.org/bpf/bpf-next/c/974da9f3ee66
  - [bpf-next,v3,11/13] selftests/bpf: Add tests for cgrp_local_storage with local percpu kptr
    https://git.kernel.org/bpf/bpf-next/c/38bbd586889f
  - [bpf-next,v3,12/13] selftests/bpf: Add some negative tests
    https://git.kernel.org/bpf/bpf-next/c/2bc5f8c5dbe4
  - [bpf-next,v3,13/13] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated
    https://git.kernel.org/bpf/bpf-next/c/fb94bcc2b286

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation
  2023-08-27 15:27 ` [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation Yonghong Song
@ 2023-09-13 14:10   ` Hou Tao
  2023-11-15 15:31   ` Heiko Carstens
  1 sibling, 0 replies; 23+ messages in thread
From: Hou Tao @ 2023-09-13 14:10 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Hi,

On 8/27/2023 11:27 PM, Yonghong Song wrote:
> This is needed for later percpu mem allocation when the
> allocation is done by bpf program. For such cases, a global
> bpf_global_percpu_ma is added where a flexible allocation
> size is needed.
>
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  include/linux/bpf.h   |  4 ++--
>  kernel/bpf/core.c     |  8 +++++---
>  kernel/bpf/memalloc.c | 14 ++++++--------
>  3 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 12596af59c00..144dbddf53bd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -55,8 +55,8 @@ struct cgroup;
>  extern struct idr btf_idr;
>  extern spinlock_t btf_idr_lock;
>  extern struct kobject *btf_kobj;
> -extern struct bpf_mem_alloc bpf_global_ma;
> -extern bool bpf_global_ma_set;
> +extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
> +extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;
>  
>  typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
>  typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 0f8f036d8bd1..95599df82ee4 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -64,8 +64,8 @@
>  #define OFF	insn->off
>  #define IMM	insn->imm
>  
> -struct bpf_mem_alloc bpf_global_ma;
> -bool bpf_global_ma_set;
> +struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
> +bool bpf_global_ma_set, bpf_global_percpu_ma_set;
>  
>  /* No hurry in this branch
>   *
> @@ -2921,7 +2921,9 @@ static int __init bpf_global_ma_init(void)
>  
>  	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
>  	bpf_global_ma_set = !ret;
> -	return ret;
> +	ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
> +	bpf_global_percpu_ma_set = !ret;
> +	return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
>  }
>  late_initcall(bpf_global_ma_init);
>  #endif
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 9c49ae53deaf..cb60445de98a 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -499,15 +499,16 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>  	struct obj_cgroup *objcg = NULL;
>  	int cpu, i, unit_size, percpu_size = 0;
>  
> +	/* room for llist_node and per-cpu pointer */
> +	if (percpu)
> +		percpu_size = LLIST_NODE_SZ + sizeof(void *);
> +
>  	if (size) {
>  		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
>  		if (!pc)
>  			return -ENOMEM;
>  
> -		if (percpu)
> -			/* room for llist_node and per-cpu pointer */
> -			percpu_size = LLIST_NODE_SZ + sizeof(void *);
> -		else
> +		if (!percpu)
>  			size += LLIST_NODE_SZ; /* room for llist_node */
>  		unit_size = size;
>  
> @@ -527,10 +528,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>  		return 0;
>  	}
>  
> -	/* size == 0 && percpu is an invalid combination */
> -	if (WARN_ON_ONCE(percpu))
> -		return -EINVAL;
> -
>  	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
>  	if (!pcc)
>  		return -ENOMEM;
> @@ -543,6 +540,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>  			c = &cc->cache[i];
>  			c->unit_size = sizes[i];
>  			c->objcg = objcg;
> +			c->percpu_size = percpu_size;
>  			c->tgt = c;
>  			prefill_mem_cache(c, cpu);
>  		}

This would trigger the WARN_ON_ONCE in free_bulk(), because for a per-cpu
ptr returned by the bpf memory allocator, ksize() in bpf_mem_free() will
constantly return 16, which does not match the unit_size of the
source bpf_mem_cache. I have not found a good solution for the problem yet,
except disabling the warning for the per-cpu bpf memory allocator, because
the per-cpu allocator currently has no API to return the size of a
returned ptr. Could we make the verifier remember the size of the
allocated object and pass it to the bpf memory allocator?
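
For illustration, the mismatch comes from how bpf_mem_free() picks the
cache to free into (a sketch of the relevant line as of this series,
not a fix): per-cpu objects are handed out as a small kmalloc'ed box of
LLIST_NODE_SZ + sizeof(void *) == 16 bytes that stores the real
__percpu pointer, so

	idx = bpf_mem_cache_idx(ksize(ptr - LLIST_NODE_SZ));

always resolves to the 16-byte cache regardless of the unit_size the
object was allocated with, and free_bulk() later warns when the
unit_size of the target cache disagrees with that of the source cache.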


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation
  2023-08-27 15:27 ` [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation Yonghong Song
  2023-09-13 14:10   ` Hou Tao
@ 2023-11-15 15:31   ` Heiko Carstens
  2023-11-15 15:54     ` Alexei Starovoitov
  2023-11-16  1:15     ` Hou Tao
  1 sibling, 2 replies; 23+ messages in thread
From: Heiko Carstens @ 2023-11-15 15:31 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Marc Hartmayer, Mikhail Zaslonko,
	linux-s390

On Sun, Aug 27, 2023 at 08:27:34AM -0700, Yonghong Song wrote:
> This is needed for later percpu mem allocation when the
> allocation is done by bpf program. For such cases, a global
> bpf_global_percpu_ma is added where a flexible allocation
> size is needed.
> 
> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> ---
>  include/linux/bpf.h   |  4 ++--
>  kernel/bpf/core.c     |  8 +++++---
>  kernel/bpf/memalloc.c | 14 ++++++--------
>  3 files changed, 13 insertions(+), 13 deletions(-)

Both Marc and Mikhail reported out-of-memory conditions on s390 machines,
and bisected it down to this upstream commit 41a5db8d8161 ("bpf: Add
support for non-fix-size percpu mem allocation").
This seems to eat up a lot of memory only based on the number of possible
CPUs.

If we have a machine with 8GB, 6 present CPUs and 512 possible CPUs (yes,
this is a realistic scenario) the memory consumption directly after boot
is:

$ cat /sys/devices/system/cpu/present
0-5
$ cat /sys/devices/system/cpu/possible
0-511

Before this commit:

$ cat /proc/meminfo
MemTotal:        8141924 kB
MemFree:         7639872 kB

With this commit

$ cat /proc/meminfo
MemTotal:        8141924 kB
MemFree:         4852248 kB

So, this appears to be a significant regression.
I'm quoting the rest of the original patch below for reference only.

> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 12596af59c00..144dbddf53bd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -55,8 +55,8 @@ struct cgroup;
>  extern struct idr btf_idr;
>  extern spinlock_t btf_idr_lock;
>  extern struct kobject *btf_kobj;
> -extern struct bpf_mem_alloc bpf_global_ma;
> -extern bool bpf_global_ma_set;
> +extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
> +extern bool bpf_global_ma_set, bpf_global_percpu_ma_set;
>  
>  typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64);
>  typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 0f8f036d8bd1..95599df82ee4 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -64,8 +64,8 @@
>  #define OFF	insn->off
>  #define IMM	insn->imm
>  
> -struct bpf_mem_alloc bpf_global_ma;
> -bool bpf_global_ma_set;
> +struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
> +bool bpf_global_ma_set, bpf_global_percpu_ma_set;
>  
>  /* No hurry in this branch
>   *
> @@ -2921,7 +2921,9 @@ static int __init bpf_global_ma_init(void)
>  
>  	ret = bpf_mem_alloc_init(&bpf_global_ma, 0, false);
>  	bpf_global_ma_set = !ret;
> -	return ret;
> +	ret = bpf_mem_alloc_init(&bpf_global_percpu_ma, 0, true);
> +	bpf_global_percpu_ma_set = !ret;
> +	return !bpf_global_ma_set || !bpf_global_percpu_ma_set;
>  }
>  late_initcall(bpf_global_ma_init);
>  #endif
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 9c49ae53deaf..cb60445de98a 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -499,15 +499,16 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>  	struct obj_cgroup *objcg = NULL;
>  	int cpu, i, unit_size, percpu_size = 0;
>  
> +	/* room for llist_node and per-cpu pointer */
> +	if (percpu)
> +		percpu_size = LLIST_NODE_SZ + sizeof(void *);
> +
>  	if (size) {
>  		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
>  		if (!pc)
>  			return -ENOMEM;
>  
> -		if (percpu)
> -			/* room for llist_node and per-cpu pointer */
> -			percpu_size = LLIST_NODE_SZ + sizeof(void *);
> -		else
> +		if (!percpu)
>  			size += LLIST_NODE_SZ; /* room for llist_node */
>  		unit_size = size;
>  
> @@ -527,10 +528,6 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>  		return 0;
>  	}
>  
> -	/* size == 0 && percpu is an invalid combination */
> -	if (WARN_ON_ONCE(percpu))
> -		return -EINVAL;
> -
>  	pcc = __alloc_percpu_gfp(sizeof(*cc), 8, GFP_KERNEL);
>  	if (!pcc)
>  		return -ENOMEM;
> @@ -543,6 +540,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
>  			c = &cc->cache[i];
>  			c->unit_size = sizes[i];
>  			c->objcg = objcg;
> +			c->percpu_size = percpu_size;
>  			c->tgt = c;
>  			prefill_mem_cache(c, cpu);
>  		}
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation
  2023-11-15 15:31   ` Heiko Carstens
@ 2023-11-15 15:54     ` Alexei Starovoitov
  2023-11-16  1:15     ` Hou Tao
  1 sibling, 0 replies; 23+ messages in thread
From: Alexei Starovoitov @ 2023-11-15 15:54 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Kernel Team, Martin KaFai Lau, Marc Hartmayer,
	Mikhail Zaslonko, linux-s390

On Wed, Nov 15, 2023 at 7:32 AM Heiko Carstens <hca@linux.ibm.com> wrote:
>
> On Sun, Aug 27, 2023 at 08:27:34AM -0700, Yonghong Song wrote:
> > This is needed for later percpu mem allocation when the
> > allocation is done by bpf program. For such cases, a global
> > bpf_global_percpu_ma is added where a flexible allocation
> > size is needed.
> >
> > Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> > ---
> >  include/linux/bpf.h   |  4 ++--
> >  kernel/bpf/core.c     |  8 +++++---
> >  kernel/bpf/memalloc.c | 14 ++++++--------
> >  3 files changed, 13 insertions(+), 13 deletions(-)
>
> Both Marc and Mikhail reported out-of-memory conditions on s390 machines,
> and bisected it down to this upstream commit 41a5db8d8161 ("bpf: Add
> support for non-fix-size percpu mem allocation").
> This seems to eat up a lot of memory only based on the number of possible
> CPUs.
>
> If we have a machine with 8GB, 6 present CPUs and 512 possible CPUs (yes,
> this is a realistic scenario) the memory consumption directly after boot
> is:
>
> $ cat /sys/devices/system/cpu/present
> 0-5
> $ cat /sys/devices/system/cpu/possible
> 0-511
>
> Before this commit:
>
> $ cat /proc/meminfo
> MemTotal:        8141924 kB
> MemFree:         7639872 kB
>
> With this commit
>
> $ cat /proc/meminfo
> MemTotal:        8141924 kB
> MemFree:         4852248 kB
>
> So, this appears to be a significant regression.
> I'm quoting the rest of the original patch below for reference only.

Yes. Sorry about this. The issue slipped through.
It's fixed in bpf tree by commit:
https://patchwork.kernel.org/project/netdevbpf/patch/20231111013928.948838-1-yonghong.song@linux.dev/

The fix will be sent to Linus soon.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation
  2023-11-15 15:31   ` Heiko Carstens
  2023-11-15 15:54     ` Alexei Starovoitov
@ 2023-11-16  1:15     ` Hou Tao
  2023-11-16  5:52       ` Yonghong Song
  2023-11-16 13:54       ` Heiko Carstens
  1 sibling, 2 replies; 23+ messages in thread
From: Hou Tao @ 2023-11-16  1:15 UTC (permalink / raw)
  To: Heiko Carstens, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Marc Hartmayer, Mikhail Zaslonko,
	linux-s390

Hi,

On 11/15/2023 11:31 PM, Heiko Carstens wrote:
> On Sun, Aug 27, 2023 at 08:27:34AM -0700, Yonghong Song wrote:
>> This is needed for later percpu mem allocation when the
>> allocation is done by bpf program. For such cases, a global
>> bpf_global_percpu_ma is added where a flexible allocation
>> size is needed.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>  include/linux/bpf.h   |  4 ++--
>>  kernel/bpf/core.c     |  8 +++++---
>>  kernel/bpf/memalloc.c | 14 ++++++--------
>>  3 files changed, 13 insertions(+), 13 deletions(-)
> Both Marc and Mikhail reported out-of-memory conditions on s390 machines,
> and bisected it down to this upstream commit 41a5db8d8161 ("bpf: Add
> support for non-fix-size percpu mem allocation").
> This seems to eat up a lot of memory only based on the number of possible
> CPUs.
>
> If we have a machine with 8GB, 6 present CPUs and 512 possible CPUs (yes,
> this is a realistic scenario) the memory consumption directly after boot
> is:
>
> $ cat /sys/devices/system/cpu/present
> 0-5
> $ cat /sys/devices/system/cpu/possible
> 0-511

Will the present CPUs be hot-added dynamically and eventually increase
to 512 CPUs? Or will the present CPUs rarely be hot-added? After all
possible CPUs are online, will these CPUs be hot-plugged dynamically?
I am considering adding CPU hotplug support to the bpf mem allocator,
so that we can allocate memory according to the present CPUs instead of
the possible CPUs. But if the present CPUs will quickly be increased to
all possible CPUs, there will not be much benefit to supporting hotplug
in the bpf mem allocator.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation
  2023-11-16  1:15     ` Hou Tao
@ 2023-11-16  5:52       ` Yonghong Song
  2023-11-16 13:54       ` Heiko Carstens
  1 sibling, 0 replies; 23+ messages in thread
From: Yonghong Song @ 2023-11-16  5:52 UTC (permalink / raw)
  To: Hou Tao, Heiko Carstens
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, Marc Hartmayer, Mikhail Zaslonko,
	linux-s390


On 11/15/23 8:15 PM, Hou Tao wrote:
> Hi,
>
> On 11/15/2023 11:31 PM, Heiko Carstens wrote:
>> On Sun, Aug 27, 2023 at 08:27:34AM -0700, Yonghong Song wrote:
>>> This is needed for later percpu mem allocation when the
>>> allocation is done by bpf program. For such cases, a global
>>> bpf_global_percpu_ma is added where a flexible allocation
>>> size is needed.
>>>
>>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>>> ---
>>>   include/linux/bpf.h   |  4 ++--
>>>   kernel/bpf/core.c     |  8 +++++---
>>>   kernel/bpf/memalloc.c | 14 ++++++--------
>>>   3 files changed, 13 insertions(+), 13 deletions(-)
>> Both Marc and Mikhail reported out-of-memory conditions on s390 machines,
>> and bisected it down to this upstream commit 41a5db8d8161 ("bpf: Add
>> support for non-fix-size percpu mem allocation").
>> This seems to eat up a lot of memory only based on the number of possible
>> CPUs.
>>
>> If we have a machine with 8GB, 6 present CPUs and 512 possible CPUs (yes,
>> this is a realistic scenario) the memory consumption directly after boot
>> is:
>>
>> $ cat /sys/devices/system/cpu/present
>> 0-5
>> $ cat /sys/devices/system/cpu/possible
>> 0-511
> Will the present CPUs be hot-added dynamically and eventually increase
> to 512 CPUs? Or will the present CPUs rarely be hot-added? After all
> possible CPUs are online, will these CPUs be hot-plugged dynamically?
> I am considering adding CPU hotplug support to the bpf mem allocator,
> so that we can allocate memory according to the present CPUs instead of
> the possible CPUs. But if the present CPUs will quickly be increased to
> all possible CPUs, there will not be much benefit to supporting hotplug
> in the bpf mem allocator.

Thanks, Hou. On my development machine, and also checking
some internal production machines, not surprisingly the number of
'present' cpus is equal to the number of 'possible' cpus. I suspect
that in most cases 'present' and 'possible' are the same. But it would
be good if other people could share their 'present != possible'
configurations and explain why.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation
  2023-11-16  1:15     ` Hou Tao
  2023-11-16  5:52       ` Yonghong Song
@ 2023-11-16 13:54       ` Heiko Carstens
  1 sibling, 0 replies; 23+ messages in thread
From: Heiko Carstens @ 2023-11-16 13:54 UTC (permalink / raw)
  To: Hou Tao
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team, Martin KaFai Lau, Marc Hartmayer,
	Mikhail Zaslonko, linux-s390

On Thu, Nov 16, 2023 at 09:15:26AM +0800, Hou Tao wrote:
> > If we have a machine with 8GB, 6 present CPUs and 512 possible CPUs (yes,
> > this is a realistic scenario) the memory consumption directly after boot
> > is:
> >
> > $ cat /sys/devices/system/cpu/present
> > 0-5
> > $ cat /sys/devices/system/cpu/possible
> > 0-511
> 
> Will the present CPUs be hot-added dynamically and eventually increase
> to 512 CPUs? Or will the present CPUs rarely be hot-added? After all
> possible CPUs are online, will these CPUs be hot-plugged dynamically?
> I am considering adding CPU hotplug support to the bpf mem allocator,
> so that we can allocate memory according to the present CPUs instead of
> the possible CPUs. But if the present CPUs will quickly be increased to
> all possible CPUs, there will not be much benefit to supporting hotplug
> in the bpf mem allocator.

You can assume that the present CPUs change only very rarely. Even
though we are only talking about virtual CPUs in this case, systems are
usually set up so that they have enough CPUs for their workload. Only
if that is not the case may additional CPUs be added (and brought
online) - which is usually much later than boot time.

Obviously the above is even more true for systems where you have to add
new CPUs in a physical way in order to change the present CPUs.

So I guess it is fair to assume that if there is such a large difference
between present and possible CPUs, this will in most cases also stay
that way while the system is running.

Or in other words: it sounds like it is worth adding CPU hotplug support
to the bpf mem allocator (not that I would know what that would
really mean for the bpf code).

A note on the above numbers: I hacked the number of possible CPUs
manually in the kernel code just to illustrate the high memory
consumption for the report. On a real system you would see "0-399" CPUs
instead. But that's just a minor detail.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-11-16 13:54 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-27 15:27 [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr Yonghong Song
2023-08-27 15:27 ` [PATCH bpf-next v3 01/13] bpf: Add support for non-fix-size percpu mem allocation Yonghong Song
2023-09-13 14:10   ` Hou Tao
2023-11-15 15:31   ` Heiko Carstens
2023-11-15 15:54     ` Alexei Starovoitov
2023-11-16  1:15     ` Hou Tao
2023-11-16  5:52       ` Yonghong Song
2023-11-16 13:54       ` Heiko Carstens
2023-08-27 15:27 ` [PATCH bpf-next v3 02/13] bpf: Add BPF_KPTR_PERCPU as a field type Yonghong Song
2023-08-27 15:27 ` [PATCH bpf-next v3 03/13] bpf: Add alloc/xchg/direct_access support for local percpu kptr Yonghong Song
2023-09-06  0:40   ` Alexei Starovoitov
2023-08-27 15:27 ` [PATCH bpf-next v3 04/13] bpf: Add bpf_this_cpu_ptr/bpf_per_cpu_ptr support for allocated percpu obj Yonghong Song
2023-08-27 15:27 ` [PATCH bpf-next v3 05/13] selftests/bpf: Update error message in negative linked_list test Yonghong Song
2023-08-27 15:28 ` [PATCH bpf-next v3 06/13] libbpf: Add __percpu_kptr macro definition Yonghong Song
2023-08-27 15:28 ` [PATCH bpf-next v3 07/13] selftests/bpf: Add bpf_percpu_obj_{new,drop}() macro in bpf_experimental.h Yonghong Song
2023-08-27 15:28 ` [PATCH bpf-next v3 08/13] selftests/bpf: Add tests for array map with local percpu kptr Yonghong Song
2023-08-27 15:28 ` [PATCH bpf-next v3 09/13] bpf: Mark OBJ_RELEASE argument as MEM_RCU when possible Yonghong Song
2023-09-06  0:37   ` Alexei Starovoitov
2023-08-27 15:28 ` [PATCH bpf-next v3 10/13] selftests/bpf: Remove unnecessary direct read of local percpu kptr Yonghong Song
2023-08-27 15:28 ` [PATCH bpf-next v3 11/13] selftests/bpf: Add tests for cgrp_local_storage with " Yonghong Song
2023-08-27 15:28 ` [PATCH bpf-next v3 12/13] selftests/bpf: Add some negative tests Yonghong Song
2023-08-27 15:28 ` [PATCH bpf-next v3 13/13] bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated Yonghong Song
2023-09-06  0:50 ` [PATCH bpf-next v3 00/13] bpf: Add support for local percpu kptr patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).