All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/5] bpf: Add bpf_rcu_read_lock() support
@ 2022-11-03  7:21 Yonghong Song
  2022-11-03  7:21 ` [PATCH bpf-next 1/5] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Yonghong Song @ 2022-11-03  7:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Currently, without rcu attribute info in BTF, the verifier treats
rcu tagged pointer as a normal pointer. This might be a problem
for sleepable program where rcu_read_lock()/unlock() is not available.
For example, for a sleepable fentry program, if rcu protected memory
access is interleaved with a sleepable helper/kfunc, it is possible
the memory access after the sleepable helper/kfunc might be invalid
since the object might have been freed then. Even without
a sleepable helper/kfunc, without rcu_read_lock() protection,
it is possible that the rcu protected object might be release
in the middle of bpf program execution which may cause incorrect
result.

To prevent above cases, enable btf_type_tag("rcu") attributes,
introduce new bpf_rcu_read_lock/unlock() helpers and add verifier support.
In the rest of patch set, Patch 1 enabled btf_type_tag for __rcu
attribute. Patch 2 added new bpf_rcu_read_lock/unlock() helpers.
Patch 3 added verifier support and Patch 4 enabled sleepable
program support for cgrp local storage. Patch 5 added some tests
for new helpers and verifier support.

Yonghong Song (5):
  compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu")))
  bpf: Add bpf_rcu_read_lock/unlock helper
  bpf: Add rcu btf_type_tag verifier support
  bpf: Enable sleeptable support for cgrp local storage
  selftests/bpf: Add tests for bpf_rcu_read_lock()

 include/linux/bpf.h                           |   5 +
 include/linux/bpf_verifier.h                  |   1 +
 include/linux/compiler_types.h                |   3 +-
 include/uapi/linux/bpf.h                      |  14 +
 kernel/bpf/btf.c                              |  11 +
 kernel/bpf/core.c                             |   2 +
 kernel/bpf/helpers.c                          |  26 ++
 kernel/bpf/verifier.c                         | 129 +++++++++-
 tools/include/uapi/linux/bpf.h                |  14 +
 .../selftests/bpf/prog_tests/rcu_read_lock.c  | 101 ++++++++
 .../selftests/bpf/progs/rcu_read_lock.c       | 241 ++++++++++++++++++
 11 files changed, 537 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
 create mode 100644 tools/testing/selftests/bpf/progs/rcu_read_lock.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 1/5] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu")))
  2022-11-03  7:21 [PATCH bpf-next 0/5] bpf: Add bpf_rcu_read_lock() support Yonghong Song
@ 2022-11-03  7:21 ` Yonghong Song
  2022-11-03 14:30   ` KP Singh
  2022-11-03  7:21 ` [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper Yonghong Song
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2022-11-03  7:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Currently, without rcu attribute info in BTF, the verifier treats
rcu tagged pointer as a normal pointer. This might be a problem
for sleepable program where rcu_read_lock()/unlock() is not available.
For example, for a sleepable fentry program, if rcu protected memory
access is interleaved with a sleepable helper/kfunc, it is possible
the memory access after the sleepable helper/kfunc might be invalid
since the object might have been freed then. To prevent such cases,
introducing rcu tagging for memory accesses in verifier can help
to reject such programs.

To enable rcu tagging in BTF, during kernel compilation,
define __rcu as attribute btf_type_tag("rcu") so __rcu information can
be preserved in dwarf and btf, and later can be used for bpf prog verification.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/compiler_types.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index eb0466236661..7c1afe0f4129 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -49,7 +49,8 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { }
 # endif
 # define __iomem
 # define __percpu	BTF_TYPE_TAG(percpu)
-# define __rcu
+# define __rcu		BTF_TYPE_TAG(rcu)
+
 # define __chk_user_ptr(x)	(void)0
 # define __chk_io_ptr(x)	(void)0
 /* context/locking */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper
  2022-11-03  7:21 [PATCH bpf-next 0/5] bpf: Add bpf_rcu_read_lock() support Yonghong Song
  2022-11-03  7:21 ` [PATCH bpf-next 1/5] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
@ 2022-11-03  7:21 ` Yonghong Song
  2022-11-03 14:28   ` KP Singh
  2022-11-03 22:18   ` Kumar Kartikeya Dwivedi
  2022-11-03  7:21 ` [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support Yonghong Song
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Yonghong Song @ 2022-11-03  7:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Add bpf_rcu_read_lock() and bpf_rcu_read_unlock() helpers.
Both helpers are available to all program types with
CAP_BPF capability.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h            |  2 ++
 include/uapi/linux/bpf.h       | 14 ++++++++++++++
 kernel/bpf/core.c              |  2 ++
 kernel/bpf/helpers.c           | 26 ++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
 5 files changed, 58 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8d948bfcb984..a9bda4c91fc7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2554,6 +2554,8 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
 extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
 extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
 extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
+extern const struct bpf_func_proto bpf_rcu_read_lock_proto;
+extern const struct bpf_func_proto bpf_rcu_read_unlock_proto;
 
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 94659f6b3395..e86389cd6133 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5481,6 +5481,18 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * void bpf_rcu_read_lock(void)
+ *	Description
+ *		Call kernel rcu_read_lock().
+ *	Return
+ *		None.
+ *
+ * void bpf_rcu_read_unlock(void)
+ *	Description
+ *		Call kernel rcu_read_unlock().
+ *	Return
+ *		None.
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
 	FN(unspec, 0, ##ctx)				\
@@ -5695,6 +5707,8 @@ union bpf_attr {
 	FN(user_ringbuf_drain, 209, ##ctx)		\
 	FN(cgrp_storage_get, 210, ##ctx)		\
 	FN(cgrp_storage_delete, 211, ##ctx)		\
+	FN(rcu_read_lock, 212, ##ctx)			\
+	FN(rcu_read_unlock, 213, ##ctx)			\
 	/* */
 
 /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9c16338bcbe8..26858b579dfc 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2627,6 +2627,8 @@ const struct bpf_func_proto bpf_map_lookup_percpu_elem_proto __weak;
 const struct bpf_func_proto bpf_spin_lock_proto __weak;
 const struct bpf_func_proto bpf_spin_unlock_proto __weak;
 const struct bpf_func_proto bpf_jiffies64_proto __weak;
+const struct bpf_func_proto bpf_rcu_read_lock_proto __weak;
+const struct bpf_func_proto bpf_rcu_read_unlock_proto __weak;
 
 const struct bpf_func_proto bpf_get_prandom_u32_proto __weak;
 const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 124fd199ce5c..f19416dc8ad1 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1273,6 +1273,28 @@ static const struct bpf_func_proto bpf_timer_start_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_0(bpf_rcu_read_lock) {
+	rcu_read_lock();
+	return 0;
+}
+
+const struct bpf_func_proto bpf_rcu_read_lock_proto = {
+	.func		= bpf_rcu_read_lock,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+};
+
+BPF_CALL_0(bpf_rcu_read_unlock) {
+	rcu_read_unlock();
+	return 0;
+}
+
+const struct bpf_func_proto bpf_rcu_read_unlock_proto = {
+	.func		= bpf_rcu_read_unlock,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+};
+
 static void drop_prog_refcnt(struct bpf_hrtimer *t)
 {
 	struct bpf_prog *prog = t->prog;
@@ -1627,6 +1649,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_spin_lock_proto;
 	case BPF_FUNC_spin_unlock:
 		return &bpf_spin_unlock_proto;
+	case BPF_FUNC_rcu_read_lock:
+		return &bpf_rcu_read_lock_proto;
+	case BPF_FUNC_rcu_read_unlock:
+		return &bpf_rcu_read_unlock_proto;
 	case BPF_FUNC_jiffies64:
 		return &bpf_jiffies64_proto;
 	case BPF_FUNC_per_cpu_ptr:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 94659f6b3395..e86389cd6133 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5481,6 +5481,18 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * void bpf_rcu_read_lock(void)
+ *	Description
+ *		Call kernel rcu_read_lock().
+ *	Return
+ *		None.
+ *
+ * void bpf_rcu_read_unlock(void)
+ *	Description
+ *		Call kernel rcu_read_unlock().
+ *	Return
+ *		None.
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
 	FN(unspec, 0, ##ctx)				\
@@ -5695,6 +5707,8 @@ union bpf_attr {
 	FN(user_ringbuf_drain, 209, ##ctx)		\
 	FN(cgrp_storage_get, 210, ##ctx)		\
 	FN(cgrp_storage_delete, 211, ##ctx)		\
+	FN(rcu_read_lock, 212, ##ctx)			\
+	FN(rcu_read_unlock, 213, ##ctx)			\
 	/* */
 
 /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support
  2022-11-03  7:21 [PATCH bpf-next 0/5] bpf: Add bpf_rcu_read_lock() support Yonghong Song
  2022-11-03  7:21 ` [PATCH bpf-next 1/5] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
  2022-11-03  7:21 ` [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper Yonghong Song
@ 2022-11-03  7:21 ` Yonghong Song
  2022-11-03 14:43   ` KP Singh
  2022-11-03 22:17   ` Kumar Kartikeya Dwivedi
  2022-11-03  7:21 ` [PATCH bpf-next 4/5] bpf: Enable sleeptable support for cgrp local storage Yonghong Song
  2022-11-03  7:21 ` [PATCH bpf-next 5/5] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
  4 siblings, 2 replies; 26+ messages in thread
From: Yonghong Song @ 2022-11-03  7:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
object access needing rcu_read_lock protection. The rcu protection
is not needed for non-sleepable program. So various verification
checking is only done for sleepable programs. In particular, only
the following insns can be inside bpf_rcu_read_lock() region:
  - any non call insns except BPF_ABS/BPF_IND
  - non sleepable helpers and kfuncs.
Also, bpf_*_storage_get() helper's 5th hidden argument (for memory
allocation flag) should be GFP_ATOMIC.

If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
this pointer and the load which gets this pointer needs to be
protected by bpf_rcu_read_lock(). The following shows a couple
of examples:
  struct task_struct {
	...
	struct task_struct __rcu        *real_parent;
	struct css_set __rcu            *cgroups;
	...
  };
  struct css_set {
	...
	struct cgroup *dfl_cgrp;
	...
  }
  ...
  task = bpf_get_current_task_btf();
  cgroups = task->cgroups;
  dfl_cgroup = cgroups->dfl_cgrp;
  ... using dfl_cgroup ...

The bpf_rcu_read_lock/unlock() should be added like below to
avoid verification failures.
  task = bpf_get_current_task_btf();
  bpf_rcu_read_lock();
  cgroups = task->cgroups;
  dfl_cgroup = cgroups->dfl_cgrp;
  bpf_rcu_read_unlock();
  ... using dfl_cgroup ...

The following is another example for task->real_parent.
  task = bpf_get_current_task_btf();
  bpf_rcu_read_lock();
  real_parent = task->real_parent;
  ... bpf_task_storage_get(&map, real_parent, 0, 0);
  bpf_rcu_read_unlock();

There is another case observed in selftest bpf_iter_ipv6_route.c:
  struct fib6_info *rt = ctx->rt;
  ...
  fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
  ...
  if (rt->nh)
    fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
  ...
  ... using fib6_nh ...
Currently verification will fail with
  same insn cannot be used with different pointers
since the use of fib6_nh is tag with rcu in one path
but not in the other path. The above use case is a valid
one so the verifier is changed to ignore MEM_RCU type tag
in such cases.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h          |   3 +
 include/linux/bpf_verifier.h |   1 +
 kernel/bpf/btf.c             |  11 +++
 kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
 4 files changed, 133 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a9bda4c91fc7..f0d973c8d227 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -458,6 +458,9 @@ enum bpf_type_flag {
 	/* Size is known at compile time. */
 	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
 
+	/* MEM is tagged with rcu and memory access needs rcu_read_lock protection. */
+	MEM_RCU			= BIT(11 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 1a32baa78ce2..d4e56f5a4b20 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -324,6 +324,7 @@ struct bpf_verifier_state {
 	u32 insn_idx;
 	u32 curframe;
 	u32 active_spin_lock;
+	u32 active_rcu_lock;
 	bool speculative;
 
 	/* first and last insn idx of this verifier state */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 35c07afac924..293d224a7f53 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -5527,6 +5527,8 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 			info->reg_type |= MEM_USER;
 		if (strcmp(tag_value, "percpu") == 0)
 			info->reg_type |= MEM_PERCPU;
+		if (strcmp(tag_value, "rcu") == 0)
+			info->reg_type |= MEM_RCU;
 	}
 
 	/* skip modifiers */
@@ -5765,6 +5767,9 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
 				/* check __percpu tag */
 				if (strcmp(tag_value, "percpu") == 0)
 					tmp_flag = MEM_PERCPU;
+				/* check __rcu tag */
+				if (strcmp(tag_value, "rcu") == 0)
+					tmp_flag = MEM_RCU;
 			}
 
 			stype = btf_type_skip_modifiers(btf, mtype->type, &id);
@@ -6560,6 +6565,12 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
+	if (sleepable && env->cur_state->active_rcu_lock) {
+		bpf_log(log, "kernel function %s is sleepable within rcu_read_lock region\n",
+			func_name);
+		return -EINVAL;
+	}
+
 	if (kfunc_meta && ref_obj_id)
 		kfunc_meta->ref_obj_id = ref_obj_id;
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 82c07fe0bfb1..3c5afd3bc216 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -188,6 +188,9 @@ struct bpf_verifier_stack_elem {
 					  POISON_POINTER_DELTA))
 #define BPF_MAP_PTR(X)		((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
 
+/* Using insn->off = BPF_STORAGE_GET_CALL to mark bpf_*_storage_get() helper calls. */
+#define BPF_STORAGE_GET_CALL	1
+
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
 
@@ -511,6 +514,22 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
 	return func_id == BPF_FUNC_dynptr_data;
 }
 
+static bool is_storage_get_function(enum bpf_func_id func_id)
+{
+	return func_id == BPF_FUNC_sk_storage_get ||
+	       func_id == BPF_FUNC_inode_storage_get ||
+	       func_id == BPF_FUNC_task_storage_get ||
+	       func_id == BPF_FUNC_cgrp_storage_get;
+}
+
+static bool is_sleepable_function(enum bpf_func_id func_id)
+{
+	return func_id == BPF_FUNC_copy_from_user ||
+	       func_id == BPF_FUNC_copy_from_user_task ||
+	       func_id == BPF_FUNC_ima_inode_hash ||
+	       func_id == BPF_FUNC_ima_file_hash;
+}
+
 static bool helper_multiple_ref_obj_use(enum bpf_func_id func_id,
 					const struct bpf_map *map)
 {
@@ -583,6 +602,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
 		strncpy(prefix, "percpu_", 32);
 	if (type & PTR_UNTRUSTED)
 		strncpy(prefix, "untrusted_", 32);
+	if (type & MEM_RCU)
+		strncpy(prefix, "rcu_", 32);
 
 	snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
 		 prefix, str[base_type(type)], postfix);
@@ -1201,6 +1222,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 	dst_state->speculative = src->speculative;
 	dst_state->curframe = src->curframe;
 	dst_state->active_spin_lock = src->active_spin_lock;
+	dst_state->active_rcu_lock = src->active_rcu_lock;
 	dst_state->branches = src->branches;
 	dst_state->parent = src->parent;
 	dst_state->first_insn_idx = src->first_insn_idx;
@@ -4536,6 +4558,14 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
+	if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
+	    !env->cur_state->active_rcu_lock) {
+		verbose(env,
+			"R%d is ptr_%s access rcu-protected memory with off=%d, not in rcu_read_lock region\n",
+			regno, tname, off);
+		return -EACCES;
+	}
+
 	if (env->ops->btf_struct_access) {
 		ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
 						  off, size, atype, &btf_id, &flag);
@@ -4552,6 +4582,14 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	if (ret < 0)
 		return ret;
 
+	if ((flag & MEM_RCU) && env->prog->aux->sleepable &&
+	    !env->cur_state->active_rcu_lock) {
+		verbose(env,
+			"R%d is rcu dereference ptr_%s with off=%d, not in rcu_read_lock region\n",
+			regno, tname, off);
+		return -EACCES;
+	}
+
 	/* If this is an untrusted pointer, all pointers formed by walking it
 	 * also inherit the untrusted flag.
 	 */
@@ -5684,7 +5722,12 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
 static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
 static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
 static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
-static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
+static const struct bpf_reg_types btf_ptr_types = {
+	.types = {
+		PTR_TO_BTF_ID,
+		PTR_TO_BTF_ID | MEM_RCU,
+	}
+};
 static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
 static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
@@ -5758,6 +5801,20 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	if (arg_type & PTR_MAYBE_NULL)
 		type &= ~PTR_MAYBE_NULL;
 
+	/* If the reg type is marked as MEM_RCU, ensure the usage is in the rcu_read_lock
+	 * region, and remove MEM_RCU from the type since the arg_type won't encode
+	 * MEM_RCU.
+	 */
+	if (type & MEM_RCU) {
+		if (env->prog->aux->sleepable && !env->cur_state->active_rcu_lock) {
+			verbose(env,
+				"R%d is arg type %s needs rcu protection\n",
+				regno, reg_type_str(env, reg->type));
+			return -EACCES;
+		}
+		type &= ~MEM_RCU;
+	}
+
 	for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
 		expected = compatible->types[i];
 		if (expected == NOT_INIT)
@@ -5774,7 +5831,8 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	return -EACCES;
 
 found:
-	if (reg->type == PTR_TO_BTF_ID) {
+	/* reg is already protected by rcu_read_lock(). Peel off MEM_RCU from reg->type. */
+	if ((reg->type & ~MEM_RCU) == PTR_TO_BTF_ID) {
 		/* For bpf_sk_release, it needs to match against first member
 		 * 'struct sock_common', hence make an exception for it. This
 		 * allows bpf_sk_release to work for multiple socket types.
@@ -5850,6 +5908,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 * fixed offset.
 	 */
 	case PTR_TO_BTF_ID:
+	case PTR_TO_BTF_ID | MEM_RCU:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * it's fixed offset must be 0.	In the other cases, fixed offset
 		 * can be non-zero.
@@ -7289,6 +7348,26 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 	}
 
 	meta.func_id = func_id;
+
+	if (func_id == BPF_FUNC_rcu_read_lock)
+		env->cur_state->active_rcu_lock++;
+	if (func_id == BPF_FUNC_rcu_read_unlock) {
+		if (env->cur_state->active_rcu_lock == 0) {
+			verbose(env, "missing bpf_rcu_read_lock\n");
+			return -EINVAL;
+		}
+
+		env->cur_state->active_rcu_lock--;
+	}
+	if (env->cur_state->active_rcu_lock) {
+		if (is_sleepable_function(func_id))
+			verbose(env, "sleepable helper %s#%din rcu_read_lock region\n",
+				func_id_name(func_id), func_id);
+
+		if (env->prog->aux->sleepable && is_storage_get_function(func_id))
+			insn->off = BPF_STORAGE_GET_CALL;
+	}
+
 	/* check args */
 	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
 		err = check_func_arg(env, i, &meta, fn);
@@ -10470,6 +10549,11 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		return -EINVAL;
 	}
 
+	if (env->prog->aux->sleepable && env->cur_state->active_rcu_lock) {
+		verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_rcu_read_lock-ed region\n");
+		return -EINVAL;
+	}
+
 	if (regs[ctx_reg].type != PTR_TO_CTX) {
 		verbose(env,
 			"at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
@@ -11734,6 +11818,9 @@ static bool states_equal(struct bpf_verifier_env *env,
 	if (old->active_spin_lock != cur->active_spin_lock)
 		return false;
 
+	if (old->active_rcu_lock != cur->active_rcu_lock)
+		return false;
+
 	/* for states to be equal callsites have to be the same
 	 * and all frame states need to be equivalent
 	 */
@@ -12141,6 +12228,11 @@ static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
 			       !reg_type_mismatch_ok(prev));
 }
 
+static bool reg_type_mismatch_ignore_rcu(enum bpf_reg_type src, enum bpf_reg_type prev)
+{
+	return reg_type_mismatch(src & ~MEM_RCU, prev & ~MEM_RCU);
+}
+
 static int do_check(struct bpf_verifier_env *env)
 {
 	bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
@@ -12266,6 +12358,18 @@ static int do_check(struct bpf_verifier_env *env)
 
 			prev_src_type = &env->insn_aux_data[env->insn_idx].ptr_type;
 
+			/* For NOT_INIT *prev_src_type, ignore rcu type tag.
+			 * For code like below,
+			 *   struct foo *f;
+			 *   if (...)
+			 *     f = ...; // f with MEM_RCU type tag.
+			 *   else
+			 *     f = ...; // f without MEM_RCU type tag.
+			 *   ... f ...  // Here f could be with/without MEM_RCU
+			 *
+			 * It is safe to ignore MEM_RCU type tag here since
+			 * base types are the same.
+			 */
 			if (*prev_src_type == NOT_INIT) {
 				/* saw a valid insn
 				 * dst_reg = *(u32 *)(src_reg + off)
@@ -12273,7 +12377,7 @@ static int do_check(struct bpf_verifier_env *env)
 				 */
 				*prev_src_type = src_reg_type;
 
-			} else if (reg_type_mismatch(src_reg_type, *prev_src_type)) {
+			} else if (reg_type_mismatch_ignore_rcu(src_reg_type, *prev_src_type)) {
 				/* ABuser program is trying to use the same insn
 				 * dst_reg = *(u32*) (src_reg + off)
 				 * with different pointer types:
@@ -12412,6 +12516,11 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
+				if (env->cur_state->active_rcu_lock) {
+					verbose(env, "bpf_rcu_read_unlock is missing\n");
+					return -EINVAL;
+				}
+
 				/* We must do check_reference_leak here before
 				 * prepare_func_exit to handle the case when
 				 * state->curframe > 0, it may be a callback
@@ -13499,6 +13608,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
 			break;
 		case PTR_TO_BTF_ID:
+		case PTR_TO_BTF_ID | MEM_RCU:
 		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
 			if (type == BPF_READ) {
 				insn->code = BPF_LDX | BPF_PROBE_MEM |
@@ -14148,11 +14258,11 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			goto patch_call_imm;
 		}
 
-		if (insn->imm == BPF_FUNC_task_storage_get ||
-		    insn->imm == BPF_FUNC_sk_storage_get ||
-		    insn->imm == BPF_FUNC_inode_storage_get ||
-		    insn->imm == BPF_FUNC_cgrp_storage_get) {
-			if (env->prog->aux->sleepable)
+		if (is_storage_get_function(insn->imm)) {
+			if (env->prog->aux->sleepable && insn->off) {
+				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
+				insn->off = 0;
+			} else if (env->prog->aux->sleepable)
 				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
 			else
 				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 4/5] bpf: Enable sleeptable support for cgrp local storage
  2022-11-03  7:21 [PATCH bpf-next 0/5] bpf: Add bpf_rcu_read_lock() support Yonghong Song
                   ` (2 preceding siblings ...)
  2022-11-03  7:21 ` [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support Yonghong Song
@ 2022-11-03  7:21 ` Yonghong Song
  2022-11-03 14:45   ` KP Singh
  2022-11-03  7:21 ` [PATCH bpf-next 5/5] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
  4 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2022-11-03  7:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

With proper bpf_rcu_read_lock() support, sleepable support for cgrp local
storage can be enabled as typical use case task->cgroups->dfl_cgrp
can be protected with bpf_rcu_read_lock().

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/verifier.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c5afd3bc216..47b897a28242 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12823,10 +12823,11 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		case BPF_MAP_TYPE_INODE_STORAGE:
 		case BPF_MAP_TYPE_SK_STORAGE:
 		case BPF_MAP_TYPE_TASK_STORAGE:
+		case BPF_MAP_TYPE_CGRP_STORAGE:
 			break;
 		default:
 			verbose(env,
-				"Sleepable programs can only use array, hash, and ringbuf maps\n");
+				"Sleepable programs can only use array, hash, ringbuf and local storage maps\n");
 			return -EINVAL;
 		}
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 5/5] selftests/bpf: Add tests for bpf_rcu_read_lock()
  2022-11-03  7:21 [PATCH bpf-next 0/5] bpf: Add bpf_rcu_read_lock() support Yonghong Song
                   ` (3 preceding siblings ...)
  2022-11-03  7:21 ` [PATCH bpf-next 4/5] bpf: Enable sleeptable support for cgrp local storage Yonghong Song
@ 2022-11-03  7:21 ` Yonghong Song
  2022-11-03 14:54   ` KP Singh
  4 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2022-11-03  7:21 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Add a few positive/negative tests to test bpf_rcu_read_lock()
and its corresponding verifier support.

  ./test_progs -t rcu_read_lock
  ...
  #145/1   rcu_read_lock/local_storage:OK
  #145/2   rcu_read_lock/runtime_diff_rcu_tag:OK
  #145/3   rcu_read_lock/negative_tests:OK
  #145     rcu_read_lock:OK
  Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../selftests/bpf/prog_tests/rcu_read_lock.c  | 101 ++++++++
 .../selftests/bpf/progs/rcu_read_lock.c       | 241 ++++++++++++++++++
 2 files changed, 342 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
 create mode 100644 tools/testing/selftests/bpf/progs/rcu_read_lock.c

diff --git a/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c b/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
new file mode 100644
index 000000000000..46c02bdb1360
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates.*/
+
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <test_progs.h>
+#include "rcu_read_lock.skel.h"
+
+static void test_local_storage(void)
+{
+	struct rcu_read_lock *skel;
+	int err;
+
+        skel = rcu_read_lock__open();
+        if (!ASSERT_OK_PTR(skel, "skel_open"))
+                return;
+
+	skel->bss->target_pid = syscall(SYS_gettid);
+
+	bpf_program__set_autoload(skel->progs.cgrp_succ, true);
+	bpf_program__set_autoload(skel->progs.task_succ, true);
+	err = rcu_read_lock__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto done;
+
+	err = rcu_read_lock__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto done;
+
+	syscall(SYS_getpgid);
+
+	ASSERT_EQ(skel->bss->result, 2, "result");
+done:
+	rcu_read_lock__destroy(skel);
+	return;
+}
+
+static void test_runtime_diff_rcu_tag(void)
+{
+	struct rcu_read_lock *skel;
+	int err;
+
+        skel = rcu_read_lock__open();
+        if (!ASSERT_OK_PTR(skel, "skel_open"))
+                return;
+
+	bpf_program__set_autoload(skel->progs.dump_ipv6_route, true);
+	err = rcu_read_lock__load(skel);
+	ASSERT_OK(err, "skel_load");
+	rcu_read_lock__destroy(skel);
+	return;
+}
+
+static void test_negative(void)
+{
+#define NUM_FAILED_PROGS	7
+	struct bpf_program *failed_progs[NUM_FAILED_PROGS];
+	struct rcu_read_lock *skel;
+	int i, err;
+
+        skel = rcu_read_lock__open();
+        if (!ASSERT_OK_PTR(skel, "skel_open"))
+                return;
+
+	failed_progs[0] = skel->progs.miss_lock;
+	failed_progs[1] = skel->progs.miss_unlock;
+	failed_progs[2] = skel->progs.cgrp_incorrect_rcu_region;
+	failed_progs[3] = skel->progs.task_incorrect_rcu_region1;
+	failed_progs[4] = skel->progs.task_incorrect_rcu_region2;
+	failed_progs[5] = skel->progs.inproper_sleepable_helper;
+	failed_progs[6] = skel->progs.inproper_sleepable_kfunc;
+	for (i = 0; i < NUM_FAILED_PROGS; i++) {
+		bpf_program__set_autoload(failed_progs[i], true);
+		err = rcu_read_lock__load(skel);
+		if (!ASSERT_ERR(err, "skel_load")) {
+			rcu_read_lock__destroy(skel);
+			return;
+		}
+		bpf_program__set_autoload(failed_progs[i], false);
+	}
+}
+
+void test_rcu_read_lock(void)
+{
+	int cgroup_fd;
+
+	cgroup_fd = test__join_cgroup("/rcu_read_lock");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /rcu_read_lock"))
+		return;
+
+	if (test__start_subtest("local_storage"))
+		test_local_storage();
+	if (test__start_subtest("runtime_diff_rcu_tag"))
+		test_runtime_diff_rcu_tag();
+	if (test__start_subtest("negative_tests"))
+		test_negative();
+
+	close(cgroup_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/rcu_read_lock.c b/tools/testing/selftests/bpf/progs/rcu_read_lock.c
new file mode 100644
index 000000000000..d7cf0a1bbac9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rcu_read_lock.c
@@ -0,0 +1,241 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_tracing_net.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, long);
+} map_a SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, long);
+} map_b SEC(".maps");
+
+__u32 user_data, key_serial, target_pid = 0;
+__u64 flags, result = 0;
+
+extern struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
+extern void bpf_key_put(struct bpf_key *key) __ksym;
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int cgrp_succ(void *ctx)
+{
+	struct task_struct *task;
+	struct css_set *cgroups;
+	struct cgroup *dfl_cgrp;
+	long init_val = 2;
+	long *ptr;
+
+	task = bpf_get_current_task_btf();
+	if (task->pid != target_pid)
+		return 0;
+
+	bpf_rcu_read_lock();
+	cgroups = task->cgroups;
+	dfl_cgrp = cgroups->dfl_cgrp;
+	bpf_rcu_read_unlock();
+	ptr = bpf_cgrp_storage_get(&map_a, dfl_cgrp, &init_val,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!ptr)
+		return 0;
+	ptr = bpf_cgrp_storage_get(&map_a, dfl_cgrp, 0, 0);
+	if (!ptr)
+		return 0;
+	result = *ptr;
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
+int task_succ(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	task = bpf_get_current_task_btf();
+	if (task->pid != target_pid)
+		return 0;
+
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	(void)bpf_task_storage_get(&map_b, real_parent, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?iter.s/ipv6_route")
+int dump_ipv6_route(struct bpf_iter__ipv6_route *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct fib6_info *rt = ctx->rt;
+	const struct net_device *dev;
+	struct fib6_nh *fib6_nh;
+	unsigned int flags;
+	struct nexthop *nh;
+
+	if (rt == (void *)0)
+		return 0;
+
+	fib6_nh = &rt->fib6_nh[0];
+	flags = rt->fib6_flags;
+
+	nh = rt->nh;
+	bpf_rcu_read_lock();
+	if (rt->nh)
+		fib6_nh = &nh->nh_info->fib6_nh;
+
+	if (fib6_nh->fib_nh_gw_family) {
+		flags |= RTF_GATEWAY;
+		BPF_SEQ_PRINTF(seq, "%pi6 ", &fib6_nh->fib_nh_gw6);
+	} else {
+		BPF_SEQ_PRINTF(seq, "00000000000000000000000000000000 ");
+	}
+
+	dev = fib6_nh->fib_nh_dev;
+	bpf_rcu_read_unlock();
+	if (dev)
+		BPF_SEQ_PRINTF(seq, "%08x %08x %08x %08x %8s\n", rt->fib6_metric,
+			       rt->fib6_ref.refs.counter, 0, flags, dev->name);
+	else
+		BPF_SEQ_PRINTF(seq, "%08x %08x %08x %08x\n", rt->fib6_metric,
+			       rt->fib6_ref.refs.counter, 0, flags);
+
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int miss_lock(void *ctx)
+{
+	struct task_struct *task;
+	struct css_set *cgroups;
+	struct cgroup *dfl_cgrp;
+
+	task = bpf_get_current_task_btf();
+	bpf_rcu_read_lock();
+	cgroups = task->cgroups;
+	bpf_rcu_read_unlock();
+	dfl_cgrp = cgroups->dfl_cgrp;
+	bpf_rcu_read_unlock();
+	(void)bpf_cgrp_storage_get(&map_a, dfl_cgrp, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int miss_unlock(void *ctx)
+{
+	struct task_struct *task;
+	struct css_set *cgroups;
+	struct cgroup *dfl_cgrp;
+
+	bpf_rcu_read_lock();
+	task = bpf_get_current_task_btf();
+	bpf_rcu_read_lock();
+	cgroups = task->cgroups;
+	bpf_rcu_read_unlock();
+	dfl_cgrp = cgroups->dfl_cgrp;
+	(void)bpf_cgrp_storage_get(&map_a, dfl_cgrp, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int cgrp_incorrect_rcu_region(void *ctx)
+{
+	struct task_struct *task;
+	struct css_set *cgroups;
+	struct cgroup *dfl_cgrp;
+
+	bpf_rcu_read_lock();
+	task = bpf_get_current_task_btf();
+	cgroups = task->cgroups;
+	bpf_rcu_read_unlock();
+	dfl_cgrp = cgroups->dfl_cgrp;
+	(void)bpf_cgrp_storage_get(&map_a, dfl_cgrp, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int task_incorrect_rcu_region1(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	task = bpf_get_current_task_btf();
+
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	bpf_rcu_read_unlock();
+	(void)bpf_task_storage_get(&map_b, real_parent, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int task_incorrect_rcu_region2(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	task = bpf_get_current_task_btf();
+
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	(void)bpf_task_storage_get(&map_b, real_parent, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (real_parent)
+		bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int inproper_sleepable_helper(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+	struct pt_regs *regs;
+	__u32 value = 0;
+	void *ptr;
+
+	task = bpf_get_current_task_btf();
+
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	regs = (struct pt_regs *)bpf_task_pt_regs(real_parent);
+	if (!regs) {
+		bpf_rcu_read_unlock();
+		return 0;
+	}
+
+	ptr = (void *)PT_REGS_IP(regs);
+	(void)bpf_copy_from_user_task(&value, sizeof(uint32_t), ptr, task, 0);
+	user_data = value;
+	(void)bpf_task_storage_get(&map_b, real_parent, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?lsm.s/bpf")
+int BPF_PROG(inproper_sleepable_kfunc, int cmd, union bpf_attr *attr, unsigned int size)
+{
+	struct bpf_key *bkey;
+
+	bpf_rcu_read_lock();
+	bkey = bpf_lookup_user_key(key_serial, flags);
+	bpf_rcu_read_unlock();
+        if (!bkey)
+                return -1;
+        bpf_key_put(bkey);
+
+        return 0;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper
  2022-11-03  7:21 ` [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper Yonghong Song
@ 2022-11-03 14:28   ` KP Singh
  2022-11-03 21:00     ` Yonghong Song
  2022-11-03 22:18   ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 26+ messages in thread
From: KP Singh @ 2022-11-03 14:28 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
>
> Add bpf_rcu_read_lock() and bpf_rcu_read_unlock() helpers.
> Both helpers are available to all program types with
> CAP_BPF capability.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/linux/bpf.h            |  2 ++
>  include/uapi/linux/bpf.h       | 14 ++++++++++++++
>  kernel/bpf/core.c              |  2 ++
>  kernel/bpf/helpers.c           | 26 ++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
>  5 files changed, 58 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 8d948bfcb984..a9bda4c91fc7 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2554,6 +2554,8 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
>  extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>  extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
>  extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
> +extern const struct bpf_func_proto bpf_rcu_read_lock_proto;
> +extern const struct bpf_func_proto bpf_rcu_read_unlock_proto;
>
>  const struct bpf_func_proto *tracing_prog_func_proto(
>    enum bpf_func_id func_id, const struct bpf_prog *prog);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 94659f6b3395..e86389cd6133 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5481,6 +5481,18 @@ union bpf_attr {
>   *             0 on success.
>   *
>   *             **-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * void bpf_rcu_read_lock(void)
> + *     Description
> + *             Call kernel rcu_read_lock().

Simple wrapper around rcu_read_lock() and maybe explain where and how
it is supposed to
be used.

e.g. the verifier will check if __rcu pointers are being accessed with
bpf_rcu_read_lock in
sleepable programs.

Calling the helper from a non-sleepable program is inconsequential,
but maybe we can even
avoid exposing it to non-sleepable programs?

> + *     Return
> + *             None.
> + *
> + * void bpf_rcu_read_unlock(void)
> + *     Description
> + *             Call kernel rcu_read_unlock().
> + *     Return
> + *             None.
>   */
>  #define ___BPF_FUNC_MAPPER(FN, ctx...)                 \
>         FN(unspec, 0, ##ctx)                            \
> @@ -5695,6 +5707,8 @@ union bpf_attr {
>         FN(user_ringbuf_drain, 209, ##ctx)              \
>         FN(cgrp_storage_get, 210, ##ctx)                \
>         FN(cgrp_storage_delete, 211, ##ctx)             \
> +       FN(rcu_read_lock, 212, ##ctx)                   \
> +       FN(rcu_read_unlock, 213, ##ctx)                 \
>         /* */
>
>  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 9c16338bcbe8..26858b579dfc 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2627,6 +2627,8 @@ const struct bpf_func_proto bpf_map_lookup_percpu_elem_proto __weak;
>  const struct bpf_func_proto bpf_spin_lock_proto __weak;
>  const struct bpf_func_proto bpf_spin_unlock_proto __weak;
>  const struct bpf_func_proto bpf_jiffies64_proto __weak;
> +const struct bpf_func_proto bpf_rcu_read_lock_proto __weak;
> +const struct bpf_func_proto bpf_rcu_read_unlock_proto __weak;
>
>  const struct bpf_func_proto bpf_get_prandom_u32_proto __weak;
>  const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak;
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 124fd199ce5c..f19416dc8ad1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1273,6 +1273,28 @@ static const struct bpf_func_proto bpf_timer_start_proto = {
>         .arg3_type      = ARG_ANYTHING,
>  };
>
> +BPF_CALL_0(bpf_rcu_read_lock) {
> +       rcu_read_lock();
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_rcu_read_lock_proto = {
> +       .func           = bpf_rcu_read_lock,
> +       .gpl_only       = false,
> +       .ret_type       = RET_VOID,
> +};
> +
> +BPF_CALL_0(bpf_rcu_read_unlock) {
> +       rcu_read_unlock();
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_rcu_read_unlock_proto = {
> +       .func           = bpf_rcu_read_unlock,
> +       .gpl_only       = false,
> +       .ret_type       = RET_VOID,
> +};
> +
>  static void drop_prog_refcnt(struct bpf_hrtimer *t)
>  {
>         struct bpf_prog *prog = t->prog;
> @@ -1627,6 +1649,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>                 return &bpf_spin_lock_proto;
>         case BPF_FUNC_spin_unlock:
>                 return &bpf_spin_unlock_proto;
> +       case BPF_FUNC_rcu_read_lock:
> +               return &bpf_rcu_read_lock_proto;
> +       case BPF_FUNC_rcu_read_unlock:
> +               return &bpf_rcu_read_unlock_proto;
>         case BPF_FUNC_jiffies64:
>                 return &bpf_jiffies64_proto;
>         case BPF_FUNC_per_cpu_ptr:
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 94659f6b3395..e86389cd6133 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5481,6 +5481,18 @@ union bpf_attr {
>   *             0 on success.
>   *
>   *             **-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * void bpf_rcu_read_lock(void)
> + *     Description
> + *             Call kernel rcu_read_lock().
> + *     Return
> + *             None.
> + *
> + * void bpf_rcu_read_unlock(void)
> + *     Description
> + *             Call kernel rcu_read_unlock().
> + *     Return
> + *             None.
>   */
>  #define ___BPF_FUNC_MAPPER(FN, ctx...)                 \
>         FN(unspec, 0, ##ctx)                            \
> @@ -5695,6 +5707,8 @@ union bpf_attr {
>         FN(user_ringbuf_drain, 209, ##ctx)              \
>         FN(cgrp_storage_get, 210, ##ctx)                \
>         FN(cgrp_storage_delete, 211, ##ctx)             \
> +       FN(rcu_read_lock, 212, ##ctx)                   \
> +       FN(rcu_read_unlock, 213, ##ctx)                 \
>         /* */
>
>  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> --
> 2.30.2
>

On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
>
> Add bpf_rcu_read_lock() and bpf_rcu_read_unlock() helpers.
> Both helpers are available to all program types with
> CAP_BPF capability.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/linux/bpf.h            |  2 ++
>  include/uapi/linux/bpf.h       | 14 ++++++++++++++
>  kernel/bpf/core.c              |  2 ++
>  kernel/bpf/helpers.c           | 26 ++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
>  5 files changed, 58 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 8d948bfcb984..a9bda4c91fc7 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2554,6 +2554,8 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
>  extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>  extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
>  extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
> +extern const struct bpf_func_proto bpf_rcu_read_lock_proto;
> +extern const struct bpf_func_proto bpf_rcu_read_unlock_proto;
>
>  const struct bpf_func_proto *tracing_prog_func_proto(
>    enum bpf_func_id func_id, const struct bpf_prog *prog);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 94659f6b3395..e86389cd6133 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5481,6 +5481,18 @@ union bpf_attr {
>   *             0 on success.
>   *
>   *             **-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * void bpf_rcu_read_lock(void)
> + *     Description
> + *             Call kernel rcu_read_lock().
> + *     Return
> + *             None.
> + *
> + * void bpf_rcu_read_unlock(void)
> + *     Description
> + *             Call kernel rcu_read_unlock().
> + *     Return
> + *             None.
>   */
>  #define ___BPF_FUNC_MAPPER(FN, ctx...)                 \
>         FN(unspec, 0, ##ctx)                            \
> @@ -5695,6 +5707,8 @@ union bpf_attr {
>         FN(user_ringbuf_drain, 209, ##ctx)              \
>         FN(cgrp_storage_get, 210, ##ctx)                \
>         FN(cgrp_storage_delete, 211, ##ctx)             \
> +       FN(rcu_read_lock, 212, ##ctx)                   \
> +       FN(rcu_read_unlock, 213, ##ctx)                 \
>         /* */
>
>  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 9c16338bcbe8..26858b579dfc 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2627,6 +2627,8 @@ const struct bpf_func_proto bpf_map_lookup_percpu_elem_proto __weak;
>  const struct bpf_func_proto bpf_spin_lock_proto __weak;
>  const struct bpf_func_proto bpf_spin_unlock_proto __weak;
>  const struct bpf_func_proto bpf_jiffies64_proto __weak;
> +const struct bpf_func_proto bpf_rcu_read_lock_proto __weak;
> +const struct bpf_func_proto bpf_rcu_read_unlock_proto __weak;
>
>  const struct bpf_func_proto bpf_get_prandom_u32_proto __weak;
>  const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak;
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 124fd199ce5c..f19416dc8ad1 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1273,6 +1273,28 @@ static const struct bpf_func_proto bpf_timer_start_proto = {
>         .arg3_type      = ARG_ANYTHING,
>  };
>
> +BPF_CALL_0(bpf_rcu_read_lock) {
> +       rcu_read_lock();
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_rcu_read_lock_proto = {
> +       .func           = bpf_rcu_read_lock,
> +       .gpl_only       = false,
> +       .ret_type       = RET_VOID,
> +};
> +
> +BPF_CALL_0(bpf_rcu_read_unlock) {
> +       rcu_read_unlock();
> +       return 0;
> +}
> +
> +const struct bpf_func_proto bpf_rcu_read_unlock_proto = {
> +       .func           = bpf_rcu_read_unlock,
> +       .gpl_only       = false,
> +       .ret_type       = RET_VOID,
> +};
> +
>  static void drop_prog_refcnt(struct bpf_hrtimer *t)
>  {
>         struct bpf_prog *prog = t->prog;
> @@ -1627,6 +1649,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>                 return &bpf_spin_lock_proto;
>         case BPF_FUNC_spin_unlock:
>                 return &bpf_spin_unlock_proto;
> +       case BPF_FUNC_rcu_read_lock:
> +               return &bpf_rcu_read_lock_proto;
> +       case BPF_FUNC_rcu_read_unlock:
> +               return &bpf_rcu_read_unlock_proto;
>         case BPF_FUNC_jiffies64:
>                 return &bpf_jiffies64_proto;
>         case BPF_FUNC_per_cpu_ptr:
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 94659f6b3395..e86389cd6133 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5481,6 +5481,18 @@ union bpf_attr {
>   *             0 on success.
>   *
>   *             **-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * void bpf_rcu_read_lock(void)
> + *     Description
> + *             Call kernel rcu_read_lock().
> + *     Return
> + *             None.
> + *
> + * void bpf_rcu_read_unlock(void)
> + *     Description
> + *             Call kernel rcu_read_unlock().
> + *     Return
> + *             None.
>   */
>  #define ___BPF_FUNC_MAPPER(FN, ctx...)                 \
>         FN(unspec, 0, ##ctx)                            \
> @@ -5695,6 +5707,8 @@ union bpf_attr {
>         FN(user_ringbuf_drain, 209, ##ctx)              \
>         FN(cgrp_storage_get, 210, ##ctx)                \
>         FN(cgrp_storage_delete, 211, ##ctx)             \
> +       FN(rcu_read_lock, 212, ##ctx)                   \
> +       FN(rcu_read_unlock, 213, ##ctx)                 \
>         /* */
>
>  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 1/5] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu")))
  2022-11-03  7:21 ` [PATCH bpf-next 1/5] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
@ 2022-11-03 14:30   ` KP Singh
  0 siblings, 0 replies; 26+ messages in thread
From: KP Singh @ 2022-11-03 14:30 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
>
> Currently, without rcu attribute info in BTF, the verifier treats
> rcu tagged pointer as a normal pointer. This might be a problem
> for sleepable program where rcu_read_lock()/unlock() is not available.
> For example, for a sleepable fentry program, if rcu protected memory
> access is interleaved with a sleepable helper/kfunc, it is possible
> the memory access after the sleepable helper/kfunc might be invalid
> since the object might have been freed then. To prevent such cases,
> introducing rcu tagging for memory accesses in verifier can help
> to reject such programs.
>
> To enable rcu tagging in BTF, during kernel compilation,
> define __rcu as attribute btf_type_tag("rcu") so __rcu information can
> be preserved in dwarf and btf, and later can be used for bpf prog verification.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>

Acked-by: KP Singh <kpsingh@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support
  2022-11-03  7:21 ` [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support Yonghong Song
@ 2022-11-03 14:43   ` KP Singh
  2022-11-03 21:03     ` Yonghong Song
  2022-11-03 22:17   ` Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 26+ messages in thread
From: KP Singh @ 2022-11-03 14:43 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
>
> A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
> object access needing rcu_read_lock protection. The rcu protection
> is not needed for non-sleepable program. So various verification
> checking is only done for sleepable programs. In particular, only
> the following insns can be inside bpf_rcu_read_lock() region:
>   - any non call insns except BPF_ABS/BPF_IND
>   - non sleepable helpers and kfuncs.
> Also, bpf_*_storage_get() helper's 5th hidden argument (for memory
> allocation flag) should be GFP_ATOMIC.
>
> If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
> this pointer and the load which gets this pointer needs to be
> protected by bpf_rcu_read_lock(). The following shows a couple
> of examples:
>   struct task_struct {
>         ...
>         struct task_struct __rcu        *real_parent;
>         struct css_set __rcu            *cgroups;
>         ...
>   };
>   struct css_set {
>         ...
>         struct cgroup *dfl_cgrp;
>         ...
>   }
>   ...
>   task = bpf_get_current_task_btf();
>   cgroups = task->cgroups;
>   dfl_cgroup = cgroups->dfl_cgrp;
>   ... using dfl_cgroup ...
>
> The bpf_rcu_read_lock/unlock() should be added like below to
> avoid verification failures.
>   task = bpf_get_current_task_btf();
>   bpf_rcu_read_lock();
>   cgroups = task->cgroups;
>   dfl_cgroup = cgroups->dfl_cgrp;
>   bpf_rcu_read_unlock();
>   ... using dfl_cgroup ...
>
> The following is another example for task->real_parent.
>   task = bpf_get_current_task_btf();
>   bpf_rcu_read_lock();
>   real_parent = task->real_parent;
>   ... bpf_task_storage_get(&map, real_parent, 0, 0);
>   bpf_rcu_read_unlock();
>
> There is another case observed in selftest bpf_iter_ipv6_route.c:
>   struct fib6_info *rt = ctx->rt;
>   ...
>   fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
>   ...
>   if (rt->nh)
>     fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
>   ...
>   ... using fib6_nh ...
> Currently verification will fail with
>   same insn cannot be used with different pointers
> since the use of fib6_nh is tag with rcu in one path
> but not in the other path. The above use case is a valid
> one so the verifier is changed to ignore MEM_RCU type tag
> in such cases.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/linux/bpf.h          |   3 +
>  include/linux/bpf_verifier.h |   1 +
>  kernel/bpf/btf.c             |  11 +++
>  kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
>  4 files changed, 133 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a9bda4c91fc7..f0d973c8d227 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -458,6 +458,9 @@ enum bpf_type_flag {
>         /* Size is known at compile time. */
>         MEM_FIXED_SIZE          = BIT(10 + BPF_BASE_TYPE_BITS),
>
> +       /* MEM is tagged with rcu and memory access needs rcu_read_lock protection. */
> +       MEM_RCU                 = BIT(11 + BPF_BASE_TYPE_BITS),
> +
>         __BPF_TYPE_FLAG_MAX,
>         __BPF_TYPE_LAST_FLAG    = __BPF_TYPE_FLAG_MAX - 1,
>  };
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 1a32baa78ce2..d4e56f5a4b20 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -324,6 +324,7 @@ struct bpf_verifier_state {
>         u32 insn_idx;
>         u32 curframe;
>         u32 active_spin_lock;
> +       u32 active_rcu_lock;
>         bool speculative;
>
>         /* first and last insn idx of this verifier state */
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 35c07afac924..293d224a7f53 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -5527,6 +5527,8 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>                         info->reg_type |= MEM_USER;
>                 if (strcmp(tag_value, "percpu") == 0)
>                         info->reg_type |= MEM_PERCPU;
> +               if (strcmp(tag_value, "rcu") == 0)
> +                       info->reg_type |= MEM_RCU;
>         }
>
>         /* skip modifiers */
> @@ -5765,6 +5767,9 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
>                                 /* check __percpu tag */
>                                 if (strcmp(tag_value, "percpu") == 0)
>                                         tmp_flag = MEM_PERCPU;
> +                               /* check __rcu tag */
> +                               if (strcmp(tag_value, "rcu") == 0)
> +                                       tmp_flag = MEM_RCU;
>                         }
>
>                         stype = btf_type_skip_modifiers(btf, mtype->type, &id);
> @@ -6560,6 +6565,12 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>                 return -EINVAL;
>         }
>
> +       if (sleepable && env->cur_state->active_rcu_lock) {
> +               bpf_log(log, "kernel function %s is sleepable within rcu_read_lock region\n",
> +                       func_name);
> +               return -EINVAL;
> +       }
> +
>         if (kfunc_meta && ref_obj_id)
>                 kfunc_meta->ref_obj_id = ref_obj_id;
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 82c07fe0bfb1..3c5afd3bc216 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -188,6 +188,9 @@ struct bpf_verifier_stack_elem {
>                                           POISON_POINTER_DELTA))
>  #define BPF_MAP_PTR(X)         ((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
>
> +/* Using insn->off = BPF_STORAGE_GET_CALL to mark bpf_*_storage_get() helper calls. */
> +#define BPF_STORAGE_GET_CALL   1
> +
>  static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
>  static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
>
> @@ -511,6 +514,22 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
>         return func_id == BPF_FUNC_dynptr_data;
>  }
>
> +static bool is_storage_get_function(enum bpf_func_id func_id)
> +{
> +       return func_id == BPF_FUNC_sk_storage_get ||
> +              func_id == BPF_FUNC_inode_storage_get ||
> +              func_id == BPF_FUNC_task_storage_get ||
> +              func_id == BPF_FUNC_cgrp_storage_get;
> +}
> +
> +static bool is_sleepable_function(enum bpf_func_id func_id)
> +{
> +       return func_id == BPF_FUNC_copy_from_user ||
> +              func_id == BPF_FUNC_copy_from_user_task ||
> +              func_id == BPF_FUNC_ima_inode_hash ||
> +              func_id == BPF_FUNC_ima_file_hash;
> +}

This is a bit concerning that these are in two different places in the kernel.
We expose a helper based on prog->aux->sleepable in different places and
it's worth doing some refactoring to keep all this logic in a single place.

> +
>  static bool helper_multiple_ref_obj_use(enum bpf_func_id func_id,
>                                         const struct bpf_map *map)
>  {
> @@ -583,6 +602,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
>                 strncpy(prefix, "percpu_", 32);
>         if (type & PTR_UNTRUSTED)
>                 strncpy(prefix, "untrusted_", 32);
> +       if (type & MEM_RCU)
> +               strncpy(prefix, "rcu_", 32);
>
>         snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
>                  prefix, str[base_type(type)], postfix);
> @@ -1201,6 +1222,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
>         dst_state->speculative = src->speculative;
>         dst_state->curframe = src->curframe;
>         dst_state->active_spin_lock = src->active_spin_lock;
> +       dst_state->active_rcu_lock = src->active_rcu_lock;
>         dst_state->branches = src->branches;
>         dst_state->parent = src->parent;
>         dst_state->first_insn_idx = src->first_insn_idx;
> @@ -4536,6 +4558,14 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>                 return -EACCES;
>         }
>
> +       if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
> +           !env->cur_state->active_rcu_lock) {
> +               verbose(env,
> +                       "R%d is ptr_%s access rcu-protected memory with off=%d, not in rcu_read_lock region\n",
> +                       regno, tname, off);
> +               return -EACCES;
> +       }
> +
>         if (env->ops->btf_struct_access) {
>                 ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
>                                                   off, size, atype, &btf_id, &flag);
> @@ -4552,6 +4582,14 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>         if (ret < 0)
>                 return ret;
>
> +       if ((flag & MEM_RCU) && env->prog->aux->sleepable &&
> +           !env->cur_state->active_rcu_lock) {
> +               verbose(env,
> +                       "R%d is rcu dereference ptr_%s with off=%d, not in rcu_read_lock region\n",
> +                       regno, tname, off);
> +               return -EACCES;
> +       }
> +
>         /* If this is an untrusted pointer, all pointers formed by walking it
>          * also inherit the untrusted flag.
>          */
> @@ -5684,7 +5722,12 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
>  static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
>  static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
>  static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
> -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
> +static const struct bpf_reg_types btf_ptr_types = {
> +       .types = {
> +               PTR_TO_BTF_ID,
> +               PTR_TO_BTF_ID | MEM_RCU,
> +       }
> +};
>  static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
>  static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
>  static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> @@ -5758,6 +5801,20 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>         if (arg_type & PTR_MAYBE_NULL)
>                 type &= ~PTR_MAYBE_NULL;
>
> +       /* If the reg type is marked as MEM_RCU, ensure the usage is in the rcu_read_lock
> +        * region, and remove MEM_RCU from the type since the arg_type won't encode
> +        * MEM_RCU.
> +        */
> +       if (type & MEM_RCU) {
> +               if (env->prog->aux->sleepable && !env->cur_state->active_rcu_lock) {
> +                       verbose(env,
> +                               "R%d is arg type %s needs rcu protection\n",
> +                               regno, reg_type_str(env, reg->type));
> +                       return -EACCES;
> +               }
> +               type &= ~MEM_RCU;
> +       }
> +
>         for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
>                 expected = compatible->types[i];
>                 if (expected == NOT_INIT)
> @@ -5774,7 +5831,8 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>         return -EACCES;
>
>  found:
> -       if (reg->type == PTR_TO_BTF_ID) {
> +       /* reg is already protected by rcu_read_lock(). Peel off MEM_RCU from reg->type. */
> +       if ((reg->type & ~MEM_RCU) == PTR_TO_BTF_ID) {
>                 /* For bpf_sk_release, it needs to match against first member
>                  * 'struct sock_common', hence make an exception for it. This
>                  * allows bpf_sk_release to work for multiple socket types.
> @@ -5850,6 +5908,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>          * fixed offset.
>          */
>         case PTR_TO_BTF_ID:
> +       case PTR_TO_BTF_ID | MEM_RCU:
>                 /* When referenced PTR_TO_BTF_ID is passed to release function,
>                  * it's fixed offset must be 0. In the other cases, fixed offset
>                  * can be non-zero.
> @@ -7289,6 +7348,26 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>         }
>
>         meta.func_id = func_id;
> +
> +       if (func_id == BPF_FUNC_rcu_read_lock)
> +               env->cur_state->active_rcu_lock++;
> +       if (func_id == BPF_FUNC_rcu_read_unlock) {
> +               if (env->cur_state->active_rcu_lock == 0) {
> +                       verbose(env, "missing bpf_rcu_read_lock\n");
> +                       return -EINVAL;
> +               }
> +
> +               env->cur_state->active_rcu_lock--;
> +       }
> +       if (env->cur_state->active_rcu_lock) {
> +               if (is_sleepable_function(func_id))
> +                       verbose(env, "sleepable helper %s#%din rcu_read_lock region\n",
> +                               func_id_name(func_id), func_id);
> +
> +               if (env->prog->aux->sleepable && is_storage_get_function(func_id))
> +                       insn->off = BPF_STORAGE_GET_CALL;
> +       }
> +
>         /* check args */
>         for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
>                 err = check_func_arg(env, i, &meta, fn);
> @@ -10470,6 +10549,11 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
>                 return -EINVAL;
>         }
>
> +       if (env->prog->aux->sleepable && env->cur_state->active_rcu_lock) {
> +               verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_rcu_read_lock-ed region\n");
> +               return -EINVAL;
> +       }
> +
>         if (regs[ctx_reg].type != PTR_TO_CTX) {
>                 verbose(env,
>                         "at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
> @@ -11734,6 +11818,9 @@ static bool states_equal(struct bpf_verifier_env *env,
>         if (old->active_spin_lock != cur->active_spin_lock)
>                 return false;
>
> +       if (old->active_rcu_lock != cur->active_rcu_lock)
> +               return false;
> +
>         /* for states to be equal callsites have to be the same
>          * and all frame states need to be equivalent
>          */
> @@ -12141,6 +12228,11 @@ static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
>                                !reg_type_mismatch_ok(prev));
>  }
>
> +static bool reg_type_mismatch_ignore_rcu(enum bpf_reg_type src, enum bpf_reg_type prev)
> +{
> +       return reg_type_mismatch(src & ~MEM_RCU, prev & ~MEM_RCU);
> +}
> +
>  static int do_check(struct bpf_verifier_env *env)
>  {
>         bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
> @@ -12266,6 +12358,18 @@ static int do_check(struct bpf_verifier_env *env)
>
>                         prev_src_type = &env->insn_aux_data[env->insn_idx].ptr_type;
>
> +                       /* For NOT_INIT *prev_src_type, ignore rcu type tag.
> +                        * For code like below,
> +                        *   struct foo *f;
> +                        *   if (...)
> +                        *     f = ...; // f with MEM_RCU type tag.
> +                        *   else
> +                        *     f = ...; // f without MEM_RCU type tag.
> +                        *   ... f ...  // Here f could be with/without MEM_RCU
> +                        *
> +                        * It is safe to ignore MEM_RCU type tag here since
> +                        * base types are the same.
> +                        */
>                         if (*prev_src_type == NOT_INIT) {
>                                 /* saw a valid insn
>                                  * dst_reg = *(u32 *)(src_reg + off)
> @@ -12273,7 +12377,7 @@ static int do_check(struct bpf_verifier_env *env)
>                                  */
>                                 *prev_src_type = src_reg_type;
>
> -                       } else if (reg_type_mismatch(src_reg_type, *prev_src_type)) {
> +                       } else if (reg_type_mismatch_ignore_rcu(src_reg_type, *prev_src_type)) {
>                                 /* ABuser program is trying to use the same insn
>                                  * dst_reg = *(u32*) (src_reg + off)
>                                  * with different pointer types:
> @@ -12412,6 +12516,11 @@ static int do_check(struct bpf_verifier_env *env)
>                                         return -EINVAL;
>                                 }
>
> +                               if (env->cur_state->active_rcu_lock) {
> +                                       verbose(env, "bpf_rcu_read_unlock is missing\n");
> +                                       return -EINVAL;
> +                               }
> +
>                                 /* We must do check_reference_leak here before
>                                  * prepare_func_exit to handle the case when
>                                  * state->curframe > 0, it may be a callback
> @@ -13499,6 +13608,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>                         convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
>                         break;
>                 case PTR_TO_BTF_ID:
> +               case PTR_TO_BTF_ID | MEM_RCU:
>                 case PTR_TO_BTF_ID | PTR_UNTRUSTED:
>                         if (type == BPF_READ) {
>                                 insn->code = BPF_LDX | BPF_PROBE_MEM |
> @@ -14148,11 +14258,11 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>                         goto patch_call_imm;
>                 }
>
> -               if (insn->imm == BPF_FUNC_task_storage_get ||
> -                   insn->imm == BPF_FUNC_sk_storage_get ||
> -                   insn->imm == BPF_FUNC_inode_storage_get ||
> -                   insn->imm == BPF_FUNC_cgrp_storage_get) {
> -                       if (env->prog->aux->sleepable)
> +               if (is_storage_get_function(insn->imm)) {
> +                       if (env->prog->aux->sleepable && insn->off) {

I would recommend explicitly checking if insn->off == BPF_STORAGE_GET_CALL.

Also, your comment says you are marking BPF_STORAGE_GET_CALL but this is only
set when the call is in a classical RCU critical section and in a
sleepable program. The
The name and the comment above should reflect that.

BPF_STORAGE_GET_CALL_ATOMIC or something.


> +                               insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
> +                               insn->off = 0;
> +                       } else if (env->prog->aux->sleepable)
>                                 insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
>                         else
>                                 insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 4/5] bpf: Enable sleeptable support for cgrp local storage
  2022-11-03  7:21 ` [PATCH bpf-next 4/5] bpf: Enable sleeptable support for cgrp local storage Yonghong Song
@ 2022-11-03 14:45   ` KP Singh
  0 siblings, 0 replies; 26+ messages in thread
From: KP Singh @ 2022-11-03 14:45 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
>
> With proper bpf_rcu_read_lock() support, sleepable support for cgrp local
> storage can be enabled as typical use case task->cgroups->dfl_cgrp
> can be protected with bpf_rcu_read_lock().
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  kernel/bpf/verifier.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>

nit: subject line: "Sleeptable -> Sleepable"

Acked-by: KP Singh <kpsingh@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: Add tests for bpf_rcu_read_lock()
  2022-11-03  7:21 ` [PATCH bpf-next 5/5] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
@ 2022-11-03 14:54   ` KP Singh
  2022-11-03 21:04     ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: KP Singh @ 2022-11-03 14:54 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
>
> Add a few positive/negative tests to test bpf_rcu_read_lock()
> and its corresponding verifier support.
>
>   ./test_progs -t rcu_read_lock
>   ...
>   #145/1   rcu_read_lock/local_storage:OK
>   #145/2   rcu_read_lock/runtime_diff_rcu_tag:OK
>   #145/3   rcu_read_lock/negative_tests:OK
>   #145     rcu_read_lock:OK
>   Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  .../selftests/bpf/prog_tests/rcu_read_lock.c  | 101 ++++++++
>  .../selftests/bpf/progs/rcu_read_lock.c       | 241 ++++++++++++++++++
>  2 files changed, 342 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
>  create mode 100644 tools/testing/selftests/bpf/progs/rcu_read_lock.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c b/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
> new file mode 100644
> index 000000000000..46c02bdb1360
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
> @@ -0,0 +1,101 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates.*/
> +
> +#define _GNU_SOURCE
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#include <sys/types.h>

[...]

> +
> +       task = bpf_get_current_task_btf();
> +
> +       bpf_rcu_read_lock();
> +       real_parent = task->real_parent;
> +       bpf_rcu_read_unlock();

The tests are nice, It would be nice to add a comment on what actually is wrong.

e.g.

/* real_parent is accessed outside the RCU critical section */


> +       (void)bpf_task_storage_get(&map_b, real_parent, 0,
> +                                  BPF_LOCAL_STORAGE_GET_F_CREATE);
> +       return 0;
> +}
> +

[...]

> +       bpf_rcu_read_lock();
> +       bkey = bpf_lookup_user_key(key_serial, flags);
> +       bpf_rcu_read_unlock();
> +        if (!bkey)
> +                return -1;
> +        bpf_key_put(bkey);
> +
> +        return 0;

nit: Spaces here instead of tabs.

> +}
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper
  2022-11-03 14:28   ` KP Singh
@ 2022-11-03 21:00     ` Yonghong Song
  2022-11-03 22:29       ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2022-11-03 21:00 UTC (permalink / raw)
  To: KP Singh, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau



On 11/3/22 7:28 AM, KP Singh wrote:
> On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
>>
>> Add bpf_rcu_read_lock() and bpf_rcu_read_unlock() helpers.
>> Both helpers are available to all program types with
>> CAP_BPF capability.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   include/linux/bpf.h            |  2 ++
>>   include/uapi/linux/bpf.h       | 14 ++++++++++++++
>>   kernel/bpf/core.c              |  2 ++
>>   kernel/bpf/helpers.c           | 26 ++++++++++++++++++++++++++
>>   tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
>>   5 files changed, 58 insertions(+)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 8d948bfcb984..a9bda4c91fc7 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -2554,6 +2554,8 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
>>   extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>>   extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
>>   extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
>> +extern const struct bpf_func_proto bpf_rcu_read_lock_proto;
>> +extern const struct bpf_func_proto bpf_rcu_read_unlock_proto;
>>
>>   const struct bpf_func_proto *tracing_prog_func_proto(
>>     enum bpf_func_id func_id, const struct bpf_prog *prog);
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 94659f6b3395..e86389cd6133 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -5481,6 +5481,18 @@ union bpf_attr {
>>    *             0 on success.
>>    *
>>    *             **-ENOENT** if the bpf_local_storage cannot be found.
>> + *
>> + * void bpf_rcu_read_lock(void)
>> + *     Description
>> + *             Call kernel rcu_read_lock().
> 
> Simple wrapper around rcu_read_lock() and maybe explain where and how
> it is supposed to
> be used.
> 
> e.g. the verifier will check if __rcu pointers are being accessed with
> bpf_rcu_read_lock in
> sleepable programs.

Okay, I can add more descriptions.

> 
> Calling the helper from a non-sleepable program is inconsequential,
> but maybe we can even
> avoid exposing it to non-sleepable programs?

I actually debated myself whether to make bpf_rcu_read_lock()/unlock()
to be sleepable only. Although it won't hurt for non-sleepable program,
I guess I can make it as sleepable only so users don't make mistake
to use them in non-sleepable programs.

> 
>> + *     Return
>> + *             None.
>> + *
>> + * void bpf_rcu_read_unlock(void)
>> + *     Description
>> + *             Call kernel rcu_read_unlock().
>> + *     Return
>> + *             None.
>>    */
>>   #define ___BPF_FUNC_MAPPER(FN, ctx...)                 \
>>          FN(unspec, 0, ##ctx)                            \
>> @@ -5695,6 +5707,8 @@ union bpf_attr {
>>          FN(user_ringbuf_drain, 209, ##ctx)              \
>>          FN(cgrp_storage_get, 210, ##ctx)                \
>>          FN(cgrp_storage_delete, 211, ##ctx)             \
>> +       FN(rcu_read_lock, 212, ##ctx)                   \
>> +       FN(rcu_read_unlock, 213, ##ctx)                 \
>>          /* */
>>
[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support
  2022-11-03 14:43   ` KP Singh
@ 2022-11-03 21:03     ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2022-11-03 21:03 UTC (permalink / raw)
  To: KP Singh, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau



On 11/3/22 7:43 AM, KP Singh wrote:
> On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
>>
>> A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
>> object access needing rcu_read_lock protection. The rcu protection
>> is not needed for non-sleepable program. So various verification
>> checking is only done for sleepable programs. In particular, only
>> the following insns can be inside bpf_rcu_read_lock() region:
>>    - any non call insns except BPF_ABS/BPF_IND
>>    - non sleepable helpers and kfuncs.
>> Also, bpf_*_storage_get() helper's 5th hidden argument (for memory
>> allocation flag) should be GFP_ATOMIC.
>>
>> If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
>> this pointer and the load which gets this pointer needs to be
>> protected by bpf_rcu_read_lock(). The following shows a couple
>> of examples:
>>    struct task_struct {
>>          ...
>>          struct task_struct __rcu        *real_parent;
>>          struct css_set __rcu            *cgroups;
>>          ...
>>    };
>>    struct css_set {
>>          ...
>>          struct cgroup *dfl_cgrp;
>>          ...
>>    }
>>    ...
>>    task = bpf_get_current_task_btf();
>>    cgroups = task->cgroups;
>>    dfl_cgroup = cgroups->dfl_cgrp;
>>    ... using dfl_cgroup ...
>>
>> The bpf_rcu_read_lock/unlock() should be added like below to
>> avoid verification failures.
>>    task = bpf_get_current_task_btf();
>>    bpf_rcu_read_lock();
>>    cgroups = task->cgroups;
>>    dfl_cgroup = cgroups->dfl_cgrp;
>>    bpf_rcu_read_unlock();
>>    ... using dfl_cgroup ...
>>
>> The following is another example for task->real_parent.
>>    task = bpf_get_current_task_btf();
>>    bpf_rcu_read_lock();
>>    real_parent = task->real_parent;
>>    ... bpf_task_storage_get(&map, real_parent, 0, 0);
>>    bpf_rcu_read_unlock();
>>
>> There is another case observed in selftest bpf_iter_ipv6_route.c:
>>    struct fib6_info *rt = ctx->rt;
>>    ...
>>    fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
>>    ...
>>    if (rt->nh)
>>      fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
>>    ...
>>    ... using fib6_nh ...
>> Currently verification will fail with
>>    same insn cannot be used with different pointers
>> since the use of fib6_nh is tag with rcu in one path
>> but not in the other path. The above use case is a valid
>> one so the verifier is changed to ignore MEM_RCU type tag
>> in such cases.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   include/linux/bpf.h          |   3 +
>>   include/linux/bpf_verifier.h |   1 +
>>   kernel/bpf/btf.c             |  11 +++
>>   kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
>>   4 files changed, 133 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index a9bda4c91fc7..f0d973c8d227 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -458,6 +458,9 @@ enum bpf_type_flag {
>>          /* Size is known at compile time. */
>>          MEM_FIXED_SIZE          = BIT(10 + BPF_BASE_TYPE_BITS),
>>
>> +       /* MEM is tagged with rcu and memory access needs rcu_read_lock protection. */
>> +       MEM_RCU                 = BIT(11 + BPF_BASE_TYPE_BITS),
>> +
>>          __BPF_TYPE_FLAG_MAX,
>>          __BPF_TYPE_LAST_FLAG    = __BPF_TYPE_FLAG_MAX - 1,
>>   };
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 1a32baa78ce2..d4e56f5a4b20 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -324,6 +324,7 @@ struct bpf_verifier_state {
>>          u32 insn_idx;
>>          u32 curframe;
>>          u32 active_spin_lock;
>> +       u32 active_rcu_lock;
>>          bool speculative;
>>
>>          /* first and last insn idx of this verifier state */
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index 35c07afac924..293d224a7f53 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -5527,6 +5527,8 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>>                          info->reg_type |= MEM_USER;
>>                  if (strcmp(tag_value, "percpu") == 0)
>>                          info->reg_type |= MEM_PERCPU;
>> +               if (strcmp(tag_value, "rcu") == 0)
>> +                       info->reg_type |= MEM_RCU;
>>          }
>>
>>          /* skip modifiers */
>> @@ -5765,6 +5767,9 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
>>                                  /* check __percpu tag */
>>                                  if (strcmp(tag_value, "percpu") == 0)
>>                                          tmp_flag = MEM_PERCPU;
>> +                               /* check __rcu tag */
>> +                               if (strcmp(tag_value, "rcu") == 0)
>> +                                       tmp_flag = MEM_RCU;
>>                          }
>>
>>                          stype = btf_type_skip_modifiers(btf, mtype->type, &id);
>> @@ -6560,6 +6565,12 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>>                  return -EINVAL;
>>          }
>>
>> +       if (sleepable && env->cur_state->active_rcu_lock) {
>> +               bpf_log(log, "kernel function %s is sleepable within rcu_read_lock region\n",
>> +                       func_name);
>> +               return -EINVAL;
>> +       }
>> +
>>          if (kfunc_meta && ref_obj_id)
>>                  kfunc_meta->ref_obj_id = ref_obj_id;
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 82c07fe0bfb1..3c5afd3bc216 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -188,6 +188,9 @@ struct bpf_verifier_stack_elem {
>>                                            POISON_POINTER_DELTA))
>>   #define BPF_MAP_PTR(X)         ((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
>>
>> +/* Using insn->off = BPF_STORAGE_GET_CALL to mark bpf_*_storage_get() helper calls. */
>> +#define BPF_STORAGE_GET_CALL   1
>> +
>>   static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
>>   static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
>>
>> @@ -511,6 +514,22 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
>>          return func_id == BPF_FUNC_dynptr_data;
>>   }
>>
>> +static bool is_storage_get_function(enum bpf_func_id func_id)
>> +{
>> +       return func_id == BPF_FUNC_sk_storage_get ||
>> +              func_id == BPF_FUNC_inode_storage_get ||
>> +              func_id == BPF_FUNC_task_storage_get ||
>> +              func_id == BPF_FUNC_cgrp_storage_get;
>> +}
>> +
>> +static bool is_sleepable_function(enum bpf_func_id func_id)
>> +{
>> +       return func_id == BPF_FUNC_copy_from_user ||
>> +              func_id == BPF_FUNC_copy_from_user_task ||
>> +              func_id == BPF_FUNC_ima_inode_hash ||
>> +              func_id == BPF_FUNC_ima_file_hash;
>> +}
> 
> This is a bit concerning that these are in two different places in the kernel.
> We expose a helper based on prog->aux->sleepable in different places and
> it's worth doing some refactoring to keep all this logic in a single place.

Okay, let me do a restructing to have a central place to check
helpers for sleepable program only.

> 
>> +
>>   static bool helper_multiple_ref_obj_use(enum bpf_func_id func_id,
>>                                          const struct bpf_map *map)
>>   {
>> @@ -583,6 +602,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
>>                  strncpy(prefix, "percpu_", 32);
>>          if (type & PTR_UNTRUSTED)
>>                  strncpy(prefix, "untrusted_", 32);
>> +       if (type & MEM_RCU)
>> +               strncpy(prefix, "rcu_", 32);
[...]
>>
>> -               if (insn->imm == BPF_FUNC_task_storage_get ||
>> -                   insn->imm == BPF_FUNC_sk_storage_get ||
>> -                   insn->imm == BPF_FUNC_inode_storage_get ||
>> -                   insn->imm == BPF_FUNC_cgrp_storage_get) {
>> -                       if (env->prog->aux->sleepable)
>> +               if (is_storage_get_function(insn->imm)) {
>> +                       if (env->prog->aux->sleepable && insn->off) {
> 
> I would recommend explicitly checking if insn->off == BPF_STORAGE_GET_CALL.
> 
> Also, your comment says you are marking BPF_STORAGE_GET_CALL but this is only
> set when the call is in a classical RCU critical section and in a
> sleepable program. The
> The name and the comment above should reflect that.
> 
> BPF_STORAGE_GET_CALL_ATOMIC or something.

BPF_STORAGE_GET_CALL_ATOMIC sounds okay. Will use it unless I can come
up with a better name.

> 
> 
>> +                               insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
>> +                               insn->off = 0;
>> +                       } else if (env->prog->aux->sleepable)
>>                                  insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
>>                          else
>>                                  insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 5/5] selftests/bpf: Add tests for bpf_rcu_read_lock()
  2022-11-03 14:54   ` KP Singh
@ 2022-11-03 21:04     ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2022-11-03 21:04 UTC (permalink / raw)
  To: KP Singh, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau



On 11/3/22 7:54 AM, KP Singh wrote:
> On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
>>
>> Add a few positive/negative tests to test bpf_rcu_read_lock()
>> and its corresponding verifier support.
>>
>>    ./test_progs -t rcu_read_lock
>>    ...
>>    #145/1   rcu_read_lock/local_storage:OK
>>    #145/2   rcu_read_lock/runtime_diff_rcu_tag:OK
>>    #145/3   rcu_read_lock/negative_tests:OK
>>    #145     rcu_read_lock:OK
>>    Summary: 1/3 PASSED, 0 SKIPPED, 0 FAILED
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   .../selftests/bpf/prog_tests/rcu_read_lock.c  | 101 ++++++++
>>   .../selftests/bpf/progs/rcu_read_lock.c       | 241 ++++++++++++++++++
>>   2 files changed, 342 insertions(+)
>>   create mode 100644 tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
>>   create mode 100644 tools/testing/selftests/bpf/progs/rcu_read_lock.c
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c b/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
>> new file mode 100644
>> index 000000000000..46c02bdb1360
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
>> @@ -0,0 +1,101 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates.*/
>> +
>> +#define _GNU_SOURCE
>> +#include <unistd.h>
>> +#include <sys/syscall.h>
>> +#include <sys/types.h>
> 
> [...]
> 
>> +
>> +       task = bpf_get_current_task_btf();
>> +
>> +       bpf_rcu_read_lock();
>> +       real_parent = task->real_parent;
>> +       bpf_rcu_read_unlock();
> 
> The tests are nice, It would be nice to add a comment on what actually is wrong.
> 
> e.g.
> 
> /* real_parent is accessed outside the RCU critical section */

Sure. Will add some comments.

> 
> 
>> +       (void)bpf_task_storage_get(&map_b, real_parent, 0,
>> +                                  BPF_LOCAL_STORAGE_GET_F_CREATE);
>> +       return 0;
>> +}
>> +
> 
> [...]
> 
>> +       bpf_rcu_read_lock();
>> +       bkey = bpf_lookup_user_key(key_serial, flags);
>> +       bpf_rcu_read_unlock();
>> +        if (!bkey)
>> +                return -1;
>> +        bpf_key_put(bkey);
>> +
>> +        return 0;
> 
> nit: Spaces here instead of tabs.
> 
>> +}
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support
  2022-11-03  7:21 ` [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support Yonghong Song
  2022-11-03 14:43   ` KP Singh
@ 2022-11-03 22:17   ` Kumar Kartikeya Dwivedi
  2022-11-04  0:28     ` Yonghong Song
  2022-11-04 12:13     ` KP Singh
  1 sibling, 2 replies; 26+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-03 22:17 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Thu, Nov 03, 2022 at 12:51:18PM IST, Yonghong Song wrote:
> A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
> object access needing rcu_read_lock protection. The rcu protection
> is not needed for non-sleepable program. So various verification
> checking is only done for sleepable programs. In particular, only
> the following insns can be inside bpf_rcu_read_lock() region:
>   - any non call insns except BPF_ABS/BPF_IND
>   - non sleepable helpers and kfuncs.
> Also, bpf_*_storage_get() helper's 5th hidden argument (for memory
> allocation flag) should be GFP_ATOMIC.
>
> If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
> this pointer and the load which gets this pointer needs to be
> protected by bpf_rcu_read_lock(). The following shows a couple
> of examples:
>   struct task_struct {
> 	...
> 	struct task_struct __rcu        *real_parent;
> 	struct css_set __rcu            *cgroups;
> 	...
>   };
>   struct css_set {
> 	...
> 	struct cgroup *dfl_cgrp;
> 	...
>   }
>   ...
>   task = bpf_get_current_task_btf();
>   cgroups = task->cgroups;
>   dfl_cgroup = cgroups->dfl_cgrp;
>   ... using dfl_cgroup ...
>
> The bpf_rcu_read_lock/unlock() should be added like below to
> avoid verification failures.
>   task = bpf_get_current_task_btf();
>   bpf_rcu_read_lock();
>   cgroups = task->cgroups;
>   dfl_cgroup = cgroups->dfl_cgrp;
>   bpf_rcu_read_unlock();
>   ... using dfl_cgroup ...
>
> The following is another example for task->real_parent.
>   task = bpf_get_current_task_btf();
>   bpf_rcu_read_lock();
>   real_parent = task->real_parent;
>   ... bpf_task_storage_get(&map, real_parent, 0, 0);
>   bpf_rcu_read_unlock();
>
> There is another case observed in selftest bpf_iter_ipv6_route.c:
>   struct fib6_info *rt = ctx->rt;
>   ...
>   fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
>   ...
>   if (rt->nh)
>     fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
>   ...
>   ... using fib6_nh ...
> Currently verification will fail with
>   same insn cannot be used with different pointers
> since the use of fib6_nh is tag with rcu in one path
> but not in the other path. The above use case is a valid
> one so the verifier is changed to ignore MEM_RCU type tag
> in such cases.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/linux/bpf.h          |   3 +
>  include/linux/bpf_verifier.h |   1 +
>  kernel/bpf/btf.c             |  11 +++
>  kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
>  4 files changed, 133 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a9bda4c91fc7..f0d973c8d227 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -458,6 +458,9 @@ enum bpf_type_flag {
>  	/* Size is known at compile time. */
>  	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
>
> +	/* MEM is tagged with rcu and memory access needs rcu_read_lock protection. */
> +	MEM_RCU			= BIT(11 + BPF_BASE_TYPE_BITS),
> +

IMO, PTR_RCU would be better name for this, since it applied to a specific
pointer through which the access is done.

>  	__BPF_TYPE_FLAG_MAX,
>  	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
>  };
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 1a32baa78ce2..d4e56f5a4b20 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -324,6 +324,7 @@ struct bpf_verifier_state {
>  	u32 insn_idx;
>  	u32 curframe;
>  	u32 active_spin_lock;
> +	u32 active_rcu_lock;
>  	bool speculative;
>
>  	/* first and last insn idx of this verifier state */
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 35c07afac924..293d224a7f53 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -5527,6 +5527,8 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>  			info->reg_type |= MEM_USER;
>  		if (strcmp(tag_value, "percpu") == 0)
>  			info->reg_type |= MEM_PERCPU;
> +		if (strcmp(tag_value, "rcu") == 0)
> +			info->reg_type |= MEM_RCU;
>  	}
>
>  	/* skip modifiers */
> @@ -5765,6 +5767,9 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
>  				/* check __percpu tag */
>  				if (strcmp(tag_value, "percpu") == 0)
>  					tmp_flag = MEM_PERCPU;
> +				/* check __rcu tag */
> +				if (strcmp(tag_value, "rcu") == 0)
> +					tmp_flag = MEM_RCU;
>  			}
>
>  			stype = btf_type_skip_modifiers(btf, mtype->type, &id);
> @@ -6560,6 +6565,12 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
>  		return -EINVAL;
>  	}
>
> +	if (sleepable && env->cur_state->active_rcu_lock) {
> +		bpf_log(log, "kernel function %s is sleepable within rcu_read_lock region\n",
> +			func_name);
> +		return -EINVAL;
> +	}
> +
>  	if (kfunc_meta && ref_obj_id)
>  		kfunc_meta->ref_obj_id = ref_obj_id;
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 82c07fe0bfb1..3c5afd3bc216 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -188,6 +188,9 @@ struct bpf_verifier_stack_elem {
>  					  POISON_POINTER_DELTA))
>  #define BPF_MAP_PTR(X)		((struct bpf_map *)((X) & ~BPF_MAP_PTR_UNPRIV))
>
> +/* Using insn->off = BPF_STORAGE_GET_CALL to mark bpf_*_storage_get() helper calls. */
> +#define BPF_STORAGE_GET_CALL	1
> +
>  static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
>  static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
>
> @@ -511,6 +514,22 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
>  	return func_id == BPF_FUNC_dynptr_data;
>  }
>
> +static bool is_storage_get_function(enum bpf_func_id func_id)
> +{
> +	return func_id == BPF_FUNC_sk_storage_get ||
> +	       func_id == BPF_FUNC_inode_storage_get ||
> +	       func_id == BPF_FUNC_task_storage_get ||
> +	       func_id == BPF_FUNC_cgrp_storage_get;
> +}
> +
> +static bool is_sleepable_function(enum bpf_func_id func_id)
> +{
> +	return func_id == BPF_FUNC_copy_from_user ||
> +	       func_id == BPF_FUNC_copy_from_user_task ||
> +	       func_id == BPF_FUNC_ima_inode_hash ||
> +	       func_id == BPF_FUNC_ima_file_hash;
> +}
> +
>  static bool helper_multiple_ref_obj_use(enum bpf_func_id func_id,
>  					const struct bpf_map *map)
>  {
> @@ -583,6 +602,8 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
>  		strncpy(prefix, "percpu_", 32);
>  	if (type & PTR_UNTRUSTED)
>  		strncpy(prefix, "untrusted_", 32);
> +	if (type & MEM_RCU)
> +		strncpy(prefix, "rcu_", 32);
>
>  	snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
>  		 prefix, str[base_type(type)], postfix);
> @@ -1201,6 +1222,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
>  	dst_state->speculative = src->speculative;
>  	dst_state->curframe = src->curframe;
>  	dst_state->active_spin_lock = src->active_spin_lock;
> +	dst_state->active_rcu_lock = src->active_rcu_lock;
>  	dst_state->branches = src->branches;
>  	dst_state->parent = src->parent;
>  	dst_state->first_insn_idx = src->first_insn_idx;
> @@ -4536,6 +4558,14 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>  		return -EACCES;
>  	}
>
> +	if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
> +	    !env->cur_state->active_rcu_lock) {
> +		verbose(env,
> +			"R%d is ptr_%s access rcu-protected memory with off=%d, not in rcu_read_lock region\n",
> +			regno, tname, off);
> +		return -EACCES;
> +	}
> +
>  	if (env->ops->btf_struct_access) {
>  		ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
>  						  off, size, atype, &btf_id, &flag);
> @@ -4552,6 +4582,14 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>  	if (ret < 0)
>  		return ret;
>
> +	if ((flag & MEM_RCU) && env->prog->aux->sleepable &&
> +	    !env->cur_state->active_rcu_lock) {
> +		verbose(env,
> +			"R%d is rcu dereference ptr_%s with off=%d, not in rcu_read_lock region\n",
> +			regno, tname, off);
> +		return -EACCES;
> +	}
> +

This isn't right. Every load that obtains an RCU pointer needs to become tied to
the current RCU section, and needs to be invalidated once the RCU section ends.

So in addition to checking that bpf_rcu_read_lock is held around MEM_RCU access,
you need to invalidate all MEM_RCU pointers when bpf_rcu_read_unlock is called.

Otherwise, with the current logic, the following would become possible:

bpf_rcu_read_lock();
p = rcu_dereference(foo->rcup);
bpf_rcu_read_unlock();

// p is possibly dead

bpf_rcu_read_lock();
// use p
bpf_rcu_read_unlock();

I have pretty much the same patchset lying locally in my tree (waiting for the
kfunc rework to get in before I post it), but I can also rebase other stuff
using explicit bpf_rcu_read_lock on top of yours.

>  	/* If this is an untrusted pointer, all pointers formed by walking it
>  	 * also inherit the untrusted flag.
>  	 */
> @@ -5684,7 +5722,12 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
>  static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
>  static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
>  static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
> -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
> +static const struct bpf_reg_types btf_ptr_types = {
> +	.types = {
> +		PTR_TO_BTF_ID,
> +		PTR_TO_BTF_ID | MEM_RCU,
> +	}
> +};
>  static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
>  static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
>  static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> @@ -5758,6 +5801,20 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>  	if (arg_type & PTR_MAYBE_NULL)
>  		type &= ~PTR_MAYBE_NULL;
>
> +	/* If the reg type is marked as MEM_RCU, ensure the usage is in the rcu_read_lock
> +	 * region, and remove MEM_RCU from the type since the arg_type won't encode
> +	 * MEM_RCU.
> +	 */
> +	if (type & MEM_RCU) {
> +		if (env->prog->aux->sleepable && !env->cur_state->active_rcu_lock) {
> +			verbose(env,
> +				"R%d is arg type %s needs rcu protection\n",
> +				regno, reg_type_str(env, reg->type));
> +			return -EACCES;
> +		}
> +		type &= ~MEM_RCU;
> +	}
> +
>  	for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
>  		expected = compatible->types[i];
>  		if (expected == NOT_INIT)
> @@ -5774,7 +5831,8 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>  	return -EACCES;
>
>  found:
> -	if (reg->type == PTR_TO_BTF_ID) {
> +	/* reg is already protected by rcu_read_lock(). Peel off MEM_RCU from reg->type. */
> +	if ((reg->type & ~MEM_RCU) == PTR_TO_BTF_ID) {
>  		/* For bpf_sk_release, it needs to match against first member
>  		 * 'struct sock_common', hence make an exception for it. This
>  		 * allows bpf_sk_release to work for multiple socket types.
> @@ -5850,6 +5908,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>  	 * fixed offset.
>  	 */
>  	case PTR_TO_BTF_ID:
> +	case PTR_TO_BTF_ID | MEM_RCU:
>  		/* When referenced PTR_TO_BTF_ID is passed to release function,
>  		 * it's fixed offset must be 0.	In the other cases, fixed offset
>  		 * can be non-zero.
> @@ -7289,6 +7348,26 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>  	}
>
>  	meta.func_id = func_id;
> +
> +	if (func_id == BPF_FUNC_rcu_read_lock)
> +		env->cur_state->active_rcu_lock++;
> +	if (func_id == BPF_FUNC_rcu_read_unlock) {
> +		if (env->cur_state->active_rcu_lock == 0) {
> +			verbose(env, "missing bpf_rcu_read_lock\n");
> +			return -EINVAL;
> +		}
> +
> +		env->cur_state->active_rcu_lock--;
> +	}
> +	if (env->cur_state->active_rcu_lock) {
> +		if (is_sleepable_function(func_id))
> +			verbose(env, "sleepable helper %s#%din rcu_read_lock region\n",
> +				func_id_name(func_id), func_id);
> +
> +		if (env->prog->aux->sleepable && is_storage_get_function(func_id))
> +			insn->off = BPF_STORAGE_GET_CALL;

This is a bit ugly. Why not use bpf_insn_aux_data?

> +	}
> +
>  	/* check args */
>  	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
>  		err = check_func_arg(env, i, &meta, fn);
> @@ -10470,6 +10549,11 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  		return -EINVAL;
>  	}
>
> +	if (env->prog->aux->sleepable && env->cur_state->active_rcu_lock) {
> +		verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_rcu_read_lock-ed region\n");
> +		return -EINVAL;
> +	}
> +
>  	if (regs[ctx_reg].type != PTR_TO_CTX) {
>  		verbose(env,
>  			"at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
> @@ -11734,6 +11818,9 @@ static bool states_equal(struct bpf_verifier_env *env,
>  	if (old->active_spin_lock != cur->active_spin_lock)
>  		return false;
>
> +	if (old->active_rcu_lock != cur->active_rcu_lock)
> +		return false;
> +
>  	/* for states to be equal callsites have to be the same
>  	 * and all frame states need to be equivalent
>  	 */
> @@ -12141,6 +12228,11 @@ static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
>  			       !reg_type_mismatch_ok(prev));
>  }
>
> +static bool reg_type_mismatch_ignore_rcu(enum bpf_reg_type src, enum bpf_reg_type prev)
> +{
> +	return reg_type_mismatch(src & ~MEM_RCU, prev & ~MEM_RCU);
> +}
> +

See the discussion about this in David's set:
https://lore.kernel.org/bpf/CAP01T75FGW7F=Ho+oqoC6WgxK5uUir2=CUgiW_HwqNxmzmthBg@mail.gmail.com

>  static int do_check(struct bpf_verifier_env *env)
>  {
>  	bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
> @@ -12266,6 +12358,18 @@ static int do_check(struct bpf_verifier_env *env)
>
>  			prev_src_type = &env->insn_aux_data[env->insn_idx].ptr_type;
>
> +			/* For NOT_INIT *prev_src_type, ignore rcu type tag.
> +			 * For code like below,
> +			 *   struct foo *f;
> +			 *   if (...)
> +			 *     f = ...; // f with MEM_RCU type tag.
> +			 *   else
> +			 *     f = ...; // f without MEM_RCU type tag.
> +			 *   ... f ...  // Here f could be with/without MEM_RCU
> +			 *
> +			 * It is safe to ignore MEM_RCU type tag here since
> +			 * base types are the same.
> +			 */
>  			if (*prev_src_type == NOT_INIT) {
>  				/* saw a valid insn
>  				 * dst_reg = *(u32 *)(src_reg + off)
> @@ -12273,7 +12377,7 @@ static int do_check(struct bpf_verifier_env *env)
>  				 */
>  				*prev_src_type = src_reg_type;
>
> -			} else if (reg_type_mismatch(src_reg_type, *prev_src_type)) {
> +			} else if (reg_type_mismatch_ignore_rcu(src_reg_type, *prev_src_type)) {
>  				/* ABuser program is trying to use the same insn
>  				 * dst_reg = *(u32*) (src_reg + off)
>  				 * with different pointer types:
> @@ -12412,6 +12516,11 @@ static int do_check(struct bpf_verifier_env *env)
>  					return -EINVAL;
>  				}
>
> +				if (env->cur_state->active_rcu_lock) {
> +					verbose(env, "bpf_rcu_read_unlock is missing\n");
> +					return -EINVAL;
> +				}
> +
>  				/* We must do check_reference_leak here before
>  				 * prepare_func_exit to handle the case when
>  				 * state->curframe > 0, it may be a callback
> @@ -13499,6 +13608,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>  			convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
>  			break;
>  		case PTR_TO_BTF_ID:
> +		case PTR_TO_BTF_ID | MEM_RCU:

This shouldn't be needed, right? If it is RCU protected, there shouldn't be a
need for handling faults (or it's a bug in the kernel).

>  		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
>  			if (type == BPF_READ) {
>  				insn->code = BPF_LDX | BPF_PROBE_MEM |
> @@ -14148,11 +14258,11 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>  			goto patch_call_imm;
>  		}
>
> -		if (insn->imm == BPF_FUNC_task_storage_get ||
> -		    insn->imm == BPF_FUNC_sk_storage_get ||
> -		    insn->imm == BPF_FUNC_inode_storage_get ||
> -		    insn->imm == BPF_FUNC_cgrp_storage_get) {
> -			if (env->prog->aux->sleepable)
> +		if (is_storage_get_function(insn->imm)) {
> +			if (env->prog->aux->sleepable && insn->off) {
> +				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
> +				insn->off = 0;
> +			} else if (env->prog->aux->sleepable)
>  				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
>  			else
>  				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper
  2022-11-03  7:21 ` [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper Yonghong Song
  2022-11-03 14:28   ` KP Singh
@ 2022-11-03 22:18   ` Kumar Kartikeya Dwivedi
  2022-11-04  0:30     ` Yonghong Song
  1 sibling, 1 reply; 26+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-03 22:18 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Thu, Nov 03, 2022 at 12:51:13PM IST, Yonghong Song wrote:
> Add bpf_rcu_read_lock() and bpf_rcu_read_unlock() helpers.
> Both helpers are available to all program types with
> CAP_BPF capability.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/linux/bpf.h            |  2 ++
>  include/uapi/linux/bpf.h       | 14 ++++++++++++++
>  kernel/bpf/core.c              |  2 ++
>  kernel/bpf/helpers.c           | 26 ++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
>  5 files changed, 58 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 8d948bfcb984..a9bda4c91fc7 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2554,6 +2554,8 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
>  extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>  extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
>  extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
> +extern const struct bpf_func_proto bpf_rcu_read_lock_proto;
> +extern const struct bpf_func_proto bpf_rcu_read_unlock_proto;
>
>  const struct bpf_func_proto *tracing_prog_func_proto(
>    enum bpf_func_id func_id, const struct bpf_prog *prog);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 94659f6b3395..e86389cd6133 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5481,6 +5481,18 @@ union bpf_attr {
>   *		0 on success.
>   *
>   *		**-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * void bpf_rcu_read_lock(void)
> + *	Description
> + *		Call kernel rcu_read_lock().
> + *	Return
> + *		None.
> + *
> + * void bpf_rcu_read_unlock(void)
> + *	Description
> + *		Call kernel rcu_read_unlock().
> + *	Return
> + *		None.
>   */

It would be better to not bake these into UAPI and keep them unstable only IMO.

> [...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper
  2022-11-03 21:00     ` Yonghong Song
@ 2022-11-03 22:29       ` Kumar Kartikeya Dwivedi
  2022-11-04  0:32         ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-03 22:29 UTC (permalink / raw)
  To: Yonghong Song
  Cc: KP Singh, Yonghong Song, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, kernel-team, Martin KaFai Lau

On Fri, Nov 04, 2022 at 02:30:55AM IST, Yonghong Song wrote:
>
>
> On 11/3/22 7:28 AM, KP Singh wrote:
> > On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
> > >
> > > Add bpf_rcu_read_lock() and bpf_rcu_read_unlock() helpers.
> > > Both helpers are available to all program types with
> > > CAP_BPF capability.
> > >
> > > Signed-off-by: Yonghong Song <yhs@fb.com>
> > > ---
> > >   include/linux/bpf.h            |  2 ++
> > >   include/uapi/linux/bpf.h       | 14 ++++++++++++++
> > >   kernel/bpf/core.c              |  2 ++
> > >   kernel/bpf/helpers.c           | 26 ++++++++++++++++++++++++++
> > >   tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
> > >   5 files changed, 58 insertions(+)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 8d948bfcb984..a9bda4c91fc7 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -2554,6 +2554,8 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
> > >   extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
> > >   extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
> > >   extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
> > > +extern const struct bpf_func_proto bpf_rcu_read_lock_proto;
> > > +extern const struct bpf_func_proto bpf_rcu_read_unlock_proto;
> > >
> > >   const struct bpf_func_proto *tracing_prog_func_proto(
> > >     enum bpf_func_id func_id, const struct bpf_prog *prog);
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 94659f6b3395..e86389cd6133 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -5481,6 +5481,18 @@ union bpf_attr {
> > >    *             0 on success.
> > >    *
> > >    *             **-ENOENT** if the bpf_local_storage cannot be found.
> > > + *
> > > + * void bpf_rcu_read_lock(void)
> > > + *     Description
> > > + *             Call kernel rcu_read_lock().
> >
> > Simple wrapper around rcu_read_lock() and maybe explain where and how
> > it is supposed to
> > be used.
> >
> > e.g. the verifier will check if __rcu pointers are being accessed with
> > bpf_rcu_read_lock in
> > sleepable programs.
>
> Okay, I can add more descriptions.
>
> >
> > Calling the helper from a non-sleepable program is inconsequential,
> > but maybe we can even
> > avoid exposing it to non-sleepable programs?
>
> I actually debated myself whether to make bpf_rcu_read_lock()/unlock()
> to be sleepable only. Although it won't hurt for non-sleepable program,
> I guess I can make it as sleepable only so users don't make mistake
> to use them in non-sleepable programs.
>

It's better to let it be a noop in non-sleepable programs but still allow
calling it. It allows writing common helper functions in the BPF program that
work in both sleepable and non-sleepable cases by holding the RCU read lock.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support
  2022-11-03 22:17   ` Kumar Kartikeya Dwivedi
@ 2022-11-04  0:28     ` Yonghong Song
  2022-11-04 12:13     ` KP Singh
  1 sibling, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2022-11-04  0:28 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau



On 11/3/22 3:17 PM, Kumar Kartikeya Dwivedi wrote:
> On Thu, Nov 03, 2022 at 12:51:18PM IST, Yonghong Song wrote:
>> A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
>> object access needing rcu_read_lock protection. The rcu protection
>> is not needed for non-sleepable program. So various verification
>> checking is only done for sleepable programs. In particular, only
>> the following insns can be inside bpf_rcu_read_lock() region:
>>    - any non call insns except BPF_ABS/BPF_IND
>>    - non sleepable helpers and kfuncs.
>> Also, bpf_*_storage_get() helper's 5th hidden argument (for memory
>> allocation flag) should be GFP_ATOMIC.
>>
>> If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
>> this pointer and the load which gets this pointer needs to be
>> protected by bpf_rcu_read_lock(). The following shows a couple
>> of examples:
>>    struct task_struct {
>> 	...
>> 	struct task_struct __rcu        *real_parent;
>> 	struct css_set __rcu            *cgroups;
>> 	...
>>    };
>>    struct css_set {
>> 	...
>> 	struct cgroup *dfl_cgrp;
>> 	...
>>    }
>>    ...
>>    task = bpf_get_current_task_btf();
>>    cgroups = task->cgroups;
>>    dfl_cgroup = cgroups->dfl_cgrp;
>>    ... using dfl_cgroup ...
>>
>> The bpf_rcu_read_lock/unlock() should be added like below to
>> avoid verification failures.
>>    task = bpf_get_current_task_btf();
>>    bpf_rcu_read_lock();
>>    cgroups = task->cgroups;
>>    dfl_cgroup = cgroups->dfl_cgrp;
>>    bpf_rcu_read_unlock();
>>    ... using dfl_cgroup ...
>>
>> The following is another example for task->real_parent.
>>    task = bpf_get_current_task_btf();
>>    bpf_rcu_read_lock();
>>    real_parent = task->real_parent;
>>    ... bpf_task_storage_get(&map, real_parent, 0, 0);
>>    bpf_rcu_read_unlock();
>>
>> There is another case observed in selftest bpf_iter_ipv6_route.c:
>>    struct fib6_info *rt = ctx->rt;
>>    ...
>>    fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
>>    ...
>>    if (rt->nh)
>>      fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
>>    ...
>>    ... using fib6_nh ...
>> Currently verification will fail with
>>    same insn cannot be used with different pointers
>> since the use of fib6_nh is tag with rcu in one path
>> but not in the other path. The above use case is a valid
>> one so the verifier is changed to ignore MEM_RCU type tag
>> in such cases.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   include/linux/bpf.h          |   3 +
>>   include/linux/bpf_verifier.h |   1 +
>>   kernel/bpf/btf.c             |  11 +++
>>   kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
>>   4 files changed, 133 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index a9bda4c91fc7..f0d973c8d227 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -458,6 +458,9 @@ enum bpf_type_flag {
>>   	/* Size is known at compile time. */
>>   	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
>>
>> +	/* MEM is tagged with rcu and memory access needs rcu_read_lock protection. */
>> +	MEM_RCU			= BIT(11 + BPF_BASE_TYPE_BITS),
>> +
> 
> IMO, PTR_RCU would be better name for this, since it applied to a specific
> pointer through which the access is done.

I choose MEM_RCU since we have MEM_USER and MEM_PERCPU for __user and 
__percpu tagged memory/pointer before. So choosing MEM_RCU is for
consistency reason. If necessary, I guess if necessary we could
change all of MEM_USER/MEM_PERCPU/MEM_RCU to PTR_*. I don't have
strong preference.

> 
>>   	__BPF_TYPE_FLAG_MAX,
>>   	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
>>   };
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 1a32baa78ce2..d4e56f5a4b20 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -324,6 +324,7 @@ struct bpf_verifier_state {
>>   	u32 insn_idx;
>>   	u32 curframe;
>>   	u32 active_spin_lock;
>> +	u32 active_rcu_lock;
>>   	bool speculative;
>>
>>   	/* first and last insn idx of this verifier state */
[...]
>> @@ -4536,6 +4558,14 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>>   		return -EACCES;
>>   	}
>>
>> +	if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
>> +	    !env->cur_state->active_rcu_lock) {
>> +		verbose(env,
>> +			"R%d is ptr_%s access rcu-protected memory with off=%d, not in rcu_read_lock region\n",
>> +			regno, tname, off);
>> +		return -EACCES;
>> +	}
>> +
>>   	if (env->ops->btf_struct_access) {
>>   		ret = env->ops->btf_struct_access(&env->log, reg->btf, t,
>>   						  off, size, atype, &btf_id, &flag);
>> @@ -4552,6 +4582,14 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>>   	if (ret < 0)
>>   		return ret;
>>
>> +	if ((flag & MEM_RCU) && env->prog->aux->sleepable &&
>> +	    !env->cur_state->active_rcu_lock) {
>> +		verbose(env,
>> +			"R%d is rcu dereference ptr_%s with off=%d, not in rcu_read_lock region\n",
>> +			regno, tname, off);
>> +		return -EACCES;
>> +	}
>> +
> 
> This isn't right. Every load that obtains an RCU pointer needs to become tied to
> the current RCU section, and needs to be invalidated once the RCU section ends.
> 
> So in addition to checking that bpf_rcu_read_lock is held around MEM_RCU access,
> you need to invalidate all MEM_RCU pointers when bpf_rcu_read_unlock is called.
> 
> Otherwise, with the current logic, the following would become possible:
> 
> bpf_rcu_read_lock();
> p = rcu_dereference(foo->rcup);
> bpf_rcu_read_unlock();
> 
> // p is possibly dead
> 
> bpf_rcu_read_lock();
> // use p
> bpf_rcu_read_unlock();

Thanks for catching this. Will fix it in the next revision.

> 
> I have pretty much the same patchset lying locally in my tree (waiting for the
> kfunc rework to get in before I post it), but I can also rebase other stuff
> using explicit bpf_rcu_read_lock on top of yours.
> 
>>   	/* If this is an untrusted pointer, all pointers formed by walking it
>>   	 * also inherit the untrusted flag.
>>   	 */
>> @@ -5684,7 +5722,12 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
>>   static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
>>   static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
>>   static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
>> -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
>> +static const struct bpf_reg_types btf_ptr_types = {
>> +	.types = {
>> +		PTR_TO_BTF_ID,
>> +		PTR_TO_BTF_ID | MEM_RCU,
>> +	}
>> +};
>>   static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
>>   static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
>>   static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
>> @@ -5758,6 +5801,20 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>>   	if (arg_type & PTR_MAYBE_NULL)
>>   		type &= ~PTR_MAYBE_NULL;
>>
>> +	/* If the reg type is marked as MEM_RCU, ensure the usage is in the rcu_read_lock
>> +	 * region, and remove MEM_RCU from the type since the arg_type won't encode
>> +	 * MEM_RCU.
>> +	 */
>> +	if (type & MEM_RCU) {
>> +		if (env->prog->aux->sleepable && !env->cur_state->active_rcu_lock) {
>> +			verbose(env,
>> +				"R%d is arg type %s needs rcu protection\n",
>> +				regno, reg_type_str(env, reg->type));
>> +			return -EACCES;
>> +		}
>> +		type &= ~MEM_RCU;
>> +	}
>> +
>>   	for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
>>   		expected = compatible->types[i];
>>   		if (expected == NOT_INIT)
>> @@ -5774,7 +5831,8 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>>   	return -EACCES;
>>
>>   found:
>> -	if (reg->type == PTR_TO_BTF_ID) {
>> +	/* reg is already protected by rcu_read_lock(). Peel off MEM_RCU from reg->type. */
>> +	if ((reg->type & ~MEM_RCU) == PTR_TO_BTF_ID) {
>>   		/* For bpf_sk_release, it needs to match against first member
>>   		 * 'struct sock_common', hence make an exception for it. This
>>   		 * allows bpf_sk_release to work for multiple socket types.
>> @@ -5850,6 +5908,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>   	 * fixed offset.
>>   	 */
>>   	case PTR_TO_BTF_ID:
>> +	case PTR_TO_BTF_ID | MEM_RCU:
>>   		/* When referenced PTR_TO_BTF_ID is passed to release function,
>>   		 * it's fixed offset must be 0.	In the other cases, fixed offset
>>   		 * can be non-zero.
>> @@ -7289,6 +7348,26 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
>>   	}
>>
>>   	meta.func_id = func_id;
>> +
>> +	if (func_id == BPF_FUNC_rcu_read_lock)
>> +		env->cur_state->active_rcu_lock++;
>> +	if (func_id == BPF_FUNC_rcu_read_unlock) {
>> +		if (env->cur_state->active_rcu_lock == 0) {
>> +			verbose(env, "missing bpf_rcu_read_lock\n");
>> +			return -EINVAL;
>> +		}
>> +
>> +		env->cur_state->active_rcu_lock--;
>> +	}
>> +	if (env->cur_state->active_rcu_lock) {
>> +		if (is_sleepable_function(func_id))
>> +			verbose(env, "sleepable helper %s#%din rcu_read_lock region\n",
>> +				func_id_name(func_id), func_id);
>> +
>> +		if (env->prog->aux->sleepable && is_storage_get_function(func_id))
>> +			insn->off = BPF_STORAGE_GET_CALL;
> 
> This is a bit ugly. Why not use bpf_insn_aux_data?

Sure I can use bpf_insn_aux_data.

> 
>> +	}
>> +
>>   	/* check args */
>>   	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
>>   		err = check_func_arg(env, i, &meta, fn);
>> @@ -10470,6 +10549,11 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
>>   		return -EINVAL;
>>   	}
>>
>> +	if (env->prog->aux->sleepable && env->cur_state->active_rcu_lock) {
>> +		verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_rcu_read_lock-ed region\n");
>> +		return -EINVAL;
>> +	}
>> +
>>   	if (regs[ctx_reg].type != PTR_TO_CTX) {
>>   		verbose(env,
>>   			"at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
>> @@ -11734,6 +11818,9 @@ static bool states_equal(struct bpf_verifier_env *env,
>>   	if (old->active_spin_lock != cur->active_spin_lock)
>>   		return false;
>>
>> +	if (old->active_rcu_lock != cur->active_rcu_lock)
>> +		return false;
>> +
>>   	/* for states to be equal callsites have to be the same
>>   	 * and all frame states need to be equivalent
>>   	 */
>> @@ -12141,6 +12228,11 @@ static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
>>   			       !reg_type_mismatch_ok(prev));
>>   }
>>
>> +static bool reg_type_mismatch_ignore_rcu(enum bpf_reg_type src, enum bpf_reg_type prev)
>> +{
>> +	return reg_type_mismatch(src & ~MEM_RCU, prev & ~MEM_RCU);
>> +}
>> +
> 
> See the discussion about this in David's set:
> https://lore.kernel.org/bpf/CAP01T75FGW7F=Ho+oqoC6WgxK5uUir2=CUgiW_HwqNxmzmthBg@mail.gmail.com

Thanks for the pointer. Yes, the case is similar to what the code ties
to address in the above.

> 
>>   static int do_check(struct bpf_verifier_env *env)
>>   {
>>   	bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
>> @@ -12266,6 +12358,18 @@ static int do_check(struct bpf_verifier_env *env)
>>
>>   			prev_src_type = &env->insn_aux_data[env->insn_idx].ptr_type;
>>
>> +			/* For NOT_INIT *prev_src_type, ignore rcu type tag.
>> +			 * For code like below,
>> +			 *   struct foo *f;
>> +			 *   if (...)
>> +			 *     f = ...; // f with MEM_RCU type tag.
>> +			 *   else
>> +			 *     f = ...; // f without MEM_RCU type tag.
>> +			 *   ... f ...  // Here f could be with/without MEM_RCU
>> +			 *
>> +			 * It is safe to ignore MEM_RCU type tag here since
>> +			 * base types are the same.
>> +			 */
>>   			if (*prev_src_type == NOT_INIT) {
>>   				/* saw a valid insn
>>   				 * dst_reg = *(u32 *)(src_reg + off)
>> @@ -12273,7 +12377,7 @@ static int do_check(struct bpf_verifier_env *env)
>>   				 */
>>   				*prev_src_type = src_reg_type;
>>
>> -			} else if (reg_type_mismatch(src_reg_type, *prev_src_type)) {
>> +			} else if (reg_type_mismatch_ignore_rcu(src_reg_type, *prev_src_type)) {
>>   				/* ABuser program is trying to use the same insn
>>   				 * dst_reg = *(u32*) (src_reg + off)
>>   				 * with different pointer types:
>> @@ -12412,6 +12516,11 @@ static int do_check(struct bpf_verifier_env *env)
>>   					return -EINVAL;
>>   				}
>>
>> +				if (env->cur_state->active_rcu_lock) {
>> +					verbose(env, "bpf_rcu_read_unlock is missing\n");
>> +					return -EINVAL;
>> +				}
>> +
>>   				/* We must do check_reference_leak here before
>>   				 * prepare_func_exit to handle the case when
>>   				 * state->curframe > 0, it may be a callback
>> @@ -13499,6 +13608,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>>   			convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
>>   			break;
>>   		case PTR_TO_BTF_ID:
>> +		case PTR_TO_BTF_ID | MEM_RCU:
> 
> This shouldn't be needed, right? If it is RCU protected, there shouldn't be a
> need for handling faults (or it's a bug in the kernel).

I guess we still need to do this.
Yes, it is rcu protected, but rcu protected pointer could be a NULL, so 
possible fault should still be handled.

> 
>>   		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
>>   			if (type == BPF_READ) {
>>   				insn->code = BPF_LDX | BPF_PROBE_MEM |
>> @@ -14148,11 +14258,11 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
>>   			goto patch_call_imm;
>>   		}
>>
>> -		if (insn->imm == BPF_FUNC_task_storage_get ||
>> -		    insn->imm == BPF_FUNC_sk_storage_get ||
>> -		    insn->imm == BPF_FUNC_inode_storage_get ||
>> -		    insn->imm == BPF_FUNC_cgrp_storage_get) {
>> -			if (env->prog->aux->sleepable)
>> +		if (is_storage_get_function(insn->imm)) {
>> +			if (env->prog->aux->sleepable && insn->off) {
>> +				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
>> +				insn->off = 0;
>> +			} else if (env->prog->aux->sleepable)
>>   				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
>>   			else
>>   				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper
  2022-11-03 22:18   ` Kumar Kartikeya Dwivedi
@ 2022-11-04  0:30     ` Yonghong Song
  2022-11-04  1:04       ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2022-11-04  0:30 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau



On 11/3/22 3:18 PM, Kumar Kartikeya Dwivedi wrote:
> On Thu, Nov 03, 2022 at 12:51:13PM IST, Yonghong Song wrote:
>> Add bpf_rcu_read_lock() and bpf_rcu_read_unlock() helpers.
>> Both helpers are available to all program types with
>> CAP_BPF capability.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   include/linux/bpf.h            |  2 ++
>>   include/uapi/linux/bpf.h       | 14 ++++++++++++++
>>   kernel/bpf/core.c              |  2 ++
>>   kernel/bpf/helpers.c           | 26 ++++++++++++++++++++++++++
>>   tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
>>   5 files changed, 58 insertions(+)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 8d948bfcb984..a9bda4c91fc7 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -2554,6 +2554,8 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
>>   extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>>   extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
>>   extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
>> +extern const struct bpf_func_proto bpf_rcu_read_lock_proto;
>> +extern const struct bpf_func_proto bpf_rcu_read_unlock_proto;
>>
>>   const struct bpf_func_proto *tracing_prog_func_proto(
>>     enum bpf_func_id func_id, const struct bpf_prog *prog);
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 94659f6b3395..e86389cd6133 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -5481,6 +5481,18 @@ union bpf_attr {
>>    *		0 on success.
>>    *
>>    *		**-ENOENT** if the bpf_local_storage cannot be found.
>> + *
>> + * void bpf_rcu_read_lock(void)
>> + *	Description
>> + *		Call kernel rcu_read_lock().
>> + *	Return
>> + *		None.
>> + *
>> + * void bpf_rcu_read_unlock(void)
>> + *	Description
>> + *		Call kernel rcu_read_unlock().
>> + *	Return
>> + *		None.
>>    */
> 
> It would be better to not bake these into UAPI and keep them unstable only IMO.

rcu_read_lock/unlock() are well known in kernel programming. I think
put them as stable UAPI should be fine. But I will reword the
description to remove any direct reference to kernel functions.

> 
>> [...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper
  2022-11-03 22:29       ` Kumar Kartikeya Dwivedi
@ 2022-11-04  0:32         ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2022-11-04  0:32 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: KP Singh, Yonghong Song, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, kernel-team, Martin KaFai Lau



On 11/3/22 3:29 PM, Kumar Kartikeya Dwivedi wrote:
> On Fri, Nov 04, 2022 at 02:30:55AM IST, Yonghong Song wrote:
>>
>>
>> On 11/3/22 7:28 AM, KP Singh wrote:
>>> On Thu, Nov 3, 2022 at 8:21 AM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>> Add bpf_rcu_read_lock() and bpf_rcu_read_unlock() helpers.
>>>> Both helpers are available to all program types with
>>>> CAP_BPF capability.
>>>>
>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>> ---
>>>>    include/linux/bpf.h            |  2 ++
>>>>    include/uapi/linux/bpf.h       | 14 ++++++++++++++
>>>>    kernel/bpf/core.c              |  2 ++
>>>>    kernel/bpf/helpers.c           | 26 ++++++++++++++++++++++++++
>>>>    tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
>>>>    5 files changed, 58 insertions(+)
>>>>
>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>> index 8d948bfcb984..a9bda4c91fc7 100644
>>>> --- a/include/linux/bpf.h
>>>> +++ b/include/linux/bpf.h
>>>> @@ -2554,6 +2554,8 @@ extern const struct bpf_func_proto bpf_get_retval_proto;
>>>>    extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
>>>>    extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
>>>>    extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;
>>>> +extern const struct bpf_func_proto bpf_rcu_read_lock_proto;
>>>> +extern const struct bpf_func_proto bpf_rcu_read_unlock_proto;
>>>>
>>>>    const struct bpf_func_proto *tracing_prog_func_proto(
>>>>      enum bpf_func_id func_id, const struct bpf_prog *prog);
>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>> index 94659f6b3395..e86389cd6133 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -5481,6 +5481,18 @@ union bpf_attr {
>>>>     *             0 on success.
>>>>     *
>>>>     *             **-ENOENT** if the bpf_local_storage cannot be found.
>>>> + *
>>>> + * void bpf_rcu_read_lock(void)
>>>> + *     Description
>>>> + *             Call kernel rcu_read_lock().
>>>
>>> Simple wrapper around rcu_read_lock() and maybe explain where and how
>>> it is supposed to
>>> be used.
>>>
>>> e.g. the verifier will check if __rcu pointers are being accessed with
>>> bpf_rcu_read_lock in
>>> sleepable programs.
>>
>> Okay, I can add more descriptions.
>>
>>>
>>> Calling the helper from a non-sleepable program is inconsequential,
>>> but maybe we can even
>>> avoid exposing it to non-sleepable programs?
>>
>> I actually debated myself whether to make bpf_rcu_read_lock()/unlock()
>> to be sleepable only. Although it won't hurt for non-sleepable program,
>> I guess I can make it as sleepable only so users don't make mistake
>> to use them in non-sleepable programs.
>>
> 
> It's better to let it be a noop in non-sleepable programs but still allow
> calling it. It allows writing common helper functions in the BPF program that
> work in both sleepable and non-sleepable cases by holding the RCU read lock.

yes, I can do it. The verifier can rewrite it to a nop for non-sleepable 
program to minimize runtime overhead.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper
  2022-11-04  0:30     ` Yonghong Song
@ 2022-11-04  1:04       ` Alexei Starovoitov
  2022-11-04  2:37         ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2022-11-04  1:04 UTC (permalink / raw)
  To: Yonghong Song, Kumar Kartikeya Dwivedi, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On 11/3/22 5:30 PM, Yonghong Song wrote:
>>> index 94659f6b3395..e86389cd6133 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -5481,6 +5481,18 @@ union bpf_attr {
>>>    *        0 on success.
>>>    *
>>>    *        **-ENOENT** if the bpf_local_storage cannot be found.
>>> + *
>>> + * void bpf_rcu_read_lock(void)
>>> + *    Description
>>> + *        Call kernel rcu_read_lock().
>>> + *    Return
>>> + *        None.
>>> + *
>>> + * void bpf_rcu_read_unlock(void)
>>> + *    Description
>>> + *        Call kernel rcu_read_unlock().
>>> + *    Return
>>> + *        None.
>>>    */
>>
>> It would be better to not bake these into UAPI and keep them unstable 
>> only IMO.
> 
> rcu_read_lock/unlock() are well known in kernel programming. I think
> put them as stable UAPI should be fine. But I will reword the
> description to remove any direct reference to kernel functions.

tbh I also feel that kfunc is better here.
Sooner or later we will need srcu_read_lock,
then rcu_read_lock_task_trace, etc.
bpf shouldn't be a burden to RCU.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper
  2022-11-04  1:04       ` Alexei Starovoitov
@ 2022-11-04  2:37         ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2022-11-04  2:37 UTC (permalink / raw)
  To: Alexei Starovoitov, Kumar Kartikeya Dwivedi, Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau



On 11/3/22 6:04 PM, Alexei Starovoitov wrote:
> On 11/3/22 5:30 PM, Yonghong Song wrote:
>>>> index 94659f6b3395..e86389cd6133 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -5481,6 +5481,18 @@ union bpf_attr {
>>>>    *        0 on success.
>>>>    *
>>>>    *        **-ENOENT** if the bpf_local_storage cannot be found.
>>>> + *
>>>> + * void bpf_rcu_read_lock(void)
>>>> + *    Description
>>>> + *        Call kernel rcu_read_lock().
>>>> + *    Return
>>>> + *        None.
>>>> + *
>>>> + * void bpf_rcu_read_unlock(void)
>>>> + *    Description
>>>> + *        Call kernel rcu_read_unlock().
>>>> + *    Return
>>>> + *        None.
>>>>    */
>>>
>>> It would be better to not bake these into UAPI and keep them unstable 
>>> only IMO.
>>
>> rcu_read_lock/unlock() are well known in kernel programming. I think
>> put them as stable UAPI should be fine. But I will reword the
>> description to remove any direct reference to kernel functions.
> 
> tbh I also feel that kfunc is better here.
> Sooner or later we will need srcu_read_lock,
> then rcu_read_lock_task_trace, etc.
> bpf shouldn't be a burden to RCU.

Okay, I will kfunc then.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support
  2022-11-03 22:17   ` Kumar Kartikeya Dwivedi
  2022-11-04  0:28     ` Yonghong Song
@ 2022-11-04 12:13     ` KP Singh
  2022-11-04 15:27       ` Alexei Starovoitov
  1 sibling, 1 reply; 26+ messages in thread
From: KP Singh @ 2022-11-04 12:13 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team, Martin KaFai Lau

On Thu, Nov 3, 2022 at 11:17 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Thu, Nov 03, 2022 at 12:51:18PM IST, Yonghong Song wrote:
> > A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
> > object access needing rcu_read_lock protection. The rcu protection
> > is not needed for non-sleepable program. So various verification
> > checking is only done for sleepable programs. In particular, only
> > the following insns can be inside bpf_rcu_read_lock() region:
> >   - any non call insns except BPF_ABS/BPF_IND
> >   - non sleepable helpers and kfuncs.
> > Also, bpf_*_storage_get() helper's 5th hidden argument (for memory
> > allocation flag) should be GFP_ATOMIC.
> >
> > If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
> > this pointer and the load which gets this pointer needs to be
> > protected by bpf_rcu_read_lock(). The following shows a couple
> > of examples:
> >   struct task_struct {
> >       ...
> >       struct task_struct __rcu        *real_parent;
> >       struct css_set __rcu            *cgroups;
> >       ...
> >   };
> >   struct css_set {
> >       ...
> >       struct cgroup *dfl_cgrp;
> >       ...
> >   }
> >   ...
> >   task = bpf_get_current_task_btf();
> >   cgroups = task->cgroups;
> >   dfl_cgroup = cgroups->dfl_cgrp;
> >   ... using dfl_cgroup ...
> >
> > The bpf_rcu_read_lock/unlock() should be added like below to
> > avoid verification failures.
> >   task = bpf_get_current_task_btf();
> >   bpf_rcu_read_lock();
> >   cgroups = task->cgroups;
> >   dfl_cgroup = cgroups->dfl_cgrp;
> >   bpf_rcu_read_unlock();
> >   ... using dfl_cgroup ...
> >
> > The following is another example for task->real_parent.
> >   task = bpf_get_current_task_btf();
> >   bpf_rcu_read_lock();
> >   real_parent = task->real_parent;
> >   ... bpf_task_storage_get(&map, real_parent, 0, 0);
> >   bpf_rcu_read_unlock();
> >
> > There is another case observed in selftest bpf_iter_ipv6_route.c:
> >   struct fib6_info *rt = ctx->rt;
> >   ...
> >   fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
> >   ...
> >   if (rt->nh)
> >     fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
> >   ...
> >   ... using fib6_nh ...
> > Currently verification will fail with
> >   same insn cannot be used with different pointers
> > since the use of fib6_nh is tag with rcu in one path
> > but not in the other path. The above use case is a valid
> > one so the verifier is changed to ignore MEM_RCU type tag
> > in such cases.
> >
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
> >  include/linux/bpf.h          |   3 +
> >  include/linux/bpf_verifier.h |   1 +
> >  kernel/bpf/btf.c             |  11 +++
> >  kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
> >  4 files changed, 133 insertions(+), 8 deletions(-)

[...]

> > +
>
> This isn't right. Every load that obtains an RCU pointer needs to become tied to
> the current RCU section, and needs to be invalidated once the RCU section ends.
>
> So in addition to checking that bpf_rcu_read_lock is held around MEM_RCU access,
> you need to invalidate all MEM_RCU pointers when bpf_rcu_read_unlock is called.
>
> Otherwise, with the current logic, the following would become possible:
>
> bpf_rcu_read_lock();
> p = rcu_dereference(foo->rcup);
> bpf_rcu_read_unlock();
>
> // p is possibly dead
>
> bpf_rcu_read_lock();
> // use p
> bpf_rcu_read_unlock();
>

What do want to do about cases like:

bpf_rcu_read_lock();

q = rcu_derference(foo->rcup);

bpf_rcu_read_lock();

p = rcu_derference(foo->rcup);

bpf_rcu_read_unlock();

// Use q
// Use p
bpf_rcu_read_unlock();

I think this is probably implied in your statement but just making it clear,

The invalidation needs to happen only when the outermost bpf_rcu_read_unlock
is called. i.e. when active_rcu_lock goes back down to 0.


> bpf_rcu_read_lock();
> // use p
> bpf_rcu_read_unlock();


> I have pretty much the same patchset lying locally in my tree (waiting for the
> kfunc rework to get in before I post it), but I can also rebase other stuff
> using explicit bpf_rcu_read_lock on top of yours.
>
> >       /* If this is an untrusted pointer, all pointers formed by walking it
> >        * also inherit the untrusted flag.
> >        */
> > @@ -5684,7 +5722,12 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
> >  static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
> >  static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
> >  static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
> > -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
> > +static const struct bpf_reg_types btf_ptr_types = {
> > +     .types = {
> > +             PTR_TO_BTF_ID,
> > +             PTR_TO_BTF_ID | MEM_RCU,
> > +     }
> > +};
> >  static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
> >  static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
> >  static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> > @@ -5758,6 +5801,20 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> >       if (arg_type & PTR_MAYBE_NULL)
> >               type &= ~PTR_MAYBE_NULL;
> >
> > +     /* If the reg type is marked as MEM_RCU, ensure the usage is in the rcu_read_lock
> > +      * region, and remove MEM_RCU from the type since the arg_type won't encode
> > +      * MEM_RCU.
> > +      */
> > +     if (type & MEM_RCU) {
> > +             if (env->prog->aux->sleepable && !env->cur_state->active_rcu_lock) {
> > +                     verbose(env,
> > +                             "R%d is arg type %s needs rcu protection\n",
> > +                             regno, reg_type_str(env, reg->type));
> > +                     return -EACCES;
> > +             }
> > +             type &= ~MEM_RCU;
> > +     }
> > +
> >       for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
> >               expected = compatible->types[i];
> >               if (expected == NOT_INIT)
> > @@ -5774,7 +5831,8 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> >       return -EACCES;
> >
> >  found:
> > -     if (reg->type == PTR_TO_BTF_ID) {
> > +     /* reg is already protected by rcu_read_lock(). Peel off MEM_RCU from reg->type. */
> > +     if ((reg->type & ~MEM_RCU) == PTR_TO_BTF_ID) {
> >               /* For bpf_sk_release, it needs to match against first member
> >                * 'struct sock_common', hence make an exception for it. This
> >                * allows bpf_sk_release to work for multiple socket types.
> > @@ -5850,6 +5908,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
> >        * fixed offset.
> >        */
> >       case PTR_TO_BTF_ID:
> > +     case PTR_TO_BTF_ID | MEM_RCU:
> >               /* When referenced PTR_TO_BTF_ID is passed to release function,
> >                * it's fixed offset must be 0. In the other cases, fixed offset
> >                * can be non-zero.
> > @@ -7289,6 +7348,26 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >       }
> >
> >       meta.func_id = func_id;
> > +
> > +     if (func_id == BPF_FUNC_rcu_read_lock)
> > +             env->cur_state->active_rcu_lock++;
> > +     if (func_id == BPF_FUNC_rcu_read_unlock) {
> > +             if (env->cur_state->active_rcu_lock == 0) {
> > +                     verbose(env, "missing bpf_rcu_read_lock\n");
> > +                     return -EINVAL;
> > +             }
> > +
> > +             env->cur_state->active_rcu_lock--;
> > +     }
> > +     if (env->cur_state->active_rcu_lock) {
> > +             if (is_sleepable_function(func_id))
> > +                     verbose(env, "sleepable helper %s#%din rcu_read_lock region\n",
> > +                             func_id_name(func_id), func_id);
> > +
> > +             if (env->prog->aux->sleepable && is_storage_get_function(func_id))
> > +                     insn->off = BPF_STORAGE_GET_CALL;
>
> This is a bit ugly. Why not use bpf_insn_aux_data?
>
> > +     }
> > +
> >       /* check args */
> >       for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
> >               err = check_func_arg(env, i, &meta, fn);
> > @@ -10470,6 +10549,11 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
> >               return -EINVAL;
> >       }
> >
> > +     if (env->prog->aux->sleepable && env->cur_state->active_rcu_lock) {
> > +             verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_rcu_read_lock-ed region\n");
> > +             return -EINVAL;
> > +     }
> > +
> >       if (regs[ctx_reg].type != PTR_TO_CTX) {
> >               verbose(env,
> >                       "at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
> > @@ -11734,6 +11818,9 @@ static bool states_equal(struct bpf_verifier_env *env,
> >       if (old->active_spin_lock != cur->active_spin_lock)
> >               return false;
> >
> > +     if (old->active_rcu_lock != cur->active_rcu_lock)
> > +             return false;
> > +
> >       /* for states to be equal callsites have to be the same
> >        * and all frame states need to be equivalent
> >        */
> > @@ -12141,6 +12228,11 @@ static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
> >                              !reg_type_mismatch_ok(prev));
> >  }
> >
> > +static bool reg_type_mismatch_ignore_rcu(enum bpf_reg_type src, enum bpf_reg_type prev)
> > +{
> > +     return reg_type_mismatch(src & ~MEM_RCU, prev & ~MEM_RCU);
> > +}
> > +
>
> See the discussion about this in David's set:
> https://lore.kernel.org/bpf/CAP01T75FGW7F=Ho+oqoC6WgxK5uUir2=CUgiW_HwqNxmzmthBg@mail.gmail.com
>
> >  static int do_check(struct bpf_verifier_env *env)
> >  {
> >       bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
> > @@ -12266,6 +12358,18 @@ static int do_check(struct bpf_verifier_env *env)
> >
> >                       prev_src_type = &env->insn_aux_data[env->insn_idx].ptr_type;
> >
> > +                     /* For NOT_INIT *prev_src_type, ignore rcu type tag.
> > +                      * For code like below,
> > +                      *   struct foo *f;
> > +                      *   if (...)
> > +                      *     f = ...; // f with MEM_RCU type tag.
> > +                      *   else
> > +                      *     f = ...; // f without MEM_RCU type tag.
> > +                      *   ... f ...  // Here f could be with/without MEM_RCU
> > +                      *
> > +                      * It is safe to ignore MEM_RCU type tag here since
> > +                      * base types are the same.
> > +                      */
> >                       if (*prev_src_type == NOT_INIT) {
> >                               /* saw a valid insn
> >                                * dst_reg = *(u32 *)(src_reg + off)
> > @@ -12273,7 +12377,7 @@ static int do_check(struct bpf_verifier_env *env)
> >                                */
> >                               *prev_src_type = src_reg_type;
> >
> > -                     } else if (reg_type_mismatch(src_reg_type, *prev_src_type)) {
> > +                     } else if (reg_type_mismatch_ignore_rcu(src_reg_type, *prev_src_type)) {
> >                               /* ABuser program is trying to use the same insn
> >                                * dst_reg = *(u32*) (src_reg + off)
> >                                * with different pointer types:
> > @@ -12412,6 +12516,11 @@ static int do_check(struct bpf_verifier_env *env)
> >                                       return -EINVAL;
> >                               }
> >
> > +                             if (env->cur_state->active_rcu_lock) {
> > +                                     verbose(env, "bpf_rcu_read_unlock is missing\n");
> > +                                     return -EINVAL;
> > +                             }
> > +
> >                               /* We must do check_reference_leak here before
> >                                * prepare_func_exit to handle the case when
> >                                * state->curframe > 0, it may be a callback
> > @@ -13499,6 +13608,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> >                       convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
> >                       break;
> >               case PTR_TO_BTF_ID:
> > +             case PTR_TO_BTF_ID | MEM_RCU:
>
> This shouldn't be needed, right? If it is RCU protected, there shouldn't be a
> need for handling faults (or it's a bug in the kernel).
>
> >               case PTR_TO_BTF_ID | PTR_UNTRUSTED:
> >                       if (type == BPF_READ) {
> >                               insn->code = BPF_LDX | BPF_PROBE_MEM |
> > @@ -14148,11 +14258,11 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> >                       goto patch_call_imm;
> >               }
> >
> > -             if (insn->imm == BPF_FUNC_task_storage_get ||
> > -                 insn->imm == BPF_FUNC_sk_storage_get ||
> > -                 insn->imm == BPF_FUNC_inode_storage_get ||
> > -                 insn->imm == BPF_FUNC_cgrp_storage_get) {
> > -                     if (env->prog->aux->sleepable)
> > +             if (is_storage_get_function(insn->imm)) {
> > +                     if (env->prog->aux->sleepable && insn->off) {
> > +                             insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
> > +                             insn->off = 0;
> > +                     } else if (env->prog->aux->sleepable)
> >                               insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
> >                       else
> >                               insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
> > --
> > 2.30.2
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support
  2022-11-04 12:13     ` KP Singh
@ 2022-11-04 15:27       ` Alexei Starovoitov
  2022-11-04 15:32         ` KP Singh
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2022-11-04 15:27 UTC (permalink / raw)
  To: KP Singh, Kumar Kartikeya Dwivedi
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, kernel-team, Martin KaFai Lau

On 11/4/22 5:13 AM, KP Singh wrote:
> On Thu, Nov 3, 2022 at 11:17 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
>>
>> On Thu, Nov 03, 2022 at 12:51:18PM IST, Yonghong Song wrote:
>>> A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
>>> object access needing rcu_read_lock protection. The rcu protection
>>> is not needed for non-sleepable program. So various verification
>>> checking is only done for sleepable programs. In particular, only
>>> the following insns can be inside bpf_rcu_read_lock() region:
>>>    - any non call insns except BPF_ABS/BPF_IND
>>>    - non sleepable helpers and kfuncs.
>>> Also, bpf_*_storage_get() helper's 5th hidden argument (for memory
>>> allocation flag) should be GFP_ATOMIC.
>>>
>>> If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
>>> this pointer and the load which gets this pointer needs to be
>>> protected by bpf_rcu_read_lock(). The following shows a couple
>>> of examples:
>>>    struct task_struct {
>>>        ...
>>>        struct task_struct __rcu        *real_parent;
>>>        struct css_set __rcu            *cgroups;
>>>        ...
>>>    };
>>>    struct css_set {
>>>        ...
>>>        struct cgroup *dfl_cgrp;
>>>        ...
>>>    }
>>>    ...
>>>    task = bpf_get_current_task_btf();
>>>    cgroups = task->cgroups;
>>>    dfl_cgroup = cgroups->dfl_cgrp;
>>>    ... using dfl_cgroup ...
>>>
>>> The bpf_rcu_read_lock/unlock() should be added like below to
>>> avoid verification failures.
>>>    task = bpf_get_current_task_btf();
>>>    bpf_rcu_read_lock();
>>>    cgroups = task->cgroups;
>>>    dfl_cgroup = cgroups->dfl_cgrp;
>>>    bpf_rcu_read_unlock();
>>>    ... using dfl_cgroup ...
>>>
>>> The following is another example for task->real_parent.
>>>    task = bpf_get_current_task_btf();
>>>    bpf_rcu_read_lock();
>>>    real_parent = task->real_parent;
>>>    ... bpf_task_storage_get(&map, real_parent, 0, 0);
>>>    bpf_rcu_read_unlock();
>>>
>>> There is another case observed in selftest bpf_iter_ipv6_route.c:
>>>    struct fib6_info *rt = ctx->rt;
>>>    ...
>>>    fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
>>>    ...
>>>    if (rt->nh)
>>>      fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
>>>    ...
>>>    ... using fib6_nh ...
>>> Currently verification will fail with
>>>    same insn cannot be used with different pointers
>>> since the use of fib6_nh is tag with rcu in one path
>>> but not in the other path. The above use case is a valid
>>> one so the verifier is changed to ignore MEM_RCU type tag
>>> in such cases.
>>>
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>> ---
>>>   include/linux/bpf.h          |   3 +
>>>   include/linux/bpf_verifier.h |   1 +
>>>   kernel/bpf/btf.c             |  11 +++
>>>   kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
>>>   4 files changed, 133 insertions(+), 8 deletions(-)
> 
> [...]
> 
>>> +
>>
>> This isn't right. Every load that obtains an RCU pointer needs to become tied to
>> the current RCU section, and needs to be invalidated once the RCU section ends.
>>
>> So in addition to checking that bpf_rcu_read_lock is held around MEM_RCU access,
>> you need to invalidate all MEM_RCU pointers when bpf_rcu_read_unlock is called.
>>
>> Otherwise, with the current logic, the following would become possible:
>>
>> bpf_rcu_read_lock();
>> p = rcu_dereference(foo->rcup);
>> bpf_rcu_read_unlock();
>>
>> // p is possibly dead
>>
>> bpf_rcu_read_lock();
>> // use p
>> bpf_rcu_read_unlock();
>>
> 
> What do want to do about cases like:
> 
> bpf_rcu_read_lock();
> 
> q = rcu_derference(foo->rcup);
> 
> bpf_rcu_read_lock();

This one should be rejected for simplicity.
Let's not complicated things with nested cs-s.

> 
> p = rcu_derference(foo->rcup);
> 
> bpf_rcu_read_unlock();
> 
> // Use q
> // Use p
> bpf_rcu_read_unlock();
> 
> I think this is probably implied in your statement but just making it clear,
> 
> The invalidation needs to happen only when the outermost bpf_rcu_read_unlock
> is called. i.e. when active_rcu_lock goes back down to 0.
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support
  2022-11-04 15:27       ` Alexei Starovoitov
@ 2022-11-04 15:32         ` KP Singh
  2022-11-04 16:27           ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: KP Singh @ 2022-11-04 15:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Kumar Kartikeya Dwivedi, Yonghong Song, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, kernel-team, Martin KaFai Lau

On Fri, Nov 4, 2022 at 4:27 PM Alexei Starovoitov <ast@meta.com> wrote:
>
> On 11/4/22 5:13 AM, KP Singh wrote:
> > On Thu, Nov 3, 2022 at 11:17 PM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> >>
> >> On Thu, Nov 03, 2022 at 12:51:18PM IST, Yonghong Song wrote:
> >>> A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
> >>> object access needing rcu_read_lock protection. The rcu protection
> >>> is not needed for non-sleepable program. So various verification
> >>> checking is only done for sleepable programs. In particular, only
> >>> the following insns can be inside bpf_rcu_read_lock() region:
> >>>    - any non call insns except BPF_ABS/BPF_IND
> >>>    - non sleepable helpers and kfuncs.
> >>> Also, bpf_*_storage_get() helper's 5th hidden argument (for memory
> >>> allocation flag) should be GFP_ATOMIC.
> >>>
> >>> If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
> >>> this pointer and the load which gets this pointer needs to be
> >>> protected by bpf_rcu_read_lock(). The following shows a couple
> >>> of examples:
> >>>    struct task_struct {
> >>>        ...
> >>>        struct task_struct __rcu        *real_parent;
> >>>        struct css_set __rcu            *cgroups;
> >>>        ...
> >>>    };
> >>>    struct css_set {
> >>>        ...
> >>>        struct cgroup *dfl_cgrp;
> >>>        ...
> >>>    }
> >>>    ...
> >>>    task = bpf_get_current_task_btf();
> >>>    cgroups = task->cgroups;
> >>>    dfl_cgroup = cgroups->dfl_cgrp;
> >>>    ... using dfl_cgroup ...
> >>>
> >>> The bpf_rcu_read_lock/unlock() should be added like below to
> >>> avoid verification failures.
> >>>    task = bpf_get_current_task_btf();
> >>>    bpf_rcu_read_lock();
> >>>    cgroups = task->cgroups;
> >>>    dfl_cgroup = cgroups->dfl_cgrp;
> >>>    bpf_rcu_read_unlock();
> >>>    ... using dfl_cgroup ...
> >>>
> >>> The following is another example for task->real_parent.
> >>>    task = bpf_get_current_task_btf();
> >>>    bpf_rcu_read_lock();
> >>>    real_parent = task->real_parent;
> >>>    ... bpf_task_storage_get(&map, real_parent, 0, 0);
> >>>    bpf_rcu_read_unlock();
> >>>
> >>> There is another case observed in selftest bpf_iter_ipv6_route.c:
> >>>    struct fib6_info *rt = ctx->rt;
> >>>    ...
> >>>    fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
> >>>    ...
> >>>    if (rt->nh)
> >>>      fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
> >>>    ...
> >>>    ... using fib6_nh ...
> >>> Currently verification will fail with
> >>>    same insn cannot be used with different pointers
> >>> since the use of fib6_nh is tag with rcu in one path
> >>> but not in the other path. The above use case is a valid
> >>> one so the verifier is changed to ignore MEM_RCU type tag
> >>> in such cases.
> >>>
> >>> Signed-off-by: Yonghong Song <yhs@fb.com>
> >>> ---
> >>>   include/linux/bpf.h          |   3 +
> >>>   include/linux/bpf_verifier.h |   1 +
> >>>   kernel/bpf/btf.c             |  11 +++
> >>>   kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
> >>>   4 files changed, 133 insertions(+), 8 deletions(-)
> >
> > [...]
> >
> >>> +
> >>
> >> This isn't right. Every load that obtains an RCU pointer needs to become tied to
> >> the current RCU section, and needs to be invalidated once the RCU section ends.
> >>
> >> So in addition to checking that bpf_rcu_read_lock is held around MEM_RCU access,
> >> you need to invalidate all MEM_RCU pointers when bpf_rcu_read_unlock is called.
> >>
> >> Otherwise, with the current logic, the following would become possible:
> >>
> >> bpf_rcu_read_lock();
> >> p = rcu_dereference(foo->rcup);
> >> bpf_rcu_read_unlock();
> >>
> >> // p is possibly dead
> >>
> >> bpf_rcu_read_lock();
> >> // use p
> >> bpf_rcu_read_unlock();
> >>
> >
> > What do want to do about cases like:
> >
> > bpf_rcu_read_lock();
> >
> > q = rcu_derference(foo->rcup);
> >
> > bpf_rcu_read_lock();
>
> This one should be rejected for simplicity.
> Let's not complicated things with nested cs-s.

Agreed, the current logic tries to count the number of active
critical sections and the verifier should just reject if there
is a nested bpf_rcu_read_lock() call.

>
> >
> > p = rcu_derference(foo->rcup);
> >
> > bpf_rcu_read_unlock();
> >
> > // Use q
> > // Use p
> > bpf_rcu_read_unlock();
> >
> > I think this is probably implied in your statement but just making it clear,
> >
> > The invalidation needs to happen only when the outermost bpf_rcu_read_unlock
> > is called. i.e. when active_rcu_lock goes back down to 0.
> >
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support
  2022-11-04 15:32         ` KP Singh
@ 2022-11-04 16:27           ` Yonghong Song
  0 siblings, 0 replies; 26+ messages in thread
From: Yonghong Song @ 2022-11-04 16:27 UTC (permalink / raw)
  To: KP Singh, Alexei Starovoitov
  Cc: Kumar Kartikeya Dwivedi, Yonghong Song, bpf, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, kernel-team, Martin KaFai Lau



On 11/4/22 8:32 AM, KP Singh wrote:
> On Fri, Nov 4, 2022 at 4:27 PM Alexei Starovoitov <ast@meta.com> wrote:
>>
>> On 11/4/22 5:13 AM, KP Singh wrote:
>>> On Thu, Nov 3, 2022 at 11:17 PM Kumar Kartikeya Dwivedi
>>> <memxor@gmail.com> wrote:
>>>>
>>>> On Thu, Nov 03, 2022 at 12:51:18PM IST, Yonghong Song wrote:
>>>>> A new bpf_type_flag MEM_RCU is added to indicate a PTR_TO_BTF_ID
>>>>> object access needing rcu_read_lock protection. The rcu protection
>>>>> is not needed for non-sleepable program. So various verification
>>>>> checking is only done for sleepable programs. In particular, only
>>>>> the following insns can be inside bpf_rcu_read_lock() region:
>>>>>     - any non call insns except BPF_ABS/BPF_IND
>>>>>     - non sleepable helpers and kfuncs.
>>>>> Also, bpf_*_storage_get() helper's 5th hidden argument (for memory
>>>>> allocation flag) should be GFP_ATOMIC.
>>>>>
>>>>> If a pointer (PTR_TO_BTF_ID) is marked as rcu, then any use of
>>>>> this pointer and the load which gets this pointer needs to be
>>>>> protected by bpf_rcu_read_lock(). The following shows a couple
>>>>> of examples:
>>>>>     struct task_struct {
>>>>>         ...
>>>>>         struct task_struct __rcu        *real_parent;
>>>>>         struct css_set __rcu            *cgroups;
>>>>>         ...
>>>>>     };
>>>>>     struct css_set {
>>>>>         ...
>>>>>         struct cgroup *dfl_cgrp;
>>>>>         ...
>>>>>     }
>>>>>     ...
>>>>>     task = bpf_get_current_task_btf();
>>>>>     cgroups = task->cgroups;
>>>>>     dfl_cgroup = cgroups->dfl_cgrp;
>>>>>     ... using dfl_cgroup ...
>>>>>
>>>>> The bpf_rcu_read_lock/unlock() should be added like below to
>>>>> avoid verification failures.
>>>>>     task = bpf_get_current_task_btf();
>>>>>     bpf_rcu_read_lock();
>>>>>     cgroups = task->cgroups;
>>>>>     dfl_cgroup = cgroups->dfl_cgrp;
>>>>>     bpf_rcu_read_unlock();
>>>>>     ... using dfl_cgroup ...
>>>>>
>>>>> The following is another example for task->real_parent.
>>>>>     task = bpf_get_current_task_btf();
>>>>>     bpf_rcu_read_lock();
>>>>>     real_parent = task->real_parent;
>>>>>     ... bpf_task_storage_get(&map, real_parent, 0, 0);
>>>>>     bpf_rcu_read_unlock();
>>>>>
>>>>> There is another case observed in selftest bpf_iter_ipv6_route.c:
>>>>>     struct fib6_info *rt = ctx->rt;
>>>>>     ...
>>>>>     fib6_nh = &rt->fib6_nh[0]; // Not rcu protected
>>>>>     ...
>>>>>     if (rt->nh)
>>>>>       fib6_nh = &nh->nh_info->fib6_nh; // rcu protected
>>>>>     ...
>>>>>     ... using fib6_nh ...
>>>>> Currently verification will fail with
>>>>>     same insn cannot be used with different pointers
>>>>> since the use of fib6_nh is tag with rcu in one path
>>>>> but not in the other path. The above use case is a valid
>>>>> one so the verifier is changed to ignore MEM_RCU type tag
>>>>> in such cases.
>>>>>
>>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>>> ---
>>>>>    include/linux/bpf.h          |   3 +
>>>>>    include/linux/bpf_verifier.h |   1 +
>>>>>    kernel/bpf/btf.c             |  11 +++
>>>>>    kernel/bpf/verifier.c        | 126 ++++++++++++++++++++++++++++++++---
>>>>>    4 files changed, 133 insertions(+), 8 deletions(-)
>>>
>>> [...]
>>>
>>>>> +
>>>>
>>>> This isn't right. Every load that obtains an RCU pointer needs to become tied to
>>>> the current RCU section, and needs to be invalidated once the RCU section ends.
>>>>
>>>> So in addition to checking that bpf_rcu_read_lock is held around MEM_RCU access,
>>>> you need to invalidate all MEM_RCU pointers when bpf_rcu_read_unlock is called.
>>>>
>>>> Otherwise, with the current logic, the following would become possible:
>>>>
>>>> bpf_rcu_read_lock();
>>>> p = rcu_dereference(foo->rcup);
>>>> bpf_rcu_read_unlock();
>>>>
>>>> // p is possibly dead
>>>>
>>>> bpf_rcu_read_lock();
>>>> // use p
>>>> bpf_rcu_read_unlock();
>>>>
>>>
>>> What do want to do about cases like:
>>>
>>> bpf_rcu_read_lock();
>>>
>>> q = rcu_derference(foo->rcup);
>>>
>>> bpf_rcu_read_lock();
>>
>> This one should be rejected for simplicity.
>> Let's not complicated things with nested cs-s.
> 
> Agreed, the current logic tries to count the number of active
> critical sections and the verifier should just reject if there
> is a nested bpf_rcu_read_lock() call.

Okay, will not allow nested bpf_rcu_read_lock() for now.

> 
>>
>>>
>>> p = rcu_derference(foo->rcup);
>>>
>>> bpf_rcu_read_unlock();
>>>
>>> // Use q
>>> // Use p
>>> bpf_rcu_read_unlock();
>>>
>>> I think this is probably implied in your statement but just making it clear,
>>>
>>> The invalidation needs to happen only when the outermost bpf_rcu_read_unlock
>>> is called. i.e. when active_rcu_lock goes back down to 0.
>>>
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-11-04 16:29 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-03  7:21 [PATCH bpf-next 0/5] bpf: Add bpf_rcu_read_lock() support Yonghong Song
2022-11-03  7:21 ` [PATCH bpf-next 1/5] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
2022-11-03 14:30   ` KP Singh
2022-11-03  7:21 ` [PATCH bpf-next 2/5] bpf: Add bpf_rcu_read_lock/unlock helper Yonghong Song
2022-11-03 14:28   ` KP Singh
2022-11-03 21:00     ` Yonghong Song
2022-11-03 22:29       ` Kumar Kartikeya Dwivedi
2022-11-04  0:32         ` Yonghong Song
2022-11-03 22:18   ` Kumar Kartikeya Dwivedi
2022-11-04  0:30     ` Yonghong Song
2022-11-04  1:04       ` Alexei Starovoitov
2022-11-04  2:37         ` Yonghong Song
2022-11-03  7:21 ` [PATCH bpf-next 3/5] bpf: Add rcu btf_type_tag verifier support Yonghong Song
2022-11-03 14:43   ` KP Singh
2022-11-03 21:03     ` Yonghong Song
2022-11-03 22:17   ` Kumar Kartikeya Dwivedi
2022-11-04  0:28     ` Yonghong Song
2022-11-04 12:13     ` KP Singh
2022-11-04 15:27       ` Alexei Starovoitov
2022-11-04 15:32         ` KP Singh
2022-11-04 16:27           ` Yonghong Song
2022-11-03  7:21 ` [PATCH bpf-next 4/5] bpf: Enable sleeptable support for cgrp local storage Yonghong Song
2022-11-03 14:45   ` KP Singh
2022-11-03  7:21 ` [PATCH bpf-next 5/5] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
2022-11-03 14:54   ` KP Singh
2022-11-03 21:04     ` Yonghong Song

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.