* [PATCH bpf-next v7 0/4] bpf: Add bpf_rcu_read_lock() support
@ 2022-11-21 17:05 Yonghong Song
  2022-11-21 17:05 ` [PATCH bpf-next v7 1/4] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Yonghong Song @ 2022-11-21 17:05 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Currently, without rcu attribute information in BTF, the verifier treats
an rcu-tagged pointer as a normal pointer. This can be a problem
for sleepable programs, where rcu_read_lock()/unlock() is not available.
For example, in a sleepable fentry program, if an rcu-protected memory
access is interleaved with a sleepable helper/kfunc, the memory access
after the sleepable helper/kfunc might be invalid since the object
might have been freed by then. Even without a sleepable helper/kfunc,
without rcu_read_lock() protection it is possible that the rcu-protected
object is released in the middle of bpf program execution, which may
cause incorrect results.

To prevent the above cases, enable the btf_type_tag("rcu") attribute,
introduce new bpf_rcu_read_lock/unlock() kfuncs and add verifier support.

In this patch set, Patch 1 enables btf_type_tag for the __rcu
attribute. Patch 2 is a refactoring patch. Patch 3 adds the new
bpf_rcu_read_lock/unlock() kfuncs and verifier support.
Patch 4 adds tests for these two new kfuncs.
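
The locking discipline the verifier enforces in Patch 3 can be modeled
in a few lines of plain C. This is an illustrative sketch, not verifier
code; the enum and struct names here are invented for the example. It
shows the three error cases: nested bpf_rcu_read_lock(), a sleepable
call inside the region, and an unmatched bpf_rcu_read_unlock().

```c
#include <stdbool.h>

enum kf_id { KF_RCU_LOCK, KF_RCU_UNLOCK, KF_SLEEPABLE, KF_OTHER };

struct vstate {
	bool active_rcu_lock;	/* mirrors bpf_verifier_state::active_rcu_lock */
};

/* Return 0 on success, -1 on a verification error (simplified model). */
static int check_kfunc(struct vstate *st, enum kf_id id)
{
	if (st->active_rcu_lock) {
		if (id == KF_RCU_LOCK)
			return -1;	/* nested rcu read lock */
		if (id == KF_SLEEPABLE)
			return -1;	/* sleepable call inside region */
		if (id == KF_RCU_UNLOCK)
			st->active_rcu_lock = false;
		return 0;
	}
	if (id == KF_RCU_UNLOCK)
		return -1;		/* unmatched rcu read unlock */
	if (id == KF_RCU_LOCK)
		st->active_rcu_lock = true;
	return 0;
}
```

A matched lock/unlock pair succeeds; any of the three violations above
is rejected, just as the real verifier rejects the corresponding
programs at load time.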

Changelogs:
  v6 -> v7:
    . rebase on top of bpf-next.
    . remove the patch which enables sleepable programs to use the
      cgrp_local_storage map. This is orthogonal to this patch set
      and will be addressed separately.
    . mark the rcu pointer dereference result as UNTRUSTED if inside
      a bpf_rcu_read_lock() region.
  v5 -> v6: 
    . fix selftest prog miss_unlock which tested nested locking.
    . add comments in selftest prog cgrp_succ to explain how to handle
      nested memory access after rcu memory load.
  v4 -> v5:
    . add new test to aarch64 deny list.
  v3 -> v4:
    . fix selftest failures when built with gcc. gcc doesn't support
      btf_type_tag yet and some tests rely on it. Skip these
      tests if the vmlinux BTF does not have btf_type_tag("rcu").
  v2 -> v3:
    . went back to the MEM_RCU approach, invalidating rcu ptr registers
      at the bpf_rcu_read_unlock() site.
    . remove the KF_RCU_LOCK/UNLOCK flags and compare btf_id at
      verification time instead.
  v1 -> v2:
    . use kfuncs instead of helpers for bpf_rcu_read_lock/unlock.
    . do not use the MEM_RCU bpf_type_flag; instead use active_rcu_lock
      in the reg state to identify rcu pointers.
    . Add more self tests.
    . add new test to s390x deny list.

Yonghong Song (4):
  compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu")))
  bpf: Abstract out functions to check sleepable helpers
  bpf: Add kfunc bpf_rcu_read_lock/unlock()
  selftests/bpf: Add tests for bpf_rcu_read_lock()

 include/linux/bpf.h                           |   3 +
 include/linux/bpf_lsm.h                       |   6 +
 include/linux/bpf_verifier.h                  |   2 +
 include/linux/compiler_types.h                |   3 +-
 include/linux/trace_events.h                  |   8 +
 kernel/bpf/bpf_lsm.c                          |  20 +-
 kernel/bpf/btf.c                              |   3 +
 kernel/bpf/helpers.c                          |  12 +
 kernel/bpf/verifier.c                         | 156 ++++++++-
 kernel/trace/bpf_trace.c                      |  22 +-
 tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../selftests/bpf/prog_tests/rcu_read_lock.c  | 153 +++++++++
 .../selftests/bpf/progs/rcu_read_lock.c       | 325 ++++++++++++++++++
 14 files changed, 695 insertions(+), 20 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
 create mode 100644 tools/testing/selftests/bpf/progs/rcu_read_lock.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH bpf-next v7 1/4] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu")))
  2022-11-21 17:05 [PATCH bpf-next v7 0/4] bpf: Add bpf_rcu_read_lock() support Yonghong Song
@ 2022-11-21 17:05 ` Yonghong Song
  2022-11-21 17:05 ` [PATCH bpf-next v7 2/4] bpf: Abstract out functions to check sleepable helpers Yonghong Song
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2022-11-21 17:05 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, KP Singh

Currently, without rcu attribute information in BTF, the verifier treats
an rcu-tagged pointer as a normal pointer. This can be a problem
for sleepable programs, where rcu_read_lock()/unlock() is not available.
For example, in a sleepable fentry program, if an rcu-protected memory
access is interleaved with a sleepable helper/kfunc, the memory access
after the sleepable helper/kfunc might be invalid since the object might
have been freed by then. To prevent such cases, introducing rcu tagging
for memory accesses in the verifier helps reject such programs.

To enable rcu tagging in BTF, define __rcu as the attribute
btf_type_tag("rcu") during kernel compilation so the __rcu information
is preserved in DWARF and BTF, and can later be used for bpf program
verification.
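
For reference, the BTF_TYPE_TAG macro used in the diff below expands to
the clang btf_type_tag attribute only when the compiler supports it.
A standalone sketch of that pattern (with the guards simplified; the
kernel's actual guards also involve sparse and BTF config options):

```c
/* Simplified stand-in for include/linux/compiler_types.h: emit the
 * btf_type_tag attribute only when the compiler supports it (clang 14+),
 * otherwise expand to nothing so gcc builds are unaffected.
 */
#if defined(__has_attribute)
# if __has_attribute(btf_type_tag)
#  define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
# else
#  define BTF_TYPE_TAG(value)
# endif
#else
# define BTF_TYPE_TAG(value)
#endif

#define __rcu BTF_TYPE_TAG(rcu)

/* The tag rides along in DWARF/BTF metadata but does not change the
 * C type itself, so struct layout is unaffected.
 */
struct task_like {
	struct task_like *last_wakee;
	struct task_like __rcu *real_parent;
};
```

Because the tag is purely type metadata, tagged and untagged builds
produce identical object layout; only the debug info differs.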

Acked-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/compiler_types.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index eb0466236661..7c1afe0f4129 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -49,7 +49,8 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { }
 # endif
 # define __iomem
 # define __percpu	BTF_TYPE_TAG(percpu)
-# define __rcu
+# define __rcu		BTF_TYPE_TAG(rcu)
+
 # define __chk_user_ptr(x)	(void)0
 # define __chk_io_ptr(x)	(void)0
 /* context/locking */
-- 
2.30.2



* [PATCH bpf-next v7 2/4] bpf: Abstract out functions to check sleepable helpers
  2022-11-21 17:05 [PATCH bpf-next v7 0/4] bpf: Add bpf_rcu_read_lock() support Yonghong Song
  2022-11-21 17:05 ` [PATCH bpf-next v7 1/4] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
@ 2022-11-21 17:05 ` Yonghong Song
  2022-11-22 17:06   ` KP Singh
  2022-11-21 17:05 ` [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock() Yonghong Song
  2022-11-21 17:05 ` [PATCH bpf-next v7 4/4] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
  3 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2022-11-21 17:05 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Abstract out two functions that check whether a particular helper
is sleepable, one for bpf_lsm and one for bpf_trace. These two
functions will later be used by the verifier to check whether a
helper is sleepable. There is no functionality change.
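
The shape of the refactor can be sketched in plain C (illustrative
types and names, not the kernel's): the per-subsystem sleepable lookup
is consulted first, and a hit is only honored when the program itself
is sleepable.

```c
#include <stddef.h>
#include <stdbool.h>

struct func_proto { const char *name; };

static const struct func_proto copy_from_user_proto = { "copy_from_user" };

/* Stand-in for bpf_tracing_sleepable_func_proto(): returns non-NULL
 * only for helpers that may sleep.
 */
static const struct func_proto *sleepable_func_proto(int func_id)
{
	return func_id == 1 ? &copy_from_user_proto : NULL;
}

/* Stand-in for bpf_tracing_func_proto(): sleepable helpers are gated
 * on the program being sleepable; everything else falls through to the
 * normal switch.
 */
static const struct func_proto *func_proto(int func_id, bool prog_sleepable)
{
	const struct func_proto *p = sleepable_func_proto(func_id);

	if (p)
		return prog_sleepable ? p : NULL;
	/* ... non-sleepable helpers would be resolved here ... */
	return NULL;
}
```

The benefit of factoring the lookup out is that the verifier can reuse
the same function to ask "is this helper sleepable?" without duplicating
the per-helper list.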

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf_lsm.h      |  6 ++++++
 include/linux/trace_events.h |  8 ++++++++
 kernel/bpf/bpf_lsm.c         | 20 ++++++++++++++++----
 kernel/trace/bpf_trace.c     | 22 ++++++++++++++++++----
 4 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index 4bcf76a9bb06..d99b1caf118e 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -28,6 +28,7 @@ int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 			const struct bpf_prog *prog);
 
 bool bpf_lsm_is_sleepable_hook(u32 btf_id);
+const struct bpf_func_proto *bpf_lsm_sleepable_func_proto(enum bpf_func_id func_id);
 
 static inline struct bpf_storage_blob *bpf_inode(
 	const struct inode *inode)
@@ -50,6 +51,11 @@ static inline bool bpf_lsm_is_sleepable_hook(u32 btf_id)
 {
 	return false;
 }
+static inline const struct bpf_func_proto *
+bpf_lsm_sleepable_func_proto(enum bpf_func_id func_id)
+{
+	return NULL;
+}
 
 static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 				      const struct bpf_prog *prog)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 20749bd9db71..c3eb4fb78ea5 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -9,6 +9,7 @@
 #include <linux/hardirq.h>
 #include <linux/perf_event.h>
 #include <linux/tracepoint.h>
+#include <uapi/linux/bpf.h>
 
 struct trace_array;
 struct array_buffer;
@@ -16,6 +17,7 @@ struct tracer;
 struct dentry;
 struct bpf_prog;
 union bpf_attr;
+struct bpf_func_proto;
 
 const char *trace_print_flags_seq(struct trace_seq *p, const char *delim,
 				  unsigned long flags,
@@ -748,6 +750,7 @@ int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
 			    u32 *fd_type, const char **buf,
 			    u64 *probe_offset, u64 *probe_addr);
 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+const struct bpf_func_proto *bpf_tracing_sleepable_func_proto(enum bpf_func_id func_id);
 #else
 static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
@@ -794,6 +797,11 @@ bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	return -EOPNOTSUPP;
 }
+static inline const struct bpf_func_proto *
+bpf_tracing_sleepable_func_proto(enum bpf_func_id func_id)
+{
+	return NULL;
+}
 #endif
 
 enum {
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index d6c9b3705f24..2f993a003389 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -192,6 +192,18 @@ static const struct bpf_func_proto bpf_get_attach_cookie_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
+const struct bpf_func_proto *bpf_lsm_sleepable_func_proto(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	case BPF_FUNC_ima_inode_hash:
+		return &bpf_ima_inode_hash_proto;
+	case BPF_FUNC_ima_file_hash:
+		return &bpf_ima_file_hash_proto;
+	default:
+		return NULL;
+	}
+}
+
 static const struct bpf_func_proto *
 bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -203,6 +215,10 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 			return func_proto;
 	}
 
+	func_proto = bpf_lsm_sleepable_func_proto(func_id);
+	if (func_proto)
+		return prog->aux->sleepable ? func_proto : NULL;
+
 	switch (func_id) {
 	case BPF_FUNC_inode_storage_get:
 		return &bpf_inode_storage_get_proto;
@@ -220,10 +236,6 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_spin_unlock_proto;
 	case BPF_FUNC_bprm_opts_set:
 		return &bpf_bprm_opts_set_proto;
-	case BPF_FUNC_ima_inode_hash:
-		return prog->aux->sleepable ? &bpf_ima_inode_hash_proto : NULL;
-	case BPF_FUNC_ima_file_hash:
-		return prog->aux->sleepable ? &bpf_ima_file_hash_proto : NULL;
 	case BPF_FUNC_get_attach_cookie:
 		return bpf_prog_has_trampoline(prog) ? &bpf_get_attach_cookie_proto : NULL;
 #ifdef CONFIG_NET
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 5b9008bc597b..5e4b1566a174 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1383,9 +1383,27 @@ static int __init bpf_key_sig_kfuncs_init(void)
 late_initcall(bpf_key_sig_kfuncs_init);
 #endif /* CONFIG_KEYS */
 
+const struct bpf_func_proto *bpf_tracing_sleepable_func_proto(enum bpf_func_id func_id)
+{
+	switch (func_id) {
+	case BPF_FUNC_copy_from_user:
+		return &bpf_copy_from_user_proto;
+	case BPF_FUNC_copy_from_user_task:
+		return &bpf_copy_from_user_task_proto;
+	default:
+		return NULL;
+	}
+}
+
 static const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
+	const struct bpf_func_proto *func_proto;
+
+	func_proto = bpf_tracing_sleepable_func_proto(func_id);
+	if (func_proto)
+		return prog->aux->sleepable ? func_proto : NULL;
+
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
 		return &bpf_map_lookup_elem_proto;
@@ -1484,10 +1502,6 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_jiffies64_proto;
 	case BPF_FUNC_get_task_stack:
 		return &bpf_get_task_stack_proto;
-	case BPF_FUNC_copy_from_user:
-		return prog->aux->sleepable ? &bpf_copy_from_user_proto : NULL;
-	case BPF_FUNC_copy_from_user_task:
-		return prog->aux->sleepable ? &bpf_copy_from_user_task_proto : NULL;
 	case BPF_FUNC_snprintf_btf:
 		return &bpf_snprintf_btf_proto;
 	case BPF_FUNC_per_cpu_ptr:
-- 
2.30.2



* [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
  2022-11-21 17:05 [PATCH bpf-next v7 0/4] bpf: Add bpf_rcu_read_lock() support Yonghong Song
  2022-11-21 17:05 ` [PATCH bpf-next v7 1/4] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
  2022-11-21 17:05 ` [PATCH bpf-next v7 2/4] bpf: Abstract out functions to check sleepable helpers Yonghong Song
@ 2022-11-21 17:05 ` Yonghong Song
  2022-11-21 19:41   ` Martin KaFai Lau
  2022-11-22  5:48   ` Alexei Starovoitov
  2022-11-21 17:05 ` [PATCH bpf-next v7 4/4] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
  3 siblings, 2 replies; 16+ messages in thread
From: Yonghong Song @ 2022-11-21 17:05 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Add two kfuncs, bpf_rcu_read_lock() and bpf_rcu_read_unlock(). These two
kfuncs can be used by all program types. The following example shows how
rcu pointers are used w.r.t. bpf_rcu_read_lock()/bpf_rcu_read_unlock().

  struct task_struct {
    ...
    struct task_struct              *last_wakee;
    struct task_struct __rcu        *real_parent;
    ...
  };

Let us say the prog does 'task = bpf_get_current_task_btf()' to get a
'task' pointer. The basic rules are:
  - 'real_parent = task->real_parent' should be inside a bpf_rcu_read_lock
    region; this simulates a rcu_dereference() operation.
  - 'last_wakee = real_parent->last_wakee' should be inside a bpf_rcu_read_lock
    region since it accesses rcu-protected memory.
  - the ptr 'last_wakee' will be marked as PTR_UNTRUSTED since in general
    it is not clear whether the object pointed to by 'last_wakee' is
    valid, even inside the bpf_rcu_read_lock region.

Note that for non-sleepable progs, rcu pointer loads and accesses to
rcu-protected memory are permitted outside the rcu read lock region.
But a verification failure will be reported for sleepable progs.

To prevent rcu pointers from leaking outside the rcu read lock region,
the verifier clears all rcu pointer register states to unknown, i.e.,
scalar_value, so a later dereference becomes impossible.
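
The register-state handling described above can be sketched as bit-flag
manipulation. This is a simplified model with made-up constant values;
the real flags live in include/linux/bpf.h and are offset by
BPF_BASE_TYPE_BITS.

```c
enum {
	PTR_TO_BTF_ID	= 1,		/* simplified base type */
	SCALAR_VALUE	= 2,		/* "unknown" register state */
	MEM_RCU		= 1 << 8,	/* simplified type-flag bit */
	PTR_UNTRUSTED	= 1 << 9,
};

struct reg { int type; };

/* On bpf_rcu_read_unlock(), every register still carrying MEM_RCU is
 * scrubbed to an unknown scalar so it cannot be dereferenced later.
 */
static void invalidate_rcu_regs(struct reg *regs, int n)
{
	for (int i = 0; i < n; i++)
		if (regs[i].type & MEM_RCU)
			regs[i].type = SCALAR_VALUE;
}

/* Walking an rcu-tagged pointer yields an untrusted pointer:
 * MEM_RCU is not inherited and PTR_UNTRUSTED is set instead.
 */
static int deref_rcu_ptr(int parent_type)
{
	if (parent_type & MEM_RCU)
		return PTR_TO_BTF_ID | PTR_UNTRUSTED;
	return parent_type;
}
```

This mirrors the two verifier behaviors the commit message names:
invalidation of rcu registers at the unlock site, and the PTR_UNTRUSTED
marking of pointees loaded from rcu-protected memory.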

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h          |   3 +
 include/linux/bpf_verifier.h |   2 +
 kernel/bpf/btf.c             |   3 +
 kernel/bpf/helpers.c         |  12 +++
 kernel/bpf/verifier.c        | 156 ++++++++++++++++++++++++++++++++---
 5 files changed, 165 insertions(+), 11 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c9eafa67f2a2..d0fbb7f0f13b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -572,6 +572,9 @@ enum bpf_type_flag {
 	 */
 	PTR_TRUSTED		= BIT(12 + BPF_BASE_TYPE_BITS),
 
+	/* MEM is tagged with rcu and memory access needs rcu_read_lock protection. */
+	MEM_RCU			= BIT(13 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 545152ac136c..748431578902 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -344,6 +344,7 @@ struct bpf_verifier_state {
 		u32 id;
 	} active_lock;
 	bool speculative;
+	bool active_rcu_lock;
 
 	/* first and last insn idx of this verifier state */
 	u32 first_insn_idx;
@@ -445,6 +446,7 @@ struct bpf_insn_aux_data {
 	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
 	bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */
 	bool zext_dst; /* this insn zero extends dst reg */
+	bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
 	u8 alu_state; /* used in combination with alu_limit */
 
 	/* below fields are initialized once */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 1a59cc7ad730..68df0df27302 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6237,6 +6237,9 @@ static int btf_struct_walk(struct bpf_verifier_log *log, const struct btf *btf,
 				/* check __percpu tag */
 				if (strcmp(tag_value, "percpu") == 0)
 					tmp_flag = MEM_PERCPU;
+				/* check __rcu tag */
+				if (strcmp(tag_value, "rcu") == 0)
+					tmp_flag = MEM_RCU;
 			}
 
 			stype = btf_type_skip_modifiers(btf, mtype->type, &id);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9e8a5557ea8d..3268be36a319 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1889,6 +1889,16 @@ void *bpf_rdonly_cast(void *obj__ign, u32 btf_id__k)
 	return obj__ign;
 }
 
+void bpf_rcu_read_lock(void)
+{
+	rcu_read_lock();
+}
+
+void bpf_rcu_read_unlock(void)
+{
+	rcu_read_unlock();
+}
+
 __diag_pop();
 
 BTF_SET8_START(generic_btf_ids)
@@ -1919,6 +1929,8 @@ BTF_ID(func, bpf_task_release)
 BTF_SET8_START(common_btf_ids)
 BTF_ID_FLAGS(func, bpf_cast_to_kern_ctx)
 BTF_ID_FLAGS(func, bpf_rdonly_cast)
+BTF_ID_FLAGS(func, bpf_rcu_read_lock)
+BTF_ID_FLAGS(func, bpf_rcu_read_unlock)
 BTF_SET8_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9528a066cfa5..d39ab0e969dd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -23,6 +23,7 @@
 #include <linux/error-injection.h>
 #include <linux/bpf_lsm.h>
 #include <linux/btf_ids.h>
+#include <linux/trace_events.h>
 #include <linux/poison.h>
 
 #include "disasm.h"
@@ -527,6 +528,14 @@ static bool is_callback_calling_function(enum bpf_func_id func_id)
 	       func_id == BPF_FUNC_user_ringbuf_drain;
 }
 
+static bool is_storage_get_function(enum bpf_func_id func_id)
+{
+	return func_id == BPF_FUNC_sk_storage_get ||
+	       func_id == BPF_FUNC_inode_storage_get ||
+	       func_id == BPF_FUNC_task_storage_get ||
+	       func_id == BPF_FUNC_cgrp_storage_get;
+}
+
 static bool helper_multiple_ref_obj_use(enum bpf_func_id func_id,
 					const struct bpf_map *map)
 {
@@ -589,11 +598,12 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
 			strncpy(postfix, "_or_null", 16);
 	}
 
-	snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s",
+	snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s%s",
 		 type & MEM_RDONLY ? "rdonly_" : "",
 		 type & MEM_RINGBUF ? "ringbuf_" : "",
 		 type & MEM_USER ? "user_" : "",
 		 type & MEM_PERCPU ? "percpu_" : "",
+		 type & MEM_RCU ? "rcu_" : "",
 		 type & PTR_UNTRUSTED ? "untrusted_" : "",
 		 type & PTR_TRUSTED ? "trusted_" : ""
 	);
@@ -1220,6 +1230,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
 		dst_state->frame[i] = NULL;
 	}
 	dst_state->speculative = src->speculative;
+	dst_state->active_rcu_lock = src->active_rcu_lock;
 	dst_state->curframe = src->curframe;
 	dst_state->active_lock.ptr = src->active_lock.ptr;
 	dst_state->active_lock.id = src->active_lock.id;
@@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		return -EACCES;
 	}
 
+	/* Access rcu protected memory */
+	if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
+	    !env->cur_state->active_rcu_lock) {
+		verbose(env,
+			"R%d is ptr_%s access rcu-protected memory with off=%d, not rcu protected\n",
+			regno, tname, off);
+		return -EACCES;
+	}
+
 	if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
 		if (!btf_is_kernel(reg->btf)) {
 			verbose(env, "verifier internal error: reg->btf must be kernel btf\n");
@@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	if (ret < 0)
 		return ret;
 
+	/* The value is a rcu pointer. The load needs to be in a rcu lock region,
+	 * similar to rcu_dereference().
+	 */
+	if ((flag & MEM_RCU) && env->prog->aux->sleepable && !env->cur_state->active_rcu_lock) {
+		verbose(env,
+			"R%d is rcu dereference ptr_%s with off=%d, not in rcu_read_lock region\n",
+			regno, tname, off);
+		return -EACCES;
+	}
+
 	/* If this is an untrusted pointer, all pointers formed by walking it
 	 * also inherit the untrusted flag.
 	 */
 	if (type_flag(reg->type) & PTR_UNTRUSTED)
 		flag |= PTR_UNTRUSTED;
 
+	/* Mark the pointee of the rcu protected memory access as PTR_UNTRUSTED */
+	if (env->cur_state->active_rcu_lock && !(flag & PTR_UNTRUSTED) &&
+	    (reg->type & MEM_RCU) && !(flag & MEM_RCU))
+		flag |= PTR_UNTRUSTED;
+
 	/* Any pointer obtained from walking a trusted pointer is no longer trusted. */
 	flag &= ~PTR_TRUSTED;
 
@@ -5896,6 +5931,7 @@ static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_T
 static const struct bpf_reg_types btf_ptr_types = {
 	.types = {
 		PTR_TO_BTF_ID,
+		PTR_TO_BTF_ID | MEM_RCU,
 		PTR_TO_BTF_ID | PTR_TRUSTED,
 	},
 };
@@ -5976,6 +6012,20 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	if (arg_type & PTR_MAYBE_NULL)
 		type &= ~PTR_MAYBE_NULL;
 
+	/* If the reg type is marked as MEM_RCU, ensure the usage is in the rcu_read_lock
+	 * region, and remove MEM_RCU from the type since the arg_type won't encode
+	 * MEM_RCU.
+	 */
+	if (type & MEM_RCU) {
+		if (env->prog->aux->sleepable && !env->cur_state->active_rcu_lock) {
+			verbose(env,
+				"R%d arg type %s needs rcu protection\n",
+				regno, reg_type_str(env, reg->type));
+			return -EACCES;
+		}
+		type &= ~MEM_RCU;
+	}
+
 	for (i = 0; i < ARRAY_SIZE(compatible->types); i++) {
 		expected = compatible->types[i];
 		if (expected == NOT_INIT)
@@ -5992,7 +6042,8 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 	return -EACCES;
 
 found:
-	if (reg->type == PTR_TO_BTF_ID || reg->type & PTR_TRUSTED) {
+	/* reg is already protected by rcu_read_lock(). Peel off MEM_RCU from reg->type. */
+	if ((reg->type & ~MEM_RCU) == PTR_TO_BTF_ID || reg->type & PTR_TRUSTED) {
 		/* For bpf_sk_release, it needs to match against first member
 		 * 'struct sock_common', hence make an exception for it. This
 		 * allows bpf_sk_release to work for multiple socket types.
@@ -6073,6 +6124,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 * fixed offset.
 	 */
 	case PTR_TO_BTF_ID:
+	case PTR_TO_BTF_ID | MEM_RCU:
 	case PTR_TO_BTF_ID | MEM_ALLOC:
 	case PTR_TO_BTF_ID | PTR_TRUSTED:
 	case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED:
@@ -7534,6 +7586,18 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		return err;
 	}
 
+	if (env->cur_state->active_rcu_lock) {
+		if (bpf_lsm_sleepable_func_proto(func_id) ||
+		    bpf_tracing_sleepable_func_proto(func_id)) {
+		verbose(env, "sleepable helper %s#%d in rcu_read_lock region\n",
+				func_id_name(func_id), func_id);
+			return -EINVAL;
+		}
+
+		if (env->prog->aux->sleepable && is_storage_get_function(func_id))
+			env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
+	}
+
 	meta.func_id = func_id;
 	/* check args */
 	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
@@ -8158,6 +8222,8 @@ enum special_kfunc_type {
 	KF_bpf_list_pop_back,
 	KF_bpf_cast_to_kern_ctx,
 	KF_bpf_rdonly_cast,
+	KF_bpf_rcu_read_lock,
+	KF_bpf_rcu_read_unlock,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -8180,6 +8246,18 @@ BTF_ID(func, bpf_list_pop_front)
 BTF_ID(func, bpf_list_pop_back)
 BTF_ID(func, bpf_cast_to_kern_ctx)
 BTF_ID(func, bpf_rdonly_cast)
+BTF_ID(func, bpf_rcu_read_lock)
+BTF_ID(func, bpf_rcu_read_unlock)
+
+static bool is_kfunc_bpf_rcu_read_lock(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_lock];
+}
+
+static bool is_kfunc_bpf_rcu_read_unlock(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_unlock];
+}
 
 static enum kfunc_ptr_arg_type
 get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
@@ -8812,6 +8890,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	const struct btf_type *t, *func, *func_proto, *ptr_type;
 	struct bpf_reg_state *regs = cur_regs(env);
 	const char *func_name, *ptr_type_name;
+	bool sleepable, rcu_lock, rcu_unlock;
 	struct bpf_kfunc_call_arg_meta meta;
 	u32 i, nargs, func_id, ptr_type_id;
 	int err, insn_idx = *insn_idx_p;
@@ -8853,11 +8932,38 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		return -EACCES;
 	}
 
-	if (is_kfunc_sleepable(&meta) && !env->prog->aux->sleepable) {
+	sleepable = is_kfunc_sleepable(&meta);
+	if (sleepable && !env->prog->aux->sleepable) {
 		verbose(env, "program must be sleepable to call sleepable kfunc %s\n", func_name);
 		return -EACCES;
 	}
 
+	rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta);
+	rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta);
+	if (env->cur_state->active_rcu_lock) {
+		struct bpf_func_state *state;
+		struct bpf_reg_state *reg;
+
+		if (rcu_lock) {
+			verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
+			return -EINVAL;
+		} else if (rcu_unlock) {
+			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+				if (reg->type & MEM_RCU)
+					__mark_reg_unknown(env, reg);
+			}));
+			env->cur_state->active_rcu_lock = false;
+		} else if (sleepable) {
+			verbose(env, "kernel func %s is sleepable within rcu_read_lock region\n", func_name);
+			return -EACCES;
+		}
+	} else if (rcu_lock) {
+		env->cur_state->active_rcu_lock = true;
+	} else if (rcu_unlock) {
+		verbose(env, "unmatched rcu read unlock (kernel function %s)\n", func_name);
+		return -EINVAL;
+	}
+
 	/* Check the arguments */
 	err = check_kfunc_args(env, &meta);
 	if (err < 0)
@@ -11749,6 +11855,11 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		return -EINVAL;
 	}
 
+	if (env->prog->aux->sleepable && env->cur_state->active_rcu_lock) {
+		verbose(env, "BPF_LD_[ABS|IND] cannot be used inside bpf_rcu_read_lock-ed region\n");
+		return -EINVAL;
+	}
+
 	if (regs[ctx_reg].type != PTR_TO_CTX) {
 		verbose(env,
 			"at the time of BPF_LD_ABS|IND R6 != pointer to skb\n");
@@ -13014,6 +13125,9 @@ static bool states_equal(struct bpf_verifier_env *env,
 	    old->active_lock.id != cur->active_lock.id)
 		return false;
 
+	if (old->active_rcu_lock != cur->active_rcu_lock)
+		return false;
+
 	/* for states to be equal callsites have to be the same
 	 * and all frame states need to be equivalent
 	 */
@@ -13427,6 +13541,11 @@ static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
 			       !reg_type_mismatch_ok(prev));
 }
 
+static bool reg_type_mismatch_ignore_rcu(enum bpf_reg_type src, enum bpf_reg_type prev)
+{
+	return reg_type_mismatch(src & ~MEM_RCU, prev & ~MEM_RCU);
+}
+
 static int do_check(struct bpf_verifier_env *env)
 {
 	bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
@@ -13552,6 +13671,17 @@ static int do_check(struct bpf_verifier_env *env)
 
 			prev_src_type = &env->insn_aux_data[env->insn_idx].ptr_type;
 
+			/* For code like below,
+			 *   struct foo *f;
+			 *   if (...)
+			 *     f = ...; // f with MEM_RCU type tag.
+			 *   else
+			 *     f = ...; // f without MEM_RCU type tag.
+			 *   ... f ...  // Here f could be with/without MEM_RCU
+			 *
+			 * It is safe to ignore MEM_RCU type tag here since
+			 * base types are the same.
+			 */
 			if (*prev_src_type == NOT_INIT) {
 				/* saw a valid insn
 				 * dst_reg = *(u32 *)(src_reg + off)
@@ -13559,7 +13689,7 @@ static int do_check(struct bpf_verifier_env *env)
 				 */
 				*prev_src_type = src_reg_type;
 
-			} else if (reg_type_mismatch(src_reg_type, *prev_src_type)) {
+			} else if (reg_type_mismatch_ignore_rcu(src_reg_type, *prev_src_type)) {
 				/* ABuser program is trying to use the same insn
 				 * dst_reg = *(u32*) (src_reg + off)
 				 * with different pointer types:
@@ -13701,6 +13831,11 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
+				if (env->cur_state->active_rcu_lock) {
+					verbose(env, "bpf_rcu_read_unlock is missing\n");
+					return -EINVAL;
+				}
+
 				/* We must do check_reference_leak here before
 				 * prepare_func_exit to handle the case when
 				 * state->curframe > 0, it may be a callback
@@ -14795,6 +14930,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
 			break;
 		case PTR_TO_BTF_ID:
+		case PTR_TO_BTF_ID | MEM_RCU:
 		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
 		case PTR_TO_BTF_ID | PTR_TRUSTED:
 		/* PTR_TO_BTF_ID | MEM_ALLOC always has a valid lifetime, unlike
@@ -15489,14 +15625,12 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
 			goto patch_call_imm;
 		}
 
-		if (insn->imm == BPF_FUNC_task_storage_get ||
-		    insn->imm == BPF_FUNC_sk_storage_get ||
-		    insn->imm == BPF_FUNC_inode_storage_get ||
-		    insn->imm == BPF_FUNC_cgrp_storage_get) {
-			if (env->prog->aux->sleepable)
-				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
-			else
+		if (is_storage_get_function(insn->imm)) {
+			if (!env->prog->aux->sleepable ||
+			    env->insn_aux_data[i + delta].storage_get_func_atomic)
 				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
+			else
+				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
 			insn_buf[1] = *insn;
 			cnt = 2;
 
-- 
2.30.2



* [PATCH bpf-next v7 4/4] selftests/bpf: Add tests for bpf_rcu_read_lock()
  2022-11-21 17:05 [PATCH bpf-next v7 0/4] bpf: Add bpf_rcu_read_lock() support Yonghong Song
                   ` (2 preceding siblings ...)
  2022-11-21 17:05 ` [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock() Yonghong Song
@ 2022-11-21 17:05 ` Yonghong Song
  2022-11-22  1:59   ` John Fastabend
  3 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2022-11-21 17:05 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Add a few positive/negative tests for bpf_rcu_read_lock()
and its corresponding verifier support. The new tests fail
on s390x and aarch64, so an entry is added to each of their
respective deny lists.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/DENYLIST.aarch64  |   1 +
 tools/testing/selftests/bpf/DENYLIST.s390x    |   1 +
 .../selftests/bpf/prog_tests/rcu_read_lock.c  | 153 +++++++++
 .../selftests/bpf/progs/rcu_read_lock.c       | 325 ++++++++++++++++++
 4 files changed, 480 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
 create mode 100644 tools/testing/selftests/bpf/progs/rcu_read_lock.c

diff --git a/tools/testing/selftests/bpf/DENYLIST.aarch64 b/tools/testing/selftests/bpf/DENYLIST.aarch64
index affc5aebbf0f..8e77515d56f6 100644
--- a/tools/testing/selftests/bpf/DENYLIST.aarch64
+++ b/tools/testing/selftests/bpf/DENYLIST.aarch64
@@ -45,6 +45,7 @@ modify_return                                    # modify_return__attach failed
 module_attach                                    # skel_attach skeleton attach failed: -524
 mptcp/base                                       # run_test mptcp unexpected error: -524 (errno 524)
 netcnt                                           # packets unexpected packets: actual 10001 != expected 10000
+rcu_read_lock                                    # failed to attach: ERROR: strerror_r(-524)=22
 recursion                                        # skel_attach unexpected error: -524 (errno 524)
 ringbuf                                          # skel_attach skeleton attachment failed: -1
 setget_sockopt                                   # attach_cgroup unexpected error: -524
diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index 12cf2159975e..77aabc4dc64c 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -42,6 +42,7 @@ module_attach                            # skel_attach skeleton attach failed: -
 mptcp
 netcnt                                   # failed to load BPF skeleton 'netcnt_prog': -7                               (?)
 probe_user                               # check_kprobe_res wrong kprobe res from probe read                           (?)
+rcu_read_lock                            # failed to find kernel BTF type ID of '__x64_sys_getpgid': -3                (?)
 recursion                                # skel_attach unexpected error: -524                                          (trampoline)
 ringbuf                                  # skel_load skeleton load failed                                              (?)
 select_reuseport                         # intermittently fails on new s390x setup
diff --git a/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c b/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
new file mode 100644
index 000000000000..58d5fe3da223
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates.*/
+
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include "rcu_read_lock.skel.h"
+#include "cgroup_helpers.h"
+
+static unsigned long long cgroup_id;
+
+static void test_local_storage(void)
+{
+	struct rcu_read_lock *skel;
+	int err;
+
+	skel = rcu_read_lock__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	skel->bss->target_pid = syscall(SYS_gettid);
+
+	bpf_program__set_autoload(skel->progs.get_cgroup_id, true);
+	bpf_program__set_autoload(skel->progs.task_succ, true);
+	bpf_program__set_autoload(skel->progs.two_regions, true);
+	bpf_program__set_autoload(skel->progs.non_sleepable_1, true);
+	bpf_program__set_autoload(skel->progs.non_sleepable_2, true);
+	err = rcu_read_lock__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto done;
+
+	err = rcu_read_lock__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto done;
+
+	syscall(SYS_getpgid);
+
+	ASSERT_EQ(skel->bss->task_storage_val, 2, "task_storage_val");
+	ASSERT_EQ(skel->bss->cgroup_id, cgroup_id, "cgroup_id");
+done:
+	rcu_read_lock__destroy(skel);
+}
+
+static void test_runtime_diff_rcu_tag(void)
+{
+	struct rcu_read_lock *skel;
+	int err;
+
+	skel = rcu_read_lock__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return;
+
+	bpf_program__set_autoload(skel->progs.dump_ipv6_route, true);
+	err = rcu_read_lock__load(skel);
+	ASSERT_OK(err, "skel_load");
+	rcu_read_lock__destroy(skel);
+}
+
+static const char * const inproper_region_tests[] = {
+	"miss_lock",
+	"miss_unlock",
+	"non_sleepable_rcu_mismatch",
+	"inproper_sleepable_helper",
+	"inproper_sleepable_kfunc",
+	"nested_rcu_region",
+};
+
+static void test_inproper_region(void)
+{
+	struct rcu_read_lock *skel;
+	struct bpf_program *prog;
+	int i, err;
+
+	for (i = 0; i < ARRAY_SIZE(inproper_region_tests); i++) {
+		skel = rcu_read_lock__open();
+		if (!ASSERT_OK_PTR(skel, "skel_open"))
+			return;
+
+		prog = bpf_object__find_program_by_name(skel->obj, inproper_region_tests[i]);
+		if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+			goto out;
+		bpf_program__set_autoload(prog, true);
+		err = rcu_read_lock__load(skel);
+		ASSERT_ERR(err, "skel_load");
+out:
+		rcu_read_lock__destroy(skel);
+	}
+}
+
+static const char * const rcuptr_misuse_tests[] = {
+	"task_untrusted_ptr",
+	"task_incorrect_rcu_region1",
+	"task_incorrect_rcu_region2",
+	"cross_rcu_region",
+};
+
+static void test_rcuptr_misuse(void)
+{
+	struct rcu_read_lock *skel;
+	struct bpf_program *prog;
+	struct btf *vmlinux_btf;
+	int i, err, type_id;
+
+	vmlinux_btf = btf__load_vmlinux_btf();
+	if (!ASSERT_OK_PTR(vmlinux_btf, "could not load vmlinux BTF"))
+		return;
+
+	/* skip the test if btf_type_tag("rcu") is not present in vmlinux */
+	type_id = btf__find_by_name_kind(vmlinux_btf, "rcu", BTF_KIND_TYPE_TAG);
+	if (type_id < 0) {
+		test__skip();
+		return;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(rcuptr_misuse_tests); i++) {
+		skel = rcu_read_lock__open();
+		if (!ASSERT_OK_PTR(skel, "skel_open"))
+			return;
+
+		prog = bpf_object__find_program_by_name(skel->obj, rcuptr_misuse_tests[i]);
+		if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+			goto out;
+		bpf_program__set_autoload(prog, true);
+		err = rcu_read_lock__load(skel);
+		ASSERT_ERR(err, "skel_load");
+out:
+		rcu_read_lock__destroy(skel);
+	}
+}
+
+void test_rcu_read_lock(void)
+{
+	int cgroup_fd;
+
+	cgroup_fd = test__join_cgroup("/rcu_read_lock");
+	if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /rcu_read_lock"))
+		return;
+
+	cgroup_id = get_cgroup_id("/rcu_read_lock");
+	if (test__start_subtest("local_storage"))
+		test_local_storage();
+	if (test__start_subtest("runtime_diff_rcu_tag"))
+		test_runtime_diff_rcu_tag();
+	if (test__start_subtest("negative_tests_inproper_region"))
+		test_inproper_region();
+	if (test__start_subtest("negative_tests_rcuptr_misuse"))
+		test_rcuptr_misuse();
+
+	close(cgroup_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/rcu_read_lock.c b/tools/testing/selftests/bpf/progs/rcu_read_lock.c
new file mode 100644
index 000000000000..add30d1becb1
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/rcu_read_lock.c
@@ -0,0 +1,325 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "bpf_tracing_net.h"
+#include "bpf_misc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, long);
+} map_a SEC(".maps");
+
+__u32 user_data, key_serial, target_pid = 0;
+__u64 flags, task_storage_val, cgroup_id;
+
+struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
+void bpf_key_put(struct bpf_key *key) __ksym;
+void bpf_rcu_read_lock(void) __ksym;
+void bpf_rcu_read_unlock(void) __ksym;
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int get_cgroup_id(void *ctx)
+{
+	struct task_struct *task;
+
+	task = bpf_get_current_task_btf();
+	if (task->pid != target_pid)
+		return 0;
+
+	/* simulate bpf_get_current_cgroup_id() helper */
+	bpf_rcu_read_lock();
+	cgroup_id = task->cgroups->dfl_cgrp->kn->id;
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int task_succ(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+	long init_val = 2;
+	long *ptr;
+
+	task = bpf_get_current_task_btf();
+	if (task->pid != target_pid)
+		return 0;
+
+	bpf_rcu_read_lock();
+	/* region including helper using rcu ptr real_parent */
+	real_parent = task->real_parent;
+	ptr = bpf_task_storage_get(&map_a, real_parent, &init_val,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!ptr)
+		goto out;
+	ptr = bpf_task_storage_get(&map_a, real_parent, 0, 0);
+	if (!ptr)
+		goto out;
+	task_storage_val = *ptr;
+out:
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
+int two_regions(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	/* two regions */
+	task = bpf_get_current_task_btf();
+	bpf_rcu_read_lock();
+	bpf_rcu_read_unlock();
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int non_sleepable_1(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	task = bpf_get_current_task_btf();
+
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int non_sleepable_2(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	bpf_rcu_read_lock();
+	task = bpf_get_current_task_btf();
+	bpf_rcu_read_unlock();
+
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?iter.s/ipv6_route")
+int dump_ipv6_route(struct bpf_iter__ipv6_route *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct fib6_info *rt = ctx->rt;
+	const struct net_device *dev;
+	struct fib6_nh *fib6_nh;
+	unsigned int flags;
+	struct nexthop *nh;
+
+	if (rt == (void *)0)
+		return 0;
+
+	/* fib6_nh is not a rcu ptr */
+	fib6_nh = &rt->fib6_nh[0];
+	flags = rt->fib6_flags;
+
+	nh = rt->nh;
+	bpf_rcu_read_lock();
+	if (rt->nh)
+		/* fib6_nh is a rcu ptr */
+		fib6_nh = &nh->nh_info->fib6_nh;
+
+	/* fib6_nh could be a rcu or non-rcu ptr */
+	if (fib6_nh->fib_nh_gw_family) {
+		flags |= RTF_GATEWAY;
+		BPF_SEQ_PRINTF(seq, "%pi6 ", &fib6_nh->fib_nh_gw6);
+	} else {
+		BPF_SEQ_PRINTF(seq, "00000000000000000000000000000000 ");
+	}
+
+	dev = fib6_nh->fib_nh_dev;
+	bpf_rcu_read_unlock();
+	if (dev)
+		BPF_SEQ_PRINTF(seq, "%08x %08x %08x %08x %8s\n", rt->fib6_metric,
+			       rt->fib6_ref.refs.counter, 0, flags, dev->name);
+	else
+		BPF_SEQ_PRINTF(seq, "%08x %08x %08x %08x\n", rt->fib6_metric,
+			       rt->fib6_ref.refs.counter, 0, flags);
+
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int miss_lock(void *ctx)
+{
+	struct task_struct *task;
+	struct css_set *cgroups;
+	struct cgroup *dfl_cgrp;
+
+	/* missing bpf_rcu_read_lock() */
+	task = bpf_get_current_task_btf();
+	bpf_rcu_read_lock();
+	(void)bpf_task_storage_get(&map_a, task, 0, 0);
+	bpf_rcu_read_unlock();
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int miss_unlock(void *ctx)
+{
+	struct task_struct *task;
+	struct css_set *cgroups;
+	struct cgroup *dfl_cgrp;
+
+	/* missing bpf_rcu_read_unlock() */
+	task = bpf_get_current_task_btf();
+	bpf_rcu_read_lock();
+	(void)bpf_task_storage_get(&map_a, task, 0, 0);
+	return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_getpgid")
+int non_sleepable_rcu_mismatch(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	task = bpf_get_current_task_btf();
+
+	/* non-sleepable: missing bpf_rcu_read_unlock() in one path */
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
+	if (real_parent)
+		bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int inproper_sleepable_helper(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+	struct pt_regs *regs;
+	__u32 value = 0;
+	void *ptr;
+
+	task = bpf_get_current_task_btf();
+
+	/* sleepable helper in rcu read lock region */
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	regs = (struct pt_regs *)bpf_task_pt_regs(real_parent);
+	if (!regs) {
+		bpf_rcu_read_unlock();
+		return 0;
+	}
+
+	ptr = (void *)PT_REGS_IP(regs);
+	(void)bpf_copy_from_user_task(&value, sizeof(uint32_t), ptr, task, 0);
+	user_data = value;
+	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?lsm.s/bpf")
+int BPF_PROG(inproper_sleepable_kfunc, int cmd, union bpf_attr *attr, unsigned int size)
+{
+	struct bpf_key *bkey;
+
+	/* sleepable kfunc in rcu read lock region */
+	bpf_rcu_read_lock();
+	bkey = bpf_lookup_user_key(key_serial, flags);
+	bpf_rcu_read_unlock();
+	if (!bkey)
+		return -1;
+	bpf_key_put(bkey);
+
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
+int nested_rcu_region(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	/* nested rcu read lock regions */
+	task = bpf_get_current_task_btf();
+	bpf_rcu_read_lock();
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
+	bpf_rcu_read_unlock();
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int task_untrusted_ptr(void *ctx)
+{
+	struct task_struct *task, *last_wakee;
+
+	task = bpf_get_current_task_btf();
+
+	bpf_rcu_read_lock();
+	/* the pointer last_wakee marked as untrusted */
+	last_wakee = task->real_parent->last_wakee;
+	(void)bpf_task_storage_get(&map_a, last_wakee, 0, 0);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int task_incorrect_rcu_region1(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	task = bpf_get_current_task_btf();
+
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	bpf_rcu_read_unlock();
+	/* helper use of rcu ptr outside the rcu read lock region */
+	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
+int task_incorrect_rcu_region2(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	task = bpf_get_current_task_btf();
+
+	/* missing bpf_rcu_read_unlock() in one path */
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
+	if (real_parent)
+		bpf_rcu_read_unlock();
+	return 0;
+}
+
+SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
+int cross_rcu_region(void *ctx)
+{
+	struct task_struct *task, *real_parent;
+
+	/* rcu ptr define/use in different regions */
+	task = bpf_get_current_task_btf();
+	bpf_rcu_read_lock();
+	real_parent = task->real_parent;
+	bpf_rcu_read_unlock();
+	bpf_rcu_read_lock();
+	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
+	bpf_rcu_read_unlock();
+	return 0;
+}
+
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
  2022-11-21 17:05 ` [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock() Yonghong Song
@ 2022-11-21 19:41   ` Martin KaFai Lau
  2022-11-21 20:01     ` Yonghong Song
  2022-11-22  5:48   ` Alexei Starovoitov
  1 sibling, 1 reply; 16+ messages in thread
From: Martin KaFai Lau @ 2022-11-21 19:41 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, bpf

On 11/21/22 9:05 AM, Yonghong Song wrote:
> @@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>   		return -EACCES;
>   	}
>   
> +	/* Access rcu protected memory */
> +	if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
> +	    !env->cur_state->active_rcu_lock) {
> +		verbose(env,
> +			"R%d is ptr_%s access rcu-protected memory with off=%d, not rcu protected\n",
> +			regno, tname, off);
> +		return -EACCES;
> +	}
> +
>   	if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
>   		if (!btf_is_kernel(reg->btf)) {
>   			verbose(env, "verifier internal error: reg->btf must be kernel btf\n");
> @@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>   	if (ret < 0)
>   		return ret;
>   
> +	/* The value is a rcu pointer. The load needs to be in a rcu lock region,
> +	 * similar to rcu_dereference().
> +	 */
> +	if ((flag & MEM_RCU) && env->prog->aux->sleepable && !env->cur_state->active_rcu_lock) {
> +		verbose(env,
> +			"R%d is rcu dereference ptr_%s with off=%d, not in rcu_read_lock region\n",
> +			regno, tname, off);
> +		return -EACCES;
> +	}

Would this make the existing rdonly use case fail?

SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
int task_real_parent(void *ctx)
{
	struct task_struct *task, *real_parent;

	task = bpf_get_current_task_btf();
         real_parent = task->real_parent;
         bpf_printk("pid %u\n", real_parent->pid);
         return 0;
}



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
  2022-11-21 19:41   ` Martin KaFai Lau
@ 2022-11-21 20:01     ` Yonghong Song
  2022-11-21 22:56       ` Martin KaFai Lau
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2022-11-21 20:01 UTC (permalink / raw)
  To: Martin KaFai Lau, Yonghong Song
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, bpf



On 11/21/22 11:41 AM, Martin KaFai Lau wrote:
> On 11/21/22 9:05 AM, Yonghong Song wrote:
>> @@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct 
>> bpf_verifier_env *env,
>>           return -EACCES;
>>       }
>> +    /* Access rcu protected memory */
>> +    if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
>> +        !env->cur_state->active_rcu_lock) {
>> +        verbose(env,
>> +            "R%d is ptr_%s access rcu-protected memory with off=%d, 
>> not rcu protected\n",
>> +            regno, tname, off);
>> +        return -EACCES;
>> +    }
>> +
>>       if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
>>           if (!btf_is_kernel(reg->btf)) {
>>               verbose(env, "verifier internal error: reg->btf must be 
>> kernel btf\n");
>> @@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct 
>> bpf_verifier_env *env,
>>       if (ret < 0)
>>           return ret;
>> +    /* The value is a rcu pointer. The load needs to be in a rcu lock 
>> region,
>> +     * similar to rcu_dereference().
>> +     */
>> +    if ((flag & MEM_RCU) && env->prog->aux->sleepable && 
>> !env->cur_state->active_rcu_lock) {
>> +        verbose(env,
>> +            "R%d is rcu dereference ptr_%s with off=%d, not in 
>> rcu_read_lock region\n",
>> +            regno, tname, off);
>> +        return -EACCES;
>> +    }
> 
> Would this make the existing rdonly use case fail?
> 
> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
> int task_real_parent(void *ctx)
> {
>      struct task_struct *task, *real_parent;
> 
>      task = bpf_get_current_task_btf();
>          real_parent = task->real_parent;
>          bpf_printk("pid %u\n", real_parent->pid);
>          return 0;
> }

Right, it will fail. To fix the issue, the user can do
    bpf_rcu_read_lock();
    real_parent = task->real_parent;
    bpf_printk("pid %u\n", real_parent->pid);
    bpf_rcu_read_unlock();

But this raises a good question: how do we deal with
legacy sleepable programs given the newly-added rcu
tagging capabilities?

My current approach is to error out if rcu usage is not right.
But this might break existing sleepable programs.

Another option is to avoid breaking existing code like the
above. In that case, MEM_RCU will not be tagged if the
dereference is not inside a bpf_rcu_read_lock() region, so
the above non-rcu-protected code should work. And the
following should work as well, although it is a little
bit awkward.
    real_parent = task->real_parent; // real_parent not tagged with rcu
    bpf_rcu_read_lock();
    bpf_printk("pid %u\n", real_parent->pid);
    bpf_rcu_read_unlock();

Maybe we should take the second choice in the above instead?
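The second option can be illustrated with a minimal user-space sketch. This is not verifier code; the `sim_*` struct and function names are made up for illustration of the rule being proposed: a dereference only picks up the MEM_RCU tag when it happens inside an active rcu read lock region, so a pointer loaded outside a region keeps today's untagged, permissive behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the verifier state discussed above. */
struct sim_state {
	bool active_rcu_lock;	/* inside a bpf_rcu_read_lock() region? */
};

struct sim_reg {
	bool mem_rcu;		/* pointer tagged MEM_RCU? */
};

/* Option 2: the rcu tag is applied only when the load that produced
 * the pointer happens inside an active rcu read lock region.
 */
static struct sim_reg sim_rcu_deref(const struct sim_state *st)
{
	struct sim_reg r = { .mem_rcu = st->active_rcu_lock };
	return r;
}

/* Using an rcu-tagged pointer outside a region is rejected;
 * an untagged pointer keeps the legacy behavior.
 */
static bool sim_use_ok(const struct sim_state *st, const struct sim_reg *r)
{
	return !r->mem_rcu || st->active_rcu_lock;
}
```

Under this model, the legacy `task->real_parent` pattern with no lock stays loadable, while a pointer produced inside a region cannot escape past bpf_rcu_read_unlock().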

> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
  2022-11-21 20:01     ` Yonghong Song
@ 2022-11-21 22:56       ` Martin KaFai Lau
  2022-11-21 23:42         ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: Martin KaFai Lau @ 2022-11-21 22:56 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, bpf

On 11/21/22 12:01 PM, Yonghong Song wrote:
> 
> 
> On 11/21/22 11:41 AM, Martin KaFai Lau wrote:
>> On 11/21/22 9:05 AM, Yonghong Song wrote:
>>> @@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct 
>>> bpf_verifier_env *env,
>>>           return -EACCES;
>>>       }
>>> +    /* Access rcu protected memory */
>>> +    if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
>>> +        !env->cur_state->active_rcu_lock) {
>>> +        verbose(env,
>>> +            "R%d is ptr_%s access rcu-protected memory with off=%d, not rcu 
>>> protected\n",
>>> +            regno, tname, off);
>>> +        return -EACCES;
>>> +    }
>>> +
>>>       if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
>>>           if (!btf_is_kernel(reg->btf)) {
>>>               verbose(env, "verifier internal error: reg->btf must be kernel 
>>> btf\n");
>>> @@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct 
>>> bpf_verifier_env *env,
>>>       if (ret < 0)
>>>           return ret;
>>> +    /* The value is a rcu pointer. The load needs to be in a rcu lock region,
>>> +     * similar to rcu_dereference().
>>> +     */
>>> +    if ((flag & MEM_RCU) && env->prog->aux->sleepable && 
>>> !env->cur_state->active_rcu_lock) {
>>> +        verbose(env,
>>> +            "R%d is rcu dereference ptr_%s with off=%d, not in rcu_read_lock 
>>> region\n",
>>> +            regno, tname, off);
>>> +        return -EACCES;
>>> +    }
>>
>> Would this make the existing rdonly use case fail?
>>
>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
>> int task_real_parent(void *ctx)
>> {
>>      struct task_struct *task, *real_parent;
>>
>>      task = bpf_get_current_task_btf();
>>          real_parent = task->real_parent;
>>          bpf_printk("pid %u\n", real_parent->pid);
>>          return 0;
>> }
> 
> Right, it will fail. To fix the issue, user can do
>     bpf_rcu_read_lock();
>     real_parent = task->real_parent;
>     bpf_printk("pid %u\n", real_parent->pid);
>     bpf_rcu_read_unlock();
> 
> But this raised a good question. How do we deal with
> legacy sleepable programs with newly-added rcu tagging
> capabilities.
> 
> My current option is to error out if rcu usage is not right.
> But this might break existing sleepable programs.
> 
> Another option intends to not break existing, like above,
> codes. In this case, MEM_RCU will not tagged if it is
> not inside bpf_rcu_read_lock() region.

hmm.... is it to make MEM_RCU mean that a reg is protected by the current 
active_rcu_lock or not?

> In this case, the above non-rcu-protected code should work. And the
> following should work as well although it is a little
> bit awkward.
>     real_parent = task->real_parent; // real_parent not tagged with rcu
>     bpf_rcu_read_lock();
>     bpf_printk("pid %u\n", real_parent->pid);
>     bpf_rcu_read_unlock();

I think it should be fine.  bpf_rcu_read_lock() is just not useful in this example 
but nothing breaks or crashes.  Also, after bpf_rcu_read_unlock(), real_parent will 
continue to be readable because MEM_RCU is not set?

On top of the active_rcu_lock, should MEM_RCU be set only when it is 
dereferenced from a PTR_TRUSTED ptr (or with ref_obj_id != 0)?
I am thinking about the following more common case:

	/* bpf_get_current_task_btf() may need to be changed
	 * to set PTR_TRUSTED at the retval?
	 */
	/* task: PTR_TO_BTF_ID | PTR_TRUSTED */
	task = bpf_get_current_task_btf();

	bpf_rcu_read_lock();

	/* real_parent: PTR_TO_BTF_ID | PTR_TRUSTED | MEM_RCU */
	real_parent = task->real_parent;

         /* bpf_task_acquire() needs to change to use refcount_inc_not_zero */
	real_parent = bpf_task_acquire(real_parent);

	bpf_rcu_read_unlock();

	/* real_parent is accessible here (after checking NULL) and
	 * can be passed to kfunc
	 */
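The bpf_task_acquire() change mentioned above matters because, inside the rcu region, the object is guaranteed not to be freed, but its refcount may already have dropped to zero while a release is pending; a plain increment could then revive a dying object. A minimal user-space sketch of the distinction (names are illustrative; the real kernel refcount_inc_not_zero() is an atomic operation, and concurrency is ignored here):

```c
#include <assert.h>
#include <stdbool.h>

struct sim_ref {
	int refs;
};

/* Unconditional increment: unsafe under rcu, since it can "revive"
 * an object whose count already hit zero and is awaiting free.
 */
static void sim_refcount_inc(struct sim_ref *r)
{
	r->refs++;
}

/* rcu-safe acquire: only take a reference if the object is still
 * live (count > 0); this mirrors the zero check that makes
 * refcount_inc_not_zero() suitable inside an rcu read region.
 */
static bool sim_refcount_inc_not_zero(struct sim_ref *r)
{
	if (r->refs == 0)
		return false;	/* object is on its way to being freed */
	r->refs++;
	return true;
}
```

A caller that gets `false` back must treat the pointer as dead instead of using it after bpf_rcu_read_unlock().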



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
  2022-11-21 22:56       ` Martin KaFai Lau
@ 2022-11-21 23:42         ` Yonghong Song
  2022-11-22  2:03           ` John Fastabend
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2022-11-21 23:42 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, bpf



On 11/21/22 2:56 PM, Martin KaFai Lau wrote:
> On 11/21/22 12:01 PM, Yonghong Song wrote:
>>
>>
>> On 11/21/22 11:41 AM, Martin KaFai Lau wrote:
>>> On 11/21/22 9:05 AM, Yonghong Song wrote:
>>>> @@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct 
>>>> bpf_verifier_env *env,
>>>>           return -EACCES;
>>>>       }
>>>> +    /* Access rcu protected memory */
>>>> +    if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
>>>> +        !env->cur_state->active_rcu_lock) {
>>>> +        verbose(env,
>>>> +            "R%d is ptr_%s access rcu-protected memory with off=%d, 
>>>> not rcu protected\n",
>>>> +            regno, tname, off);
>>>> +        return -EACCES;
>>>> +    }
>>>> +
>>>>       if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
>>>>           if (!btf_is_kernel(reg->btf)) {
>>>>               verbose(env, "verifier internal error: reg->btf must 
>>>> be kernel btf\n");
>>>> @@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct 
>>>> bpf_verifier_env *env,
>>>>       if (ret < 0)
>>>>           return ret;
>>>> +    /* The value is a rcu pointer. The load needs to be in a rcu 
>>>> lock region,
>>>> +     * similar to rcu_dereference().
>>>> +     */
>>>> +    if ((flag & MEM_RCU) && env->prog->aux->sleepable && 
>>>> !env->cur_state->active_rcu_lock) {
>>>> +        verbose(env,
>>>> +            "R%d is rcu dereference ptr_%s with off=%d, not in 
>>>> rcu_read_lock region\n",
>>>> +            regno, tname, off);
>>>> +        return -EACCES;
>>>> +    }
>>>
>>> Would this make the existing rdonly use case fail?
>>>
>>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
>>> int task_real_parent(void *ctx)
>>> {
>>>      struct task_struct *task, *real_parent;
>>>
>>>      task = bpf_get_current_task_btf();
>>>          real_parent = task->real_parent;
>>>          bpf_printk("pid %u\n", real_parent->pid);
>>>          return 0;
>>> }
>>
>> Right, it will fail. To fix the issue, user can do
>>     bpf_rcu_read_lock();
>>     real_parent = task->real_parent;
>>     bpf_printk("pid %u\n", real_parent->pid);
>>     bpf_rcu_read_unlock();
>>
>> But this raised a good question. How do we deal with
>> legacy sleepable programs with newly-added rcu tagging
>> capabilities.
>>
>> My current option is to error out if rcu usage is not right.
>> But this might break existing sleepable programs.
>>
>> Another option intends to not break existing, like above,
>> codes. In this case, MEM_RCU will not tagged if it is
>> not inside bpf_rcu_read_lock() region.
> 
> hmm.... it is to make MEM_RCU to mean a reg is protected by the current 
> active_rcu_lock or not?

Yes. For example, take 'real_parent = task->real_parent', where
the 'real_parent' field in task_struct is tagged with __rcu in the
struct definition. The 'real_parent' variable in the above
assignment will then be tagged with MEM_RCU.

> 
>> In this case, the above non-rcu-protected code should work. And the
>> following should work as well although it is a little
>> bit awkward.
>>     real_parent = task->real_parent; // real_parent not tagged with rcu
>>     bpf_rcu_read_lock();
>>     bpf_printk("pid %u\n", real_parent->pid);
>>     bpf_rcu_read_unlock();
> 
> I think it should be fine.  bpf_rcu_read_lock() just not useful in this 
> example but nothing break or crash.  Also, after bpf_rcu_read_unlock(), 
> real_parent will continue to be readable because the MEM_RCU is not set?

That is correct. The variable real_parent is not tagged with MEM_RCU
and it will stay that way for the rest of its life cycle.

With the new PTR_TRUSTED mechanism, real_parent will be marked as a
normal PTR_TO_BTF_ID and is not marked as PTR_UNTRUSTED, for backward
compatibility. So in the above code, real_parent->pid is just a normal
load (not related to rcu/trusted/untrusted). People may think it
is okay, but actually it is not. The verifier could add more state
to issue proper warnings, but I am not sure whether that is
worthwhile. As you mentioned, nothing breaks; it is just the
existing behavior, so we should be able to live with this.

> 
> On top of the active_rcu_lock, should MEM_RCU be set only when it is 
> dereferenced from a PTR_TRUSTED ptr (or with ref_obj_id != 0)?

I didn't consider PTR_TRUSTED because it was just introduced yesterday...

My current implementation inherits the old ptr_to_btf_id behavior, where
by default any ptr_to_btf_id is trusted. But now that we have PTR_TRUSTED,
we should be able to use it for a stronger guarantee.

> I am thinking about the following more common case:
> 
>      /* bpf_get_current_task_btf() may need to be changed
>       * to set PTR_TRUSTED at the retval?
>       */
>      /* task: PTR_TO_BTF_ID | PTR_TRUSTED */
>      task = bpf_get_current_task_btf();
> 
>      bpf_rcu_read_lock();
> 
>      /* real_parent: PTR_TO_BTF_ID | PTR_TRUSTED | MEM_RCU */
>      real_parent = task->real_parent;
> 
>          /* bpf_task_acquire() needs to change to use 
> refcount_inc_not_zero */
>      real_parent = bpf_task_acquire(real_parent);
> 
>      bpf_rcu_read_unlock();
> 
>      /* real_parent is accessible here (after checking NULL) and
>       * can be passed to kfunc
>       */
> 

Yes, the above is a typical use case. Or alternatively after
     real_parent = task->real_parent;
     /* use real_parent inside the bpf_rcu_read_lock() region */

I will try to utilize PTR_TRUSTED concept in the next revision.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH bpf-next v7 4/4] selftests/bpf: Add tests for bpf_rcu_read_lock()
  2022-11-21 17:05 ` [PATCH bpf-next v7 4/4] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
@ 2022-11-22  1:59   ` John Fastabend
  2022-11-22  4:09     ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: John Fastabend @ 2022-11-22  1:59 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

Yonghong Song wrote:
> Add a few positive/negative tests to test bpf_rcu_read_lock()
> and its corresponding verifier support. The new test will fail
> on s390x and aarch64, so an entry is added to each of their
> respective deny lists.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>

[...]

> +SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
> +int nested_rcu_region(void *ctx)
> +{
> +	struct task_struct *task, *real_parent;
> +
> +	/* nested rcu read lock regions */
> +	task = bpf_get_current_task_btf();
> +	bpf_rcu_read_lock();
> +	bpf_rcu_read_lock();
> +	real_parent = task->real_parent;
> +	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
> +	bpf_rcu_read_unlock();
> +	bpf_rcu_read_unlock();
> +	return 0;
> +}

I think you also need the nested imbalance case. Is this
handled? It looks like active_rcu_lock is just a bool?

 +SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
 +int nested_rcu_region(void *ctx)
 +{
 +	struct task_struct *task, *real_parent;
 +
 +	/* nested rcu read lock regions */
 +	task = bpf_get_current_task_btf();
 +	bpf_rcu_read_lock();
 +	bpf_rcu_read_lock();
 +	real_parent = task->real_parent;
 +	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
 +      /* imbalanced: only one unlock for two locks */
 +	bpf_rcu_read_unlock();
 +	return 0;
 +}
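The concern can be shown with a tiny user-space model (illustrative only; the v7 patch rejects nesting outright, as the nested_rcu_region negative test shows): if region state is a single bool, a lock/lock/unlock sequence looks balanced after the first unlock, whereas a depth counter makes the outstanding lock visible.

```c
#include <assert.h>
#include <stdbool.h>

/* bool model: lock sets the flag, unlock clears it, so nesting
 * information is lost and an imbalance can go unnoticed.
 */
struct bool_state { bool locked; };

static void bool_lock(struct bool_state *s)   { s->locked = true; }
static void bool_unlock(struct bool_state *s) { s->locked = false; }

/* depth model: lock/unlock adjust a counter, so an unmatched
 * lock leaves a nonzero depth behind.
 */
struct depth_state { int depth; };

static void depth_lock(struct depth_state *s)   { s->depth++; }
static void depth_unlock(struct depth_state *s) { s->depth--; }
```

With the bool model the imbalanced program above would verify as if the region were properly closed; with a depth counter (or by rejecting nesting entirely, as the patch does) the verifier can flag it.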

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
  2022-11-21 23:42         ` Yonghong Song
@ 2022-11-22  2:03           ` John Fastabend
  2022-11-22  4:16             ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: John Fastabend @ 2022-11-22  2:03 UTC (permalink / raw)
  To: Yonghong Song, Martin KaFai Lau
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, bpf

Yonghong Song wrote:
> 
> 
> On 11/21/22 2:56 PM, Martin KaFai Lau wrote:
> > On 11/21/22 12:01 PM, Yonghong Song wrote:
> >>
> >>
> >> On 11/21/22 11:41 AM, Martin KaFai Lau wrote:
> >>> On 11/21/22 9:05 AM, Yonghong Song wrote:
> >>>> @@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct 
> >>>> bpf_verifier_env *env,
> >>>>           return -EACCES;
> >>>>       }
> >>>> +    /* Access rcu protected memory */
> >>>> +    if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
> >>>> +        !env->cur_state->active_rcu_lock) {
> >>>> +        verbose(env,
> >>>> +            "R%d is ptr_%s access rcu-protected memory with off=%d, 
> >>>> not rcu protected\n",
> >>>> +            regno, tname, off);
> >>>> +        return -EACCES;
> >>>> +    }
> >>>> +
> >>>>       if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
> >>>>           if (!btf_is_kernel(reg->btf)) {
> >>>>               verbose(env, "verifier internal error: reg->btf must 
> >>>> be kernel btf\n");
> >>>> @@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct 
> >>>> bpf_verifier_env *env,
> >>>>       if (ret < 0)
> >>>>           return ret;
> >>>> +    /* The value is a rcu pointer. The load needs to be in a rcu 
> >>>> lock region,
> >>>> +     * similar to rcu_dereference().
> >>>> +     */
> >>>> +    if ((flag & MEM_RCU) && env->prog->aux->sleepable && 
> >>>> !env->cur_state->active_rcu_lock) {
> >>>> +        verbose(env,
> >>>> +            "R%d is rcu dereference ptr_%s with off=%d, not in 
> >>>> rcu_read_lock region\n",
> >>>> +            regno, tname, off);
> >>>> +        return -EACCES;
> >>>> +    }
> >>>
> >>> Would this make the existing rdonly use case fail?
> >>>
> >>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
> >>> int task_real_parent(void *ctx)
> >>> {
> >>>      struct task_struct *task, *real_parent;
> >>>
> >>>      task = bpf_get_current_task_btf();
> >>>          real_parent = task->real_parent;
> >>>          bpf_printk("pid %u\n", real_parent->pid);
> >>>          return 0;
> >>> }
> >>
> >> Right, it will fail. To fix the issue, the user can do
> >>     bpf_rcu_read_lock();
> >>     real_parent = task->real_parent;
> >>     bpf_printk("pid %u\n", real_parent->pid);
> >>     bpf_rcu_read_unlock();
> >>
> >> But this raises a good question: how do we deal with
> >> legacy sleepable programs given the newly-added rcu tagging
> >> capabilities?
> >>
> >> My current option is to error out if rcu usage is not right.
> >> But this might break existing sleepable programs.
> >>
> >> Another option is to not break existing code, like the above.
> >> In this case, MEM_RCU will not be tagged if the access is
> >> not inside a bpf_rcu_read_lock() region.
> > 
> > hmm.... it is to make MEM_RCU to mean a reg is protected by the current 
> > active_rcu_lock or not?
> 
> Yes, for example, in 'real_parent = task->real_parent' where
> 'real_parent' in task_struct is tagged with __rcu in the struct
> definition. So the 'real_parent' variable in the above assignment
> will be tagged with MEM_RCU.
> 
> > 
> >> In this case, the above non-rcu-protected code should work. And the
> >> following should work as well although it is a little
> >> bit awkward.
> >>     real_parent = task->real_parent; // real_parent not tagged with rcu
> >>     bpf_rcu_read_lock();
> >>     bpf_printk("pid %u\n", real_parent->pid);
> >>     bpf_rcu_read_unlock();
> > 
> > I think it should be fine.  bpf_rcu_read_lock() is just not useful
> > in this example, but nothing breaks or crashes.  Also, after
> > bpf_rcu_read_unlock(), real_parent will continue to be readable
> > because MEM_RCU is not set?
> 
> That is correct. The variable real_parent is not tagged with MEM_RCU
> and it will stay that way for the rest of its life cycle.
> 
> With the new PTR_TRUSTED mechanism, real_parent will be marked as a
> normal PTR_TO_BTF_ID and not as PTR_UNTRUSTED, for backward
> compatibility. So in the above code, real_parent->pid is just a normal
> load (not related to rcu/trusted/untrusted). People may think it
> is okay, but actually it is not. The verifier could add more state
> to issue proper warnings, but I am not sure whether that is
> worthwhile. As you mentioned, nothing breaks; it is just the current
> existing behavior. So we should be able to live with this.
> 
> > 
> > On top of the active_rcu_lock, should MEM_RCU be set only when it is 
> > dereferenced from a PTR_TRUSTED ptr (or with ref_obj_id != 0)?
> 
> I didn't consider PTR_TRUSTED because it was just introduced yesterday...
> 
> My current implementation inherits the old ptr_to_btf_id way where by
> default any ptr_to_btf_id is trusted. But since we have PTR_TRUSTED
> we should be able to use it for a stronger guarantee.
> 
> > I am thinking about the following more common case:
> > 
> >      /* bpf_get_current_task_btf() may need to be changed
> >       * to set PTR_TRUSTED at the retval?
> >       */
> >      /* task: PTR_TO_BTF_ID | PTR_TRUSTED */
> >      task = bpf_get_current_task_btf();
> > 
> >      bpf_rcu_read_lock();
> > 
> >      /* real_parent: PTR_TO_BTF_ID | PTR_TRUSTED | MEM_RCU */
> >      real_parent = task->real_parent;
> > 
> >          /* bpf_task_acquire() needs to change to use 
> > refcount_inc_not_zero */
> >      real_parent = bpf_task_acquire(real_parent);
> > 
> >      bpf_rcu_read_unlock();
> > 
> >      /* real_parent is accessible here (after checking NULL) and
> >       * can be passed to kfunc
> >       */
> > 
> 
> Yes, the above is a typical use case. Or alternatively after
>      real_parent = task->real_parent;
>      /* use real_parent inside the bpf_rcu_read_lock() region */
> 
> I will try to utilize PTR_TRUSTED concept in the next revision.

Also perhaps interesting is the case where a task is read out of a map
with a reference already pinned. I think you should clear
the MEM_RCU tag on all referenced objects?


* Re: [PATCH bpf-next v7 4/4] selftests/bpf: Add tests for bpf_rcu_read_lock()
  2022-11-22  1:59   ` John Fastabend
@ 2022-11-22  4:09     ` Yonghong Song
  0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2022-11-22  4:09 UTC (permalink / raw)
  To: John Fastabend, Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau



On 11/21/22 5:59 PM, John Fastabend wrote:
> Yonghong Song wrote:
>> Add a few positive/negative tests to test bpf_rcu_read_lock()
>> and its corresponding verifier support. The new test will fail
>> on s390x and aarch64, so an entry is added to each of their
>> respective deny lists.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> [...]
> 
>> +SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
>> +int nested_rcu_region(void *ctx)
>> +{
>> +	struct task_struct *task, *real_parent;
>> +
>> +	/* nested rcu read lock regions */
>> +	task = bpf_get_current_task_btf();
>> +	bpf_rcu_read_lock();
>> +	bpf_rcu_read_lock();
>> +	real_parent = task->real_parent;
>> +	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
>> +	bpf_rcu_read_unlock();
>> +	bpf_rcu_read_unlock();
>> +	return 0;
>> +}
> 
> I think you also need the nested imbalance case. Is this
> handled? It looks like active_rcu is just a bool?

Currently we don't support nested bpf_rcu_read_lock()
regions, so the error will be reported for the second
bpf_rcu_read_lock() in the above code, regardless of
whether the locks eventually balance.

> 
>   +SEC("?fentry.s/" SYS_PREFIX "sys_nanosleep")
>   +int nested_rcu_region(void *ctx)
>   +{
>   +	struct task_struct *task, *real_parent;
>   +
>   +	/* nested rcu read lock regions */
>   +	task = bpf_get_current_task_btf();
>   +	bpf_rcu_read_lock();
>   +	bpf_rcu_read_lock();
>   +	real_parent = task->real_parent;
>   +	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
>   +      // imbalanced: second bpf_rcu_read_unlock() missing
>   +	bpf_rcu_read_unlock();
>   +	return 0;
>   +}


* Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
  2022-11-22  2:03           ` John Fastabend
@ 2022-11-22  4:16             ` Yonghong Song
  0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2022-11-22  4:16 UTC (permalink / raw)
  To: John Fastabend, Martin KaFai Lau
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau, bpf



On 11/21/22 6:03 PM, John Fastabend wrote:
> Yonghong Song wrote:
>>
>>
>> On 11/21/22 2:56 PM, Martin KaFai Lau wrote:
>>> On 11/21/22 12:01 PM, Yonghong Song wrote:
>>>>
>>>>
>>>> On 11/21/22 11:41 AM, Martin KaFai Lau wrote:
>>>>> On 11/21/22 9:05 AM, Yonghong Song wrote:
>>>>>> @@ -4704,6 +4715,15 @@ static int check_ptr_to_btf_access(struct
>>>>>> bpf_verifier_env *env,
>>>>>>            return -EACCES;
>>>>>>        }
>>>>>> +    /* Access rcu protected memory */
>>>>>> +    if ((reg->type & MEM_RCU) && env->prog->aux->sleepable &&
>>>>>> +        !env->cur_state->active_rcu_lock) {
>>>>>> +        verbose(env,
>>>>>> +            "R%d is ptr_%s access rcu-protected memory with off=%d,
>>>>>> not rcu protected\n",
>>>>>> +            regno, tname, off);
>>>>>> +        return -EACCES;
>>>>>> +    }
>>>>>> +
>>>>>>        if (env->ops->btf_struct_access && !type_is_alloc(reg->type)) {
>>>>>>            if (!btf_is_kernel(reg->btf)) {
>>>>>>                verbose(env, "verifier internal error: reg->btf must
>>>>>> be kernel btf\n");
>>>>>> @@ -4731,12 +4751,27 @@ static int check_ptr_to_btf_access(struct
>>>>>> bpf_verifier_env *env,
>>>>>>        if (ret < 0)
>>>>>>            return ret;
>>>>>> +    /* The value is a rcu pointer. The load needs to be in a rcu
>>>>>> lock region,
>>>>>> +     * similar to rcu_dereference().
>>>>>> +     */
>>>>>> +    if ((flag & MEM_RCU) && env->prog->aux->sleepable &&
>>>>>> !env->cur_state->active_rcu_lock) {
>>>>>> +        verbose(env,
>>>>>> +            "R%d is rcu dereference ptr_%s with off=%d, not in
>>>>>> rcu_read_lock region\n",
>>>>>> +            regno, tname, off);
>>>>>> +        return -EACCES;
>>>>>> +    }
>>>>>
>>>>> Would this make the existing rdonly use case fail?
>>>>>
>>>>> SEC("fentry.s/" SYS_PREFIX "sys_getpgid")
>>>>> int task_real_parent(void *ctx)
>>>>> {
>>>>>       struct task_struct *task, *real_parent;
>>>>>
>>>>>       task = bpf_get_current_task_btf();
>>>>>           real_parent = task->real_parent;
>>>>>           bpf_printk("pid %u\n", real_parent->pid);
>>>>>           return 0;
>>>>> }
>>>>
>>>> Right, it will fail. To fix the issue, the user can do
>>>>      bpf_rcu_read_lock();
>>>>      real_parent = task->real_parent;
>>>>      bpf_printk("pid %u\n", real_parent->pid);
>>>>      bpf_rcu_read_unlock();
>>>>
>>>> But this raises a good question: how do we deal with
>>>> legacy sleepable programs given the newly-added rcu tagging
>>>> capabilities?
>>>>
>>>> My current option is to error out if rcu usage is not right.
>>>> But this might break existing sleepable programs.
>>>>
>>>> Another option is to not break existing code, like the above.
>>>> In this case, MEM_RCU will not be tagged if the access is
>>>> not inside a bpf_rcu_read_lock() region.
>>>
>>> hmm.... it is to make MEM_RCU to mean a reg is protected by the current
>>> active_rcu_lock or not?
>>
>> Yes, for example, in 'real_parent = task->real_parent' where
>> 'real_parent' in task_struct is tagged with __rcu in the struct
>> definition. So the 'real_parent' variable in the above assignment
>> will be tagged with MEM_RCU.
>>
>>>
>>>> In this case, the above non-rcu-protected code should work. And the
>>>> following should work as well although it is a little
>>>> bit awkward.
>>>>      real_parent = task->real_parent; // real_parent not tagged with rcu
>>>>      bpf_rcu_read_lock();
>>>>      bpf_printk("pid %u\n", real_parent->pid);
>>>>      bpf_rcu_read_unlock();
>>>
>>> I think it should be fine.  bpf_rcu_read_lock() is just not useful in
>>> this example, but nothing breaks or crashes.  Also, after
>>> bpf_rcu_read_unlock(), real_parent will continue to be readable
>>> because MEM_RCU is not set?
>>
>> That is correct. The variable real_parent is not tagged with MEM_RCU
>> and it will stay that way for the rest of its life cycle.
>>
>> With the new PTR_TRUSTED mechanism, real_parent will be marked as a
>> normal PTR_TO_BTF_ID and not as PTR_UNTRUSTED, for backward
>> compatibility. So in the above code, real_parent->pid is just a normal
>> load (not related to rcu/trusted/untrusted). People may think it
>> is okay, but actually it is not. The verifier could add more state
>> to issue proper warnings, but I am not sure whether that is
>> worthwhile. As you mentioned, nothing breaks; it is just the current
>> existing behavior. So we should be able to live with this.
>>
>>>
>>> On top of the active_rcu_lock, should MEM_RCU be set only when it is
>>> dereferenced from a PTR_TRUSTED ptr (or with ref_obj_id != 0)?
>>
>> I didn't consider PTR_TRUSTED because it was just introduced yesterday...
>>
>> My current implementation inherits the old ptr_to_btf_id way where by
>> default any ptr_to_btf_id is trusted. But since we have PTR_TRUSTED
>> we should be able to use it for a stronger guarantee.
>>
>>> I am thinking about the following more common case:
>>>
>>>       /* bpf_get_current_task_btf() may need to be changed
>>>        * to set PTR_TRUSTED at the retval?
>>>        */
>>>       /* task: PTR_TO_BTF_ID | PTR_TRUSTED */
>>>       task = bpf_get_current_task_btf();
>>>
>>>       bpf_rcu_read_lock();
>>>
>>>       /* real_parent: PTR_TO_BTF_ID | PTR_TRUSTED | MEM_RCU */
>>>       real_parent = task->real_parent;
>>>
>>>           /* bpf_task_acquire() needs to change to use
>>> refcount_inc_not_zero */
>>>       real_parent = bpf_task_acquire(real_parent);
>>>
>>>       bpf_rcu_read_unlock();
>>>
>>>       /* real_parent is accessible here (after checking NULL) and
>>>        * can be passed to kfunc
>>>        */
>>>
>>
>> Yes, the above is a typical use case. Or alternatively after
>>       real_parent = task->real_parent;
>>       /* use real_parent inside the bpf_rcu_read_lock() region */
>>
>> I will try to utilize PTR_TRUSTED concept in the next revision.
> 
> Also perhaps interesting is the case where a task is read out of a map
> with a reference already pinned. I think you should clear
> the MEM_RCU tag on all referenced objects?

The register tagged with MEM_RCU will not be a referenced obj. The
MEM_RCU tag only appears on a register inside the rcu read lock
region, as the result of an rcu_dereference()-style load. So an obj
tagged with MEM_RCU is protected by the rcu read lock; it is valid
and trusted, and there is no need to acquire an additional reference.
If the user calls another kfunc to acquire a reference, the resulting
ptr will not have the MEM_RCU tag but will have a non-zero
ref_obj_id.

The MEM_RCU reg will be invalidated when bpf_rcu_read_unlock() is
seen, to prevent an rcu-protected ptr from leaking out of the rcu
read lock region.


* Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
  2022-11-21 17:05 ` [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock() Yonghong Song
  2022-11-21 19:41   ` Martin KaFai Lau
@ 2022-11-22  5:48   ` Alexei Starovoitov
  2022-11-22  6:32     ` Yonghong Song
  1 sibling, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2022-11-22  5:48 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On 11/21/22 9:05 AM, Yonghong Song wrote:
> +	if (env->cur_state->active_rcu_lock) {
> +		if (bpf_lsm_sleepable_func_proto(func_id) ||
> +		    bpf_tracing_sleepable_func_proto(func_id)) {
> +			verbose(env, "sleepable helper %s#%d in rcu_read_lock region\n",
> +				func_id_name(func_id), func_id);
> +			return -EINVAL;
> +		}
> +

Even after the patch 2 refactoring, the above bit is still quite
fragile. Ex: bpf_d_path is not included, but it should be.

How about we add 'bool might_sleep' to bpf_func_proto, mark the
existing 5 functions with it, and refactor patch 2 differently.
We won't be doing the prog->aux->sleepable ? checks in
bpf_tracing_func_proto() anymore.
Those cbs will keep returning func_proto-s,
but the verifier will later check the might_sleep flag.



* Re: [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock()
  2022-11-22  5:48   ` Alexei Starovoitov
@ 2022-11-22  6:32     ` Yonghong Song
  0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2022-11-22  6:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau



On 11/21/22 9:48 PM, Alexei Starovoitov wrote:
> On 11/21/22 9:05 AM, Yonghong Song wrote:
>> +    if (env->cur_state->active_rcu_lock) {
>> +        if (bpf_lsm_sleepable_func_proto(func_id) ||
>> +            bpf_tracing_sleepable_func_proto(func_id)) {
>> +            verbose(env, "sleepable helper %s#%d in rcu_read_lock 
>> region\n",
>> +                func_id_name(func_id), func_id);
>> +            return -EINVAL;
>> +        }
>> +
> 
> Even after the patch 2 refactoring, the above bit is still quite
> fragile. Ex: bpf_d_path is not included, but it should be.
> 
> How about we add 'bool might_sleep' to bpf_func_proto, mark the
> existing 5 functions with it, and refactor patch 2 differently.
> We won't be doing the prog->aux->sleepable ? checks in
> bpf_tracing_func_proto() anymore.
> Those cbs will keep returning func_proto-s,
> but the verifier will later check the might_sleep flag.

Ya, bpf_func_proto->might_sleep is indeed better. I could do that.
The only problem is bpf_d_path.

static bool bpf_d_path_allowed(const struct bpf_prog *prog)
{
         if (prog->type == BPF_PROG_TYPE_TRACING &&
             prog->expected_attach_type == BPF_TRACE_ITER)
                 return true;

         if (prog->type == BPF_PROG_TYPE_LSM)
                 return bpf_lsm_is_sleepable_hook(prog->aux->attach_btf_id);

         return btf_id_set_contains(&btf_allowlist_d_path,
                                    prog->aux->attach_btf_id);
}

If I understand correctly, allowing bpf_d_path doesn't mean the
helper itself is sleepable. For example, bpf_d_path can only appear
in sleepable programs if the program type is BPF_PROG_TYPE_LSM; from
6f100640ca5b ("bpf: Expose bpf_d_path helper to sleepable LSM
hooks"), it looks like the reason is that those sleepable lsm
programs provide better context, so bpf_d_path won't have potential
lock or other issues. So essentially, the bpf_d_path helper itself
is not a helper that causes the prog to sleep, right? If that is the
case, we only assign might_sleep to the other 4 helpers.


* Re: [PATCH bpf-next v7 2/4] bpf: Abstract out functions to check sleepable helpers
  2022-11-21 17:05 ` [PATCH bpf-next v7 2/4] bpf: Abstract out functions to check sleepable helpers Yonghong Song
@ 2022-11-22 17:06   ` KP Singh
  0 siblings, 0 replies; 16+ messages in thread
From: KP Singh @ 2022-11-22 17:06 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, Martin KaFai Lau

On Mon, Nov 21, 2022 at 6:05 PM Yonghong Song <yhs@fb.com> wrote:
>
> Abstract out two functions to check whether a particular helper
> is sleepable or not for bpf_lsm and bpf_trace. These two
> functions will be used later to check whether a helper is
> sleepable or not in verifier. There is no functionality
> change.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>

Acked-by: KP Singh <kpsingh@kernel.org>


end of thread, other threads:[~2022-11-22 17:07 UTC | newest]

Thread overview: 16+ messages
2022-11-21 17:05 [PATCH bpf-next v7 0/4] bpf: Add bpf_rcu_read_lock() support Yonghong Song
2022-11-21 17:05 ` [PATCH bpf-next v7 1/4] compiler_types: Define __rcu as __attribute__((btf_type_tag("rcu"))) Yonghong Song
2022-11-21 17:05 ` [PATCH bpf-next v7 2/4] bpf: Abstract out functions to check sleepable helpers Yonghong Song
2022-11-22 17:06   ` KP Singh
2022-11-21 17:05 ` [PATCH bpf-next v7 3/4] bpf: Add kfunc bpf_rcu_read_lock/unlock() Yonghong Song
2022-11-21 19:41   ` Martin KaFai Lau
2022-11-21 20:01     ` Yonghong Song
2022-11-21 22:56       ` Martin KaFai Lau
2022-11-21 23:42         ` Yonghong Song
2022-11-22  2:03           ` John Fastabend
2022-11-22  4:16             ` Yonghong Song
2022-11-22  5:48   ` Alexei Starovoitov
2022-11-22  6:32     ` Yonghong Song
2022-11-21 17:05 ` [PATCH bpf-next v7 4/4] selftests/bpf: Add tests for bpf_rcu_read_lock() Yonghong Song
2022-11-22  1:59   ` John Fastabend
2022-11-22  4:09     ` Yonghong Song
