bpf.vger.kernel.org archive mirror
* [PATCH bpf-next 0/6] bpf: Tracing recursion prevention mechanism improvement
@ 2023-04-17 15:47 Yafang Shao
  2023-04-17 15:47 ` [PATCH bpf-next 1/6] bpf: Add __rcu_read_{lock,unlock} into btf id deny list Yafang Shao
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Yafang Shao @ 2023-04-17 15:47 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat
  Cc: bpf, linux-trace-kernel, linux-kernel, Yafang Shao

Currently we use prog->active to prevent tracing recursion, but it has
some downsides:

- It can't distinguish different contexts
  That is, if a process context is interrupted by an irq context and
  the irq context runs the same code path, it will be considered as
  recursion. For example,
    normal:
      this_cpu_inc_return(*(prog->active)) == 1 <- OK

      irq:
        this_cpu_inc_return(*(prog->active)) == 1 <- FAIL!
        [ Considered as recursion, although it is a different context ]

- It has to maintain a percpu area
  A percpu area is allocated for each prog when the prog is loaded and
  freed when the prog is destroyed.

Let's replace it with the generic tracing recursion prevention mechanism,
which tracks recursion per context (normal, softirq, irq, NMI) on the
current task. In the above example, the irq context will no longer be
considered as recursion:
  normal:
    test_recursion_try_acquire() <- OK

    softirq:
      test_recursion_try_acquire() <- OK

      irq:
        test_recursion_try_acquire() <- OK

Note that currently one single recursion in process context is allowed
due to the TRACE_CTX_TRANSITION workaround, which can be fixed in the
future. That is, the following behavior is currently expected:
  normal:
    test_recursion_try_acquire() <- OK
    [ recursion happens ]        <- one single recursion is allowed
    test_recursion_try_acquire() <- OK
    [ recursion happens ]
    test_recursion_try_acquire() <- RECURSION!

To adapt to this behavior, the bpf recursion selftest is changed.
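
With the series applied, a caller ends up following roughly this pattern
(a simplified sketch of what patch 5 does in __bpf_trace_run(); statistics
and RCU handling omitted):

    int bit;

    bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
    if (bit < 0) {
        /* recursion in the same context, skip the prog */
        return;
    }
    /* ... run the bpf prog ... */
    test_recursion_release(bit);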

Steven Rostedt (Google) (1):
  tracing: Add generic test_recursion_try_acquire()

Yafang Shao (5):
  bpf: Add __rcu_read_{lock,unlock} into btf id deny list
  tracing: Add the comment for allowing one single recursion in process
    context
  selftests/bpf: Allow one single recursion in fentry recursion test
  bpf: Improve tracing recursion prevention mechanism
  bpf: Remove some denied functions from the btf id deny list

 include/linux/bpf.h                                |  2 +-
 include/linux/trace_recursion.h                    | 49 ++++++++++++++++------
 kernel/bpf/core.c                                  | 10 -----
 kernel/bpf/trampoline.c                            | 44 ++++++++++++++-----
 kernel/bpf/verifier.c                              | 10 ++---
 kernel/trace/bpf_trace.c                           | 12 +++---
 kernel/trace/ftrace.c                              |  2 +
 tools/testing/selftests/bpf/prog_tests/recursion.c |  7 +++-
 tools/testing/selftests/bpf/test_progs.h           | 19 +++++++++
 9 files changed, 107 insertions(+), 48 deletions(-)

-- 
1.8.3.1



* [PATCH bpf-next 1/6] bpf: Add __rcu_read_{lock,unlock} into btf id deny list
  2023-04-17 15:47 [PATCH bpf-next 0/6] bpf: Tracing recursion prevention mechanism improvement Yafang Shao
@ 2023-04-17 15:47 ` Yafang Shao
  2023-04-17 15:47 ` [PATCH bpf-next 2/6] tracing: Add generic test_recursion_try_acquire() Yafang Shao
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Yafang Shao @ 2023-04-17 15:47 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat
  Cc: bpf, linux-trace-kernel, linux-kernel, Yafang Shao

The tracing recursion prevention mechanism itself must be protected by
rcu, which leaves __rcu_read_{lock,unlock} unprotected by this mechanism.
If we trace them, recursion will happen. Let's add them into the btf id
deny list.
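
For example, once the trampoline is converted to the new mechanism
(patch 5), attaching a fentry prog to __rcu_read_lock() would recurse
before the recursion check even gets a chance to run. A rough sketch,
assuming CONFIG_PREEMPT_RCU where rcu_read_lock() is backed by
__rcu_read_lock():

    __rcu_read_lock()               <- traced
      bpf trampoline
        __bpf_prog_enter_recur()
          rcu_read_lock()
            __rcu_read_lock()       <- traced again, recursing
              bpf trampoline           before test_recursion_try_acquire()
                ...                    can stop it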

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/verifier.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5dae11e..83fb94f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18645,6 +18645,10 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 BTF_ID(func, preempt_count_add)
 BTF_ID(func, preempt_count_sub)
 #endif
+#ifdef CONFIG_PREEMPT_RCU
+BTF_ID(func, __rcu_read_lock)
+BTF_ID(func, __rcu_read_unlock)
+#endif
 BTF_SET_END(btf_id_deny)
 
 static bool can_be_sleepable(struct bpf_prog *prog)
-- 
1.8.3.1



* [PATCH bpf-next 2/6] tracing: Add generic test_recursion_try_acquire()
  2023-04-17 15:47 [PATCH bpf-next 0/6] bpf: Tracing recursion prevention mechanism improvement Yafang Shao
  2023-04-17 15:47 ` [PATCH bpf-next 1/6] bpf: Add __rcu_read_{lock,unlock} into btf id deny list Yafang Shao
@ 2023-04-17 15:47 ` Yafang Shao
  2023-04-20  6:51   ` Masami Hiramatsu
  2023-04-17 15:47 ` [PATCH bpf-next 3/6] tracing: Add the comment for allowing one single recursion in process context Yafang Shao
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Yafang Shao @ 2023-04-17 15:47 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat
  Cc: bpf, linux-trace-kernel, linux-kernel, Yafang Shao

From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The ftrace_test_recursion_trylock() also disables preemption. This is not
required, but was a cleanup, as every place that called it also disabled
preemption, and making the two tightly coupled appeared to make the code
simpler.

But the recursion protection can be used for other purposes that do not
require disabling preemption. As the recursion bits are attached to the
task_struct, they follow the task, so there's no need for preemption to be
disabled.

Add test_recursion_try_acquire/release() functions to be used generically,
and separate them from being associated with ftrace. This also removes the
"lock" name, as there is no locking happening. Keeping the "lock" for the
ftrace version is better, as it at least indicates that preemption is
being disabled (hence, "locking the CPU").
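
Roughly, after this change the ftrace wrappers reduce to the following
(a sketch, not literal code):

    ftrace_test_recursion_trylock(ip, pip)
        == test_recursion_try_acquire(ip, pip) + preempt_disable_notrace()
    ftrace_test_recursion_unlock(bit)
        == preempt_enable_notrace() + test_recursion_release(bit)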

Link: https://lore.kernel.org/linux-trace-kernel/20230321020103.13494-1-laoar.shao@gmail.com/

Acked-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/trace_recursion.h | 47 ++++++++++++++++++++++++++++++-----------
 kernel/trace/ftrace.c           |  2 ++
 2 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index d48cd92..80de2ee 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -150,9 +150,6 @@ static __always_inline int trace_get_context_bit(void)
 # define trace_warn_on_no_rcu(ip)	false
 #endif
 
-/*
- * Preemption is promised to be disabled when return bit >= 0.
- */
 static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
 							int start)
 {
@@ -182,18 +179,11 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
 	val |= 1 << bit;
 	current->trace_recursion = val;
 	barrier();
-
-	preempt_disable_notrace();
-
 	return bit;
 }
 
-/*
- * Preemption will be enabled (if it was previously enabled).
- */
 static __always_inline void trace_clear_recursion(int bit)
 {
-	preempt_enable_notrace();
 	barrier();
 	trace_recursion_clear(bit);
 }
@@ -205,12 +195,18 @@ static __always_inline void trace_clear_recursion(int bit)
  * tracing recursed in the same context (normal vs interrupt),
  *
  * Returns: -1 if a recursion happened.
- *           >= 0 if no recursion.
+ *           >= 0 if no recursion and preemption will be disabled.
  */
 static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
 							 unsigned long parent_ip)
 {
-	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
+	int bit;
+
+	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
+	if (unlikely(bit < 0))
+		return bit;
+	preempt_disable_notrace();
+	return bit;
 }
 
 /**
@@ -221,6 +217,33 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
  */
 static __always_inline void ftrace_test_recursion_unlock(int bit)
 {
+	preempt_enable_notrace();
+	trace_clear_recursion(bit);
+}
+
+/**
+ * test_recursion_try_acquire - tests for recursion in same context
+ *
+ * This will detect recursion of a function.
+ *
+ * Returns: -1 if a recursion happened.
+ *           >= 0 if no recursion
+ */
+static __always_inline int test_recursion_try_acquire(unsigned long ip,
+						      unsigned long parent_ip)
+{
+	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
+}
+
+/**
+ * test_recursion_release - called after a success of test_recursion_try_acquire()
+ * @bit: The return of a successful test_recursion_try_acquire()
+ *
+ * This releases the recursion lock taken by a non-negative return call
+ * by test_recursion_try_acquire().
+ */
+static __always_inline void test_recursion_release(int bit)
+{
 	trace_clear_recursion(bit);
 }
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c67bcc8..8ad3ab4 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7647,6 +7647,7 @@ void ftrace_reset_array_ops(struct trace_array *tr)
 	if (bit < 0)
 		return;
 
+	preempt_disable();
 	do_for_each_ftrace_op(op, ftrace_ops_list) {
 		/* Stub functions don't need to be called nor tested */
 		if (op->flags & FTRACE_OPS_FL_STUB)
@@ -7668,6 +7669,7 @@ void ftrace_reset_array_ops(struct trace_array *tr)
 		}
 	} while_for_each_ftrace_op(op);
 out:
+	preempt_enable();
 	trace_clear_recursion(bit);
 }
 
-- 
1.8.3.1



* [PATCH bpf-next 3/6] tracing: Add the comment for allowing one single recursion in process context
  2023-04-17 15:47 [PATCH bpf-next 0/6] bpf: Tracing recursion prevention mechanism improvement Yafang Shao
  2023-04-17 15:47 ` [PATCH bpf-next 1/6] bpf: Add __rcu_read_{lock,unlock} into btf id deny list Yafang Shao
  2023-04-17 15:47 ` [PATCH bpf-next 2/6] tracing: Add generic test_recursion_try_acquire() Yafang Shao
@ 2023-04-17 15:47 ` Yafang Shao
  2023-04-17 15:47 ` [PATCH bpf-next 4/6] selftests/bpf: Allow one single recursion in fentry recursion test Yafang Shao
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Yafang Shao @ 2023-04-17 15:47 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat
  Cc: bpf, linux-trace-kernel, linux-kernel, Yafang Shao

With the TRACE_CTX_TRANSITION workaround applied, one single recursion is
allowed in the process context. Below is an example,

SEC("fentry/htab_map_delete_elem")
int BPF_PROG(on_delete, struct bpf_map *map)
{
    pass2++;
    bpf_map_delete_elem(&hash2, &key);
    return 0;
}

In the above test case, the recursion will be detected at the second
bpf_map_delete_elem() call in this prog. Illustrated as follows,

SEC("fentry/htab_map_delete_elem")
    pass2++;   <<<< Turns out to be 1 after this operation.
    bpf_map_delete_elem(&hash2, &key);
        SEC("fentry/htab_map_delete_elem")         <<<< no recursion
            pass2++; <<<< Turns out to be 2 after this operation.
            bpf_map_delete_elem(&hash2, &key);
                SEC("fentry/htab_map_delete_elem") <<<< RECURSION

Let's document this behavior explicitly in the comment.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/trace_recursion.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 80de2ee..445a055 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -168,6 +168,8 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
 		 * will think a recursion occurred, and the event will be dropped.
 		 * Let a single instance happen via the TRANSITION_BIT to
 		 * not drop those events.
+		 * After this rule is applied, one single recursion is allowed in
+		 * the process context.
 		 */
 		bit = TRACE_CTX_TRANSITION + start;
 		if (val & (1 << bit)) {
-- 
1.8.3.1



* [PATCH bpf-next 4/6] selftests/bpf: Allow one single recursion in fentry recursion test
  2023-04-17 15:47 [PATCH bpf-next 0/6] bpf: Tracing recursion prevention mechanism improvement Yafang Shao
                   ` (2 preceding siblings ...)
  2023-04-17 15:47 ` [PATCH bpf-next 3/6] tracing: Add the comment for allowing one single recursion in process context Yafang Shao
@ 2023-04-17 15:47 ` Yafang Shao
  2023-04-17 15:47 ` [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism Yafang Shao
  2023-04-17 15:47 ` [PATCH bpf-next 6/6] bpf: Remove some denied functions from the btf id deny list Yafang Shao
  5 siblings, 0 replies; 25+ messages in thread
From: Yafang Shao @ 2023-04-17 15:47 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat
  Cc: bpf, linux-trace-kernel, linux-kernel, Yafang Shao

This is a preparation for replacing prog->active with
test_recursion_try_acquire()/test_recursion_release(), in which one single
recursion in the process context is allowed. The behavior will be as
follows,

SEC("fentry/htab_map_delete_elem")
    pass2++;   <<<< Turns out to be 1 after this operation.
    bpf_map_delete_elem(&hash2, &key);
        SEC("fentry/htab_map_delete_elem")         <<<< no recursion
            pass2++; <<<< Turns out to be 2 after this operation.
            bpf_map_delete_elem(&hash2, &key);
                SEC("fentry/htab_map_delete_elem") <<<< RECURSION

Hence we need to change the selftest to allow it. For backward
compatibility, both the old value and the new value are accepted as
expected results, so a new helper ASSERT_IN_ARRAY() is introduced.
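
For illustration, a hypothetical use of the new helper (the values are
examples only, not the ones used in the actual test):

    int expected[2] = { 1, 2 };	/* old behavior vs. one allowed recursion */

    ASSERT_IN_ARRAY(skel->bss->pass2, expected, "pass2");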

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 tools/testing/selftests/bpf/prog_tests/recursion.c |  7 +++++--
 tools/testing/selftests/bpf/test_progs.h           | 19 +++++++++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/recursion.c b/tools/testing/selftests/bpf/prog_tests/recursion.c
index 23552d3..dfbed2e 100644
--- a/tools/testing/selftests/bpf/prog_tests/recursion.c
+++ b/tools/testing/selftests/bpf/prog_tests/recursion.c
@@ -8,6 +8,7 @@ void test_recursion(void)
 	struct bpf_prog_info prog_info = {};
 	__u32 prog_info_len = sizeof(prog_info);
 	struct recursion *skel;
+	int expected[2];
 	int key = 0;
 	int err;
 
@@ -27,9 +28,11 @@ void test_recursion(void)
 
 	ASSERT_EQ(skel->bss->pass2, 0, "pass2 == 0");
 	bpf_map_delete_elem(bpf_map__fd(skel->maps.hash2), &key);
-	ASSERT_EQ(skel->bss->pass2, 1, "pass2 == 1");
+	expected[1] = 2;
+	ASSERT_IN_ARRAY(skel->bss->pass2, expected, "pass2 in [0 2]");
 	bpf_map_delete_elem(bpf_map__fd(skel->maps.hash2), &key);
-	ASSERT_EQ(skel->bss->pass2, 2, "pass2 == 2");
+	expected[1] = 4;
+	ASSERT_IN_ARRAY(skel->bss->pass2, expected, "pass2 in [0 4]");
 
 	err = bpf_prog_get_info_by_fd(bpf_program__fd(skel->progs.on_delete),
 				      &prog_info, &prog_info_len);
diff --git a/tools/testing/selftests/bpf/test_progs.h b/tools/testing/selftests/bpf/test_progs.h
index 10ba432..79e96cc 100644
--- a/tools/testing/selftests/bpf/test_progs.h
+++ b/tools/testing/selftests/bpf/test_progs.h
@@ -245,6 +245,25 @@ struct msg {
 	___ok;								\
 })
 
+#define ASSERT_IN_ARRAY(actual, expected, name) ({			\
+	static int duration;						\
+	typeof(actual) ___act = (actual);				\
+	typeof((expected)[0]) * ___exp = (expected);			\
+	bool ___ok = false;						\
+	int i;								\
+									\
+	for (i = 0; i < ARRAY_SIZE(expected); i++) {			\
+		if (___act == ___exp[i]) {				\
+			___ok = true;					\
+			break;						\
+		}							\
+	}								\
+	CHECK(!___ok, (name),						\
+	      "unexpected %s: actual %lld not in array\n",		\
+	      (name), (long long)(___act));				\
+	___ok;								\
+})
+
 #define ASSERT_NEQ(actual, expected, name) ({				\
 	static int duration = 0;					\
 	typeof(actual) ___act = (actual);				\
-- 
1.8.3.1



* [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-17 15:47 [PATCH bpf-next 0/6] bpf: Tracing recursion prevention mechanism improvement Yafang Shao
                   ` (3 preceding siblings ...)
  2023-04-17 15:47 ` [PATCH bpf-next 4/6] selftests/bpf: Allow one single recursion in fentry recursion test Yafang Shao
@ 2023-04-17 15:47 ` Yafang Shao
  2023-04-17 20:14   ` Alexei Starovoitov
                     ` (2 more replies)
  2023-04-17 15:47 ` [PATCH bpf-next 6/6] bpf: Remove some denied functions from the btf id deny list Yafang Shao
  5 siblings, 3 replies; 25+ messages in thread
From: Yafang Shao @ 2023-04-17 15:47 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat
  Cc: bpf, linux-trace-kernel, linux-kernel, Yafang Shao

Currently we use prog->active to prevent tracing recursion, but it has
some downsides:

- It can't distinguish different contexts
  That is, if a process context is interrupted by an irq context and
  the irq context runs the same code path, it will be considered as
  recursion. For example,
    normal:
      this_cpu_inc_return(*(prog->active)) == 1 <- OK

      irq:
        this_cpu_inc_return(*(prog->active)) == 1 <- FAIL!
        [ Considered as recursion, although it is a different context ]

- It has to maintain a percpu area
  A percpu area is allocated for each prog when the prog is loaded and
  freed when the prog is destroyed.

Let's replace it with the generic tracing recursion prevention mechanism,
which tracks recursion per context (normal, softirq, irq, NMI) on the
current task. In the above example, the irq context will no longer be
considered as recursion:
  normal:
    test_recursion_try_acquire() <- OK

    softirq:
      test_recursion_try_acquire() <- OK

      irq:
        test_recursion_try_acquire() <- OK

Note that currently one single recursion in process context is allowed
due to the TRACE_CTX_TRANSITION workaround, which can be fixed in the
future. That is, the following behavior is currently expected:
  normal:
    test_recursion_try_acquire() <- OK
    [ recursion happens ]        <- one single recursion is allowed
    test_recursion_try_acquire() <- OK
    [ recursion happens ]
    test_recursion_try_acquire() <- RECURSION!

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/bpf.h      |  2 +-
 kernel/bpf/core.c        | 10 ----------
 kernel/bpf/trampoline.c  | 44 +++++++++++++++++++++++++++++++++-----------
 kernel/trace/bpf_trace.c | 12 +++++++-----
 4 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 18b592f..c42ff90 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1467,7 +1467,6 @@ struct bpf_prog {
 	u32			jited_len;	/* Size of jited insns in bytes */
 	u8			tag[BPF_TAG_SIZE];
 	struct bpf_prog_stats __percpu *stats;
-	int __percpu		*active;
 	unsigned int		(*bpf_func)(const void *ctx,
 					    const struct bpf_insn *insn);
 	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
@@ -1813,6 +1812,7 @@ struct bpf_tramp_run_ctx {
 	struct bpf_run_ctx run_ctx;
 	u64 bpf_cookie;
 	struct bpf_run_ctx *saved_run_ctx;
+	int recursion_bit;
 };
 
 static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 7421487..0942ab2 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -103,12 +103,6 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
 		vfree(fp);
 		return NULL;
 	}
-	fp->active = alloc_percpu_gfp(int, bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags));
-	if (!fp->active) {
-		vfree(fp);
-		kfree(aux);
-		return NULL;
-	}
 
 	fp->pages = size / PAGE_SIZE;
 	fp->aux = aux;
@@ -138,7 +132,6 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
 
 	prog->stats = alloc_percpu_gfp(struct bpf_prog_stats, gfp_flags);
 	if (!prog->stats) {
-		free_percpu(prog->active);
 		kfree(prog->aux);
 		vfree(prog);
 		return NULL;
@@ -256,7 +249,6 @@ struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size,
 		 */
 		fp_old->aux = NULL;
 		fp_old->stats = NULL;
-		fp_old->active = NULL;
 		__bpf_prog_free(fp_old);
 	}
 
@@ -272,7 +264,6 @@ void __bpf_prog_free(struct bpf_prog *fp)
 		kfree(fp->aux);
 	}
 	free_percpu(fp->stats);
-	free_percpu(fp->active);
 	vfree(fp);
 }
 
@@ -1385,7 +1376,6 @@ static void bpf_prog_clone_free(struct bpf_prog *fp)
 	 */
 	fp->aux = NULL;
 	fp->stats = NULL;
-	fp->active = NULL;
 	__bpf_prog_free(fp);
 }
 
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index f61d513..3df39a5 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
 static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
 	__acquires(RCU)
 {
-	rcu_read_lock();
-	migrate_disable();
-
-	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
+	int bit;
 
-	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
+	rcu_read_lock();
+	bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
+	run_ctx->recursion_bit = bit;
+	if (bit < 0) {
+		preempt_disable_notrace();
 		bpf_prog_inc_misses_counter(prog);
+		preempt_enable_notrace();
 		return 0;
 	}
+
+	migrate_disable();
+
+	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
 	return bpf_prog_start_time();
 }
 
@@ -880,11 +886,16 @@ static void notrace __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start,
 					  struct bpf_tramp_run_ctx *run_ctx)
 	__releases(RCU)
 {
+	if (run_ctx->recursion_bit < 0)
+		goto out;
+
 	bpf_reset_run_ctx(run_ctx->saved_run_ctx);
 
 	update_prog_stats(prog, start);
-	this_cpu_dec(*(prog->active));
 	migrate_enable();
+	test_recursion_release(run_ctx->recursion_bit);
+
+out:
 	rcu_read_unlock();
 }
 
@@ -916,15 +927,21 @@ static void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start,
 u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog,
 					     struct bpf_tramp_run_ctx *run_ctx)
 {
-	rcu_read_lock_trace();
-	migrate_disable();
-	might_fault();
+	int bit;
 
-	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
+	rcu_read_lock_trace();
+	bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
+	run_ctx->recursion_bit = bit;
+	if (bit < 0) {
+		preempt_disable_notrace();
 		bpf_prog_inc_misses_counter(prog);
+		preempt_enable_notrace();
 		return 0;
 	}
 
+	migrate_disable();
+	might_fault();
+
 	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
 
 	return bpf_prog_start_time();
@@ -933,11 +950,16 @@ u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog,
 void notrace __bpf_prog_exit_sleepable_recur(struct bpf_prog *prog, u64 start,
 					     struct bpf_tramp_run_ctx *run_ctx)
 {
+	if (run_ctx->recursion_bit < 0)
+		goto out;
+
 	bpf_reset_run_ctx(run_ctx->saved_run_ctx);
 
 	update_prog_stats(prog, start);
-	this_cpu_dec(*(prog->active));
 	migrate_enable();
+	test_recursion_release(run_ctx->recursion_bit);
+
+out:
 	rcu_read_unlock_trace();
 }
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index bcf91bc..bb9a4c9 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2250,16 +2250,18 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
 static __always_inline
 void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
 {
-	cant_sleep();
-	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
+	int bit;
+
+	bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
+	if (bit < 0) {
 		bpf_prog_inc_misses_counter(prog);
-		goto out;
+		return;
 	}
+	cant_sleep();
 	rcu_read_lock();
 	(void) bpf_prog_run(prog, args);
 	rcu_read_unlock();
-out:
-	this_cpu_dec(*(prog->active));
+	test_recursion_release(bit);
 }
 
 #define UNPACK(...)			__VA_ARGS__
-- 
1.8.3.1



* [PATCH bpf-next 6/6] bpf: Remove some denied functions from the btf id deny list
  2023-04-17 15:47 [PATCH bpf-next 0/6] bpf: Tracing recursion prevention mechanism improvement Yafang Shao
                   ` (4 preceding siblings ...)
  2023-04-17 15:47 ` [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism Yafang Shao
@ 2023-04-17 15:47 ` Yafang Shao
  5 siblings, 0 replies; 25+ messages in thread
From: Yafang Shao @ 2023-04-17 15:47 UTC (permalink / raw)
  To: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat
  Cc: bpf, linux-trace-kernel, linux-kernel, Yafang Shao

With the generic tracing recursion prevention mechanism applied, it is
safe to trace migrate_{disable,enable} and preempt_count_{sub,add}, so
we can remove them from the deny list.
However, we can't remove rcu_read_unlock_strict and
__rcu_read_{lock,unlock}, because they are used by rcu_read_lock() and
rcu_read_unlock().
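
As a rough sketch (which helper backs rcu_read_lock()/rcu_read_unlock()
depends on the RCU configuration, matching the #if conditions kept below):

    rcu_read_lock()
      __rcu_read_lock()            /* CONFIG_PREEMPT_RCU */
    rcu_read_unlock()
      __rcu_read_unlock()          /* CONFIG_PREEMPT_RCU */
      rcu_read_unlock_strict()     /* !PREEMPT_RCU && !TINY_RCU */

Tracing any of these would recurse through the rcu_read_lock() taken by
the trampoline itself before the recursion check runs.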

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 kernel/bpf/verifier.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 83fb94f..40f6b2c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18634,17 +18634,9 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 
 BTF_SET_START(btf_id_deny)
 BTF_ID_UNUSED
-#ifdef CONFIG_SMP
-BTF_ID(func, migrate_disable)
-BTF_ID(func, migrate_enable)
-#endif
 #if !defined CONFIG_PREEMPT_RCU && !defined CONFIG_TINY_RCU
 BTF_ID(func, rcu_read_unlock_strict)
 #endif
-#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
-BTF_ID(func, preempt_count_add)
-BTF_ID(func, preempt_count_sub)
-#endif
 #ifdef CONFIG_PREEMPT_RCU
 BTF_ID(func, __rcu_read_lock)
 BTF_ID(func, __rcu_read_unlock)
-- 
1.8.3.1



* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-17 15:47 ` [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism Yafang Shao
@ 2023-04-17 20:14   ` Alexei Starovoitov
  2023-04-18  1:49     ` Yafang Shao
  2023-04-17 23:29   ` kernel test robot
  2023-04-27 13:26   ` Steven Rostedt
  2 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2023-04-17 20:14 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat, bpf,
	linux-trace-kernel, linux-kernel

On Mon, Apr 17, 2023 at 03:47:36PM +0000, Yafang Shao wrote:
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index f61d513..3df39a5 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
>  static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
>  	__acquires(RCU)
>  {
> -	rcu_read_lock();
> -	migrate_disable();
> -
> -	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
> +	int bit;
>  
> -	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> +	rcu_read_lock();
> +	bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);

and bpf will prevent ftrace from running and vice versa.
Not a good idea.

One bpf prog will prevent a different bpf prog from running since they share the current task.
Not a good idea either.


* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-17 15:47 ` [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism Yafang Shao
  2023-04-17 20:14   ` Alexei Starovoitov
@ 2023-04-17 23:29   ` kernel test robot
  2023-04-27 13:26   ` Steven Rostedt
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2023-04-17 23:29 UTC (permalink / raw)
  To: Yafang Shao, ast, daniel, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat
  Cc: oe-kbuild-all, bpf, linux-trace-kernel, linux-kernel, Yafang Shao

Hi Yafang,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Yafang-Shao/bpf-Add-__rcu_read_-lock-unlock-into-btf-id-deny-list/20230417-235009
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230417154737.12740-6-laoar.shao%40gmail.com
patch subject: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
config: loongarch-defconfig (https://download.01.org/0day-ci/archive/20230418/202304180736.cWjpwhs6-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/ac84d2623c0469a703245030f2b23612ab4505dd
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Yafang-Shao/bpf-Add-__rcu_read_-lock-unlock-into-btf-id-deny-list/20230417-235009
        git checkout ac84d2623c0469a703245030f2b23612ab4505dd
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304180736.cWjpwhs6-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/bpf/trampoline.c: In function '__bpf_prog_enter_recur':
>> kernel/bpf/trampoline.c:848:15: error: implicit declaration of function 'test_recursion_try_acquire' [-Werror=implicit-function-declaration]
     848 |         bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
         |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/bpf/trampoline.c: In function '__bpf_prog_exit_recur':
>> kernel/bpf/trampoline.c:896:9: error: implicit declaration of function 'test_recursion_release'; did you mean 'dev_recursion_level'? [-Werror=implicit-function-declaration]
     896 |         test_recursion_release(run_ctx->recursion_bit);
         |         ^~~~~~~~~~~~~~~~~~~~~~
         |         dev_recursion_level
   cc1: some warnings being treated as errors


vim +/test_recursion_try_acquire +848 kernel/bpf/trampoline.c

   828	
   829	/* The logic is similar to bpf_prog_run(), but with an explicit
   830	 * rcu_read_lock() and migrate_disable() which are required
   831	 * for the trampoline. The macro is split into
   832	 * call __bpf_prog_enter
   833	 * call prog->bpf_func
   834	 * call __bpf_prog_exit
   835	 *
   836	 * __bpf_prog_enter returns:
   837	 * 0 - skip execution of the bpf prog
   838	 * 1 - execute bpf prog
   839	 * [2..MAX_U64] - execute bpf prog and record execution time.
   840	 *     This is start time.
   841	 */
   842	static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
   843		__acquires(RCU)
   844	{
   845		int bit;
   846	
   847		rcu_read_lock();
 > 848		bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
   849		run_ctx->recursion_bit = bit;
   850		if (bit < 0) {
   851			preempt_disable_notrace();
   852			bpf_prog_inc_misses_counter(prog);
   853			preempt_enable_notrace();
   854			return 0;
   855		}
   856	
   857		migrate_disable();
   858	
   859		run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
   860		return bpf_prog_start_time();
   861	}
   862	
   863	static void notrace update_prog_stats(struct bpf_prog *prog,
   864					      u64 start)
   865	{
   866		struct bpf_prog_stats *stats;
   867	
   868		if (static_branch_unlikely(&bpf_stats_enabled_key) &&
   869		    /* static_key could be enabled in __bpf_prog_enter*
   870		     * and disabled in __bpf_prog_exit*.
   871		     * And vice versa.
   872		     * Hence check that 'start' is valid.
   873		     */
   874		    start > NO_START_TIME) {
   875			unsigned long flags;
   876	
   877			stats = this_cpu_ptr(prog->stats);
   878			flags = u64_stats_update_begin_irqsave(&stats->syncp);
   879			u64_stats_inc(&stats->cnt);
   880			u64_stats_add(&stats->nsecs, sched_clock() - start);
   881			u64_stats_update_end_irqrestore(&stats->syncp, flags);
   882		}
   883	}
   884	
   885	static void notrace __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start,
   886						  struct bpf_tramp_run_ctx *run_ctx)
   887		__releases(RCU)
   888	{
   889		if (run_ctx->recursion_bit < 0)
   890			goto out;
   891	
   892		bpf_reset_run_ctx(run_ctx->saved_run_ctx);
   893	
   894		update_prog_stats(prog, start);
   895		migrate_enable();
 > 896		test_recursion_release(run_ctx->recursion_bit);
   897	
   898	out:
   899		rcu_read_unlock();
   900	}
   901	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-17 20:14   ` Alexei Starovoitov
@ 2023-04-18  1:49     ` Yafang Shao
  2023-04-18 15:38       ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Yafang Shao @ 2023-04-18  1:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, rostedt, mhiramat, bpf,
	linux-trace-kernel, linux-kernel

On Tue, Apr 18, 2023 at 4:15 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Apr 17, 2023 at 03:47:36PM +0000, Yafang Shao wrote:
> > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> > index f61d513..3df39a5 100644
> > --- a/kernel/bpf/trampoline.c
> > +++ b/kernel/bpf/trampoline.c
> > @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
> >  static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
> >       __acquires(RCU)
> >  {
> > -     rcu_read_lock();
> > -     migrate_disable();
> > -
> > -     run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
> > +     int bit;
> >
> > -     if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> > +     rcu_read_lock();
> > +     bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
>
> and bpf will prevent ftrace from running and vice versa.
> Not a good idea.
>
> One bpf prog will prevent a different bpf prog from running since they share the current task.
> Not a good idea either.

That shouldn't happen. test_recursion_try_acquire() uses a
per-task_struct value. One single task_struct can't run in parallel,
right?
Note that the bpf program running in softirq or irq context won't be
prevented by it.
IIUC, the bpf program should run in serial in one single task, right?
That said, one bpf program can only run after another bpf program
finished in the same task?

-- 
Regards
Yafang


* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-18  1:49     ` Yafang Shao
@ 2023-04-18 15:38       ` Alexei Starovoitov
  2023-04-19 11:46         ` Yafang Shao
  0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2023-04-18 15:38 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Steven Rostedt,
	Masami Hiramatsu, bpf, linux-trace-kernel, LKML

On Mon, Apr 17, 2023 at 6:49 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Apr 18, 2023 at 4:15 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Apr 17, 2023 at 03:47:36PM +0000, Yafang Shao wrote:
> > > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> > > index f61d513..3df39a5 100644
> > > --- a/kernel/bpf/trampoline.c
> > > +++ b/kernel/bpf/trampoline.c
> > > @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
> > >  static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
> > >       __acquires(RCU)
> > >  {
> > > -     rcu_read_lock();
> > > -     migrate_disable();
> > > -
> > > -     run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
> > > +     int bit;
> > >
> > > -     if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> > > +     rcu_read_lock();
> > > +     bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
> >
> > and bpf will prevent ftrace from running and vice versa.
> > Not a good idea.
> >
> > One bpf prog will prevent a different bpf prog from running since they share the current task.
> > Not a good idea either.
>
> That shouldn't happen. test_recursion_try_acquire() uses a
> per-task_struct value. One single task_struct can't run in parallel,
> right?
> Note that the bpf program running in softirq or irq context won't be
> prevented by it.
> IIUC, the bpf program should run in serial in one single task, right?
> That said, one bpf program can only run after another bpf program
> finished in the same task?

bpf progs can nest in the same task.


* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-18 15:38       ` Alexei Starovoitov
@ 2023-04-19 11:46         ` Yafang Shao
       [not found]           ` <CAADnVQ+FO-+1OALTtgVkcpH3Adc6xS9qjzORyq2vwVtwY2UoxQ@mail.gmail.com>
  0 siblings, 1 reply; 25+ messages in thread
From: Yafang Shao @ 2023-04-19 11:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Steven Rostedt,
	Masami Hiramatsu, bpf, linux-trace-kernel, LKML

On Tue, Apr 18, 2023 at 11:38 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Apr 17, 2023 at 6:49 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Apr 18, 2023 at 4:15 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Apr 17, 2023 at 03:47:36PM +0000, Yafang Shao wrote:
> > > > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> > > > index f61d513..3df39a5 100644
> > > > --- a/kernel/bpf/trampoline.c
> > > > +++ b/kernel/bpf/trampoline.c
> > > > @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
> > > >  static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
> > > >       __acquires(RCU)
> > > >  {
> > > > -     rcu_read_lock();
> > > > -     migrate_disable();
> > > > -
> > > > -     run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
> > > > +     int bit;
> > > >
> > > > -     if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> > > > +     rcu_read_lock();
> > > > +     bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
> > >
> > > and bpf will prevent ftrace from running and vice versa.
> > > Not a good idea.
> > >
> > > One bpf prog will prevent a different bpf prog from running since they share the current task.
> > > Not a good idea either.
> >
> > That shouldn't happen. test_recursion_try_acquire() uses a
> > per-task_struct value. One single task_struct can't run in parallel,
> > right?
> > Note that the bpf program running in softirq or irq context won't be
> > prevented by it.
> > IIUC, the bpf program should run in serial in one single task, right?
> > That said, one bpf program can only run after another bpf program
> > finished in the same task?
>
> bpf progs can nest in the same task.

Do you mean the tail_call ?

-- 
Regards
Yafang


* Re: [PATCH bpf-next 2/6] tracing: Add generic test_recursion_try_acquire()
  2023-04-17 15:47 ` [PATCH bpf-next 2/6] tracing: Add generic test_recursion_try_acquire() Yafang Shao
@ 2023-04-20  6:51   ` Masami Hiramatsu
  0 siblings, 0 replies; 25+ messages in thread
From: Masami Hiramatsu @ 2023-04-20  6:51 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, rostedt, bpf, linux-trace-kernel,
	linux-kernel

On Mon, 17 Apr 2023 15:47:33 +0000
Yafang Shao <laoar.shao@gmail.com> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> The ftrace_test_recursion_trylock() also disables preemption. This is not
> required, but was a cleanup, as every place that called it also disabled
> preemption, and making the two tightly coupled appeared to make the code
> simpler.
> 
> But the recursion protection can be used for other purposes that do not
> require disabling preemption. As the recursion bits are attached to the
> task_struct, they follow the task, so there's no need for preemption to be
> disabled.
> 
> Add test_recursion_try_acquire/release() functions to be used generically,
> and separate them from being associated with ftrace. This also removes the
> "lock" name, as there is no locking happening. Keeping the "lock" for the
> ftrace version is better, as it at least indicates that preemption is
> being disabled (hence, "locking the CPU").
> 
> Link: https://lore.kernel.org/linux-trace-kernel/20230321020103.13494-1-laoar.shao@gmail.com/
> 
> Acked-by: Yafang Shao <laoar.shao@gmail.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

This looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

> ---
>  include/linux/trace_recursion.h | 47 ++++++++++++++++++++++++++++++-----------
>  kernel/trace/ftrace.c           |  2 ++
>  2 files changed, 37 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index d48cd92..80de2ee 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -150,9 +150,6 @@ static __always_inline int trace_get_context_bit(void)
>  # define trace_warn_on_no_rcu(ip)	false
>  #endif
>  
> -/*
> - * Preemption is promised to be disabled when return bit >= 0.
> - */
>  static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
>  							int start)
>  {
> @@ -182,18 +179,11 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
>  	val |= 1 << bit;
>  	current->trace_recursion = val;
>  	barrier();
> -
> -	preempt_disable_notrace();
> -
>  	return bit;
>  }
>  
> -/*
> - * Preemption will be enabled (if it was previously enabled).
> - */
>  static __always_inline void trace_clear_recursion(int bit)
>  {
> -	preempt_enable_notrace();
>  	barrier();
>  	trace_recursion_clear(bit);
>  }
> @@ -205,12 +195,18 @@ static __always_inline void trace_clear_recursion(int bit)
>   * tracing recursed in the same context (normal vs interrupt),
>   *
>   * Returns: -1 if a recursion happened.
> - *           >= 0 if no recursion.
> + *           >= 0 if no recursion and preemption will be disabled.
>   */
>  static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>  							 unsigned long parent_ip)
>  {
> -	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> +	int bit;
> +
> +	bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> +	if (unlikely(bit < 0))
> +		return bit;
> +	preempt_disable_notrace();
> +	return bit;
>  }
>  
>  /**
> @@ -221,6 +217,33 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>   */
>  static __always_inline void ftrace_test_recursion_unlock(int bit)
>  {
> +	preempt_enable_notrace();
> +	trace_clear_recursion(bit);
> +}
> +
> +/**
> + * test_recursion_try_acquire - tests for recursion in same context
> + *
> + * This will detect recursion of a function.
> + *
> + * Returns: -1 if a recursion happened.
> + *           >= 0 if no recursion
> + */
> +static __always_inline int test_recursion_try_acquire(unsigned long ip,
> +						      unsigned long parent_ip)
> +{
> +	return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> +}
> +
> +/**
> + * test_recursion_release - called after a success of test_recursion_try_acquire()
> + * @bit: The return of a successful test_recursion_try_acquire()
> + *
> + * This releases the recursion lock taken by a non-negative return call
> + * by test_recursion_try_acquire().
> + */
> +static __always_inline void test_recursion_release(int bit)
> +{
>  	trace_clear_recursion(bit);
>  }
>  
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index c67bcc8..8ad3ab4 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -7647,6 +7647,7 @@ void ftrace_reset_array_ops(struct trace_array *tr)
>  	if (bit < 0)
>  		return;
>  
> +	preempt_disable();
>  	do_for_each_ftrace_op(op, ftrace_ops_list) {
>  		/* Stub functions don't need to be called nor tested */
>  		if (op->flags & FTRACE_OPS_FL_STUB)
> @@ -7668,6 +7669,7 @@ void ftrace_reset_array_ops(struct trace_array *tr)
>  		}
>  	} while_for_each_ftrace_op(op);
>  out:
> +	preempt_enable();
>  	trace_clear_recursion(bit);
>  }
>  
> -- 
> 1.8.3.1
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
       [not found]           ` <CAADnVQ+FO-+1OALTtgVkcpH3Adc6xS9qjzORyq2vwVtwY2UoxQ@mail.gmail.com>
@ 2023-04-24 21:40             ` Steven Rostedt
  2023-04-27  9:57               ` Yafang Shao
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2023-04-24 21:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Masami Hiramatsu, bpf, linux-trace-kernel, LKML

On Wed, 19 Apr 2023 15:46:34 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> No. Just one prog at entry into any of the kernel functions
> and another prog at entry of funcs that 1st bpf prog called indirectly.
> Like one prog is tracing networking events while another
> is focusing on mm. They should not conflict.

You mean that you have:

function start:
  __bpf_prog_enter_recur()
    bpf_program1()
      __bpf_prog_enter_recur()
        bpf_program2();
      __bpf_prog_exit_recur()
  __bpf_prog_exit_recur()

  rest of function

That is, a bpf program can be called within another bpf program between
the prog_enter and prog_exit(), that is in the same context (normal,
softirq, irq, etc)?

The protection is on the trampoline where the bpf program is called.
Not sure how ftrace can stop BPF or BPF stop ftrace, unless bpf is
tracing a ftrace callback, or ftrace is tracing a bpf function.

-- Steve


* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-24 21:40             ` Steven Rostedt
@ 2023-04-27  9:57               ` Yafang Shao
  2023-04-27 12:15                 ` Yafang Shao
  0 siblings, 1 reply; 25+ messages in thread
From: Yafang Shao @ 2023-04-27  9:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Masami Hiramatsu, bpf, linux-trace-kernel, LKML

On Tue, Apr 25, 2023 at 5:40 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 19 Apr 2023 15:46:34 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > No. Just one prog at entry into any of the kernel functions
> > and another prog at entry of funcs that 1st bpf prog called indirectly.
> > Like one prog is tracing networking events while another
> > is focusing on mm. They should not conflict.
>
> You mean that you have:
>
> function start:
>   __bpf_prog_enter_recur()
>     bpf_program1()
>       __bpf_prog_enter_recur()
>         bpf_program2();
>       __bpf_prog_exit_recur()
>   __bpf_prog_exit_recur()
>
>   rest of function
>
> That is, a bpf program can be called within another bpf program between
> the prog_enter and prog_exit(), that is in the same context (normal,
> softirq, irq, etc)?
>

Right, that can happen per my verification. Below is a simple bpf
program to verify it.

struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __type(key, __u64);
    __type(value, __u64);
    __uint(max_entries, 1024);
    __uint(map_flags, BPF_F_NO_PREALLOC);
} write_map SEC(".maps");

__u64 key;

SEC("fentry/kernel_clone")
int program1()
{
    __u64 value = 1;

    bpf_printk("before update");
    // It will call trie_update_elem and thus trigger program2.
    bpf_map_update_elem(&write_map, &key, &value, BPF_ANY);
    __sync_fetch_and_add(&key, 1);
    bpf_printk("after update");
    return 0;
}

SEC("fentry/trie_update_elem")
int program2()
{
    bpf_printk("trie_update_elem");
    return 0;
}

The result as follows,

         kubelet-203203  [018] ....1  9579.862862:
__bpf_prog_enter_recur: __bpf_prog_enter_recur
         kubelet-203203  [018] ...11  9579.862869: bpf_trace_printk:
before update
         kubelet-203203  [018] ....2  9579.862869:
__bpf_prog_enter_recur: __bpf_prog_enter_recur
         kubelet-203203  [018] ...12  9579.862870: bpf_trace_printk:
trie_update_elem
         kubelet-203203  [018] ....2  9579.862870:
__bpf_prog_exit_recur: __bpf_prog_exit_recur
         kubelet-203203  [018] ...11  9579.862870: bpf_trace_printk:
after update
         kubelet-203203  [018] ....1  9579.862871:
__bpf_prog_exit_recur: __bpf_prog_exit_recur

Note that we can't trace __bpf_prog_enter_recur and
__bpf_prog_exit_recur, so we have to modify the kernel to print them.

> The protection is on the trampoline where the bpf program is called.
> Not sure how ftrace can stop BPF or BPF stop ftrace, unless bpf is
> tracing a ftrace callback, or ftrace is tracing a bpf function.
>
> -- Steve



-- 
Regards
Yafang


* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-27  9:57               ` Yafang Shao
@ 2023-04-27 12:15                 ` Yafang Shao
  2023-04-27 12:35                   ` Yafang Shao
  0 siblings, 1 reply; 25+ messages in thread
From: Yafang Shao @ 2023-04-27 12:15 UTC (permalink / raw)
  To: Steven Rostedt, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Masami Hiramatsu, bpf, linux-trace-kernel, LKML

On Thu, Apr 27, 2023 at 5:57 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Apr 25, 2023 at 5:40 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 19 Apr 2023 15:46:34 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > No. Just one prog at entry into any of the kernel functions
> > > and another prog at entry of funcs that 1st bpf prog called indirectly.
> > > Like one prog is tracing networking events while another
> > > is focusing on mm. They should not conflict.
> >
> > You mean that you have:
> >
> > function start:
> >   __bpf_prog_enter_recur()
> >     bpf_program1()
> >       __bpf_prog_enter_recur()
> >         bpf_program2();
> >       __bpf_prog_exit_recur()
> >   __bpf_prog_exit_recur()
> >
> >   rest of function
> >
> > That is, a bpf program can be called within another bpf program between
> > the prog_enter and prog_exit(), that is in the same context (normal,
> > softirq, irq, etc)?
> >
>
> Right, that can happen per my verification. Below is a simple bpf
> program to verify it.
>
> struct {
>     __uint(type, BPF_MAP_TYPE_LPM_TRIE);
>     __type(key, __u64);
>     __type(value, __u64);
>     __uint(max_entries, 1024);
>     __uint(map_flags, BPF_F_NO_PREALLOC);
> } write_map SEC(".maps");
>
> __u64 key;
>
> SEC("fentry/kernel_clone")
> int program1()
> {
>     __u64 value = 1;
>
>     bpf_printk("before update");
>     // It will call trie_update_elem and thus trigger program2.
>     bpf_map_update_elem(&write_map, &key, &value, BPF_ANY);
>     __sync_fetch_and_add(&key, 1);
>     bpf_printk("after update");
>     return 0;
> }
>
> SEC("fentry/trie_update_elem")
> int program2()
> {
>     bpf_printk("trie_update_elem");
>     return 0;
> }
>
> The result as follows,
>
>          kubelet-203203  [018] ....1  9579.862862:
> __bpf_prog_enter_recur: __bpf_prog_enter_recur
>          kubelet-203203  [018] ...11  9579.862869: bpf_trace_printk:
> before update
>          kubelet-203203  [018] ....2  9579.862869:
> __bpf_prog_enter_recur: __bpf_prog_enter_recur
>          kubelet-203203  [018] ...12  9579.862870: bpf_trace_printk:
> trie_update_elem
>          kubelet-203203  [018] ....2  9579.862870:
> __bpf_prog_exit_recur: __bpf_prog_exit_recur
>          kubelet-203203  [018] ...11  9579.862870: bpf_trace_printk:
> after update
>          kubelet-203203  [018] ....1  9579.862871:
> __bpf_prog_exit_recur: __bpf_prog_exit_recur
>
> Note that we can't trace __bpf_prog_enter_recur and
> __bpf_prog_exit_recur, so we have to modify the kernel to print them.
>

... However, surprisingly it still works even after this patchset is
applied, because the hardirq/softirq flag is set when program2 is
running; see the flags in the above trace_pipe output. Is that
expected?!
I need some time to figure it out, but maybe you have a quick answer...

> > The protection is on the trampoline where the bpf program is called.
> > Not sure how ftrace can stop BPF or BPF stop ftrace, unless bpf is
> > tracing a ftrace callback, or ftrace is tracing a bpf function.
> >
> > -- Steve
>


-- 
Regards
Yafang


* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-27 12:15                 ` Yafang Shao
@ 2023-04-27 12:35                   ` Yafang Shao
  0 siblings, 0 replies; 25+ messages in thread
From: Yafang Shao @ 2023-04-27 12:35 UTC (permalink / raw)
  To: Steven Rostedt, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Masami Hiramatsu, bpf, linux-trace-kernel, LKML

On Thu, Apr 27, 2023 at 8:15 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 5:57 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Apr 25, 2023 at 5:40 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Wed, 19 Apr 2023 15:46:34 -0700
> > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > > No. Just one prog at entry into any of the kernel functions
> > > > and another prog at entry of funcs that 1st bpf prog called indirectly.
> > > > Like one prog is tracing networking events while another
> > > > is focusing on mm. They should not conflict.
> > >
> > > You mean that you have:
> > >
> > > function start:
> > >   __bpf_prog_enter_recur()
> > >     bpf_program1()
> > >       __bpf_prog_enter_recur()
> > >         bpf_program2();
> > >       __bpf_prog_exit_recur()
> > >   __bpf_prog_exit_recur()
> > >
> > >   rest of function
> > >
> > > That is, a bpf program can be called within another bpf program between
> > > the prog_enter and prog_exit(), that is in the same context (normal,
> > > softirq, irq, etc)?
> > >
> >
> > Right, that can happen per my verification. Below is a simple bpf
> > program to verify it.
> >
> > struct {
> >     __uint(type, BPF_MAP_TYPE_LPM_TRIE);
> >     __type(key, __u64);
> >     __type(value, __u64);
> >     __uint(max_entries, 1024);
> >     __uint(map_flags, BPF_F_NO_PREALLOC);
> > } write_map SEC(".maps");
> >
> > __u64 key;
> >
> > SEC("fentry/kernel_clone")
> > int program1()
> > {
> >     __u64 value = 1;
> >
> >     bpf_printk("before update");
> >     // It will call trie_update_elem and thus trigger program2.
> >     bpf_map_update_elem(&write_map, &key, &value, BPF_ANY);
> >     __sync_fetch_and_add(&key, 1);
> >     bpf_printk("after update");
> >     return 0;
> > }
> >
> > SEC("fentry/trie_update_elem")
> > int program2()
> > {
> >     bpf_printk("trie_update_elem");
> >     return 0;
> > }
> >
> > The result is as follows:
> >
> >          kubelet-203203  [018] ....1  9579.862862: __bpf_prog_enter_recur: __bpf_prog_enter_recur
> >          kubelet-203203  [018] ...11  9579.862869: bpf_trace_printk: before update
> >          kubelet-203203  [018] ....2  9579.862869: __bpf_prog_enter_recur: __bpf_prog_enter_recur
> >          kubelet-203203  [018] ...12  9579.862870: bpf_trace_printk: trie_update_elem
> >          kubelet-203203  [018] ....2  9579.862870: __bpf_prog_exit_recur: __bpf_prog_exit_recur
> >          kubelet-203203  [018] ...11  9579.862870: bpf_trace_printk: after update
> >          kubelet-203203  [018] ....1  9579.862871: __bpf_prog_exit_recur: __bpf_prog_exit_recur
> >
> > Note that we can't trace __bpf_prog_enter_recur and
> > __bpf_prog_exit_recur, so we have to modify the kernel to print them.
> >
>
> ... However, surprisingly it still works even after this patchset is
> applied, because the hardirq/softirq flag is set while program2 is
> running; see the flags in the trace_pipe output above. Is that
> expected?!
> I need some time to figure it out, but maybe you have a quick answer...

Answering my own question: it is because of the
allow-one-single-recursion rule. I misread the trace flags before.
Sorry about the noise.
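
For reference, the two programs quoted above can be turned into a
self-contained object roughly as follows. The includes and the license
declaration are assumptions added here for illustration; the map
definition, section names and program bodies are taken from the quoted
snippet.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* bpf_printk() uses the GPL-only trace_printk helper. */
char LICENSE[] SEC("license") = "GPL";

struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __type(key, __u64);
    __type(value, __u64);
    __uint(max_entries, 1024);
    __uint(map_flags, BPF_F_NO_PREALLOC);
} write_map SEC(".maps");

__u64 key;

SEC("fentry/kernel_clone")
int program1()
{
    __u64 value = 1;

    bpf_printk("before update");
    /* Calls trie_update_elem() in the kernel and thus triggers program2. */
    bpf_map_update_elem(&write_map, &key, &value, BPF_ANY);
    __sync_fetch_and_add(&key, 1);
    bpf_printk("after update");
    return 0;
}

SEC("fentry/trie_update_elem")
int program2()
{
    bpf_printk("trie_update_elem");
    return 0;
}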


-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-17 15:47 ` [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism Yafang Shao
  2023-04-17 20:14   ` Alexei Starovoitov
  2023-04-17 23:29   ` kernel test robot
@ 2023-04-27 13:26   ` Steven Rostedt
  2023-04-27 14:22     ` Yafang Shao
  2 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2023-04-27 13:26 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, mhiramat, bpf, linux-trace-kernel,
	linux-kernel

On Mon, 17 Apr 2023 15:47:36 +0000
Yafang Shao <laoar.shao@gmail.com> wrote:

> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index f61d513..3df39a5 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
>  static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
>  	__acquires(RCU)

Because __bpf_prog_enter_recur() and __bpf_prog_exit_recur() can
legitimately nest (as you pointed out later in the thread), I think my
original plan is the way to go.



>  {
> -	rcu_read_lock();
> -	migrate_disable();
> -
> -	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
> +	int bit;
>  
> -	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> +	rcu_read_lock();
> +	bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
> +	run_ctx->recursion_bit = bit;
> +	if (bit < 0) {
> +		preempt_disable_notrace();
>  		bpf_prog_inc_misses_counter(prog);
> +		preempt_enable_notrace();
>  		return 0;
>  	}
> +
> +	migrate_disable();

Just encompass the migrate_disable/enable() with the recursion protection.

That is, here add:

	test_recursion_release(recursion_bit);

No need to save it in the run_ctx, as you can use a local variable.

As I mentioned, if it passes when checking migrate_disable() it will also
pass when checking around migrate_enable() so the two will still be paired
properly, even if only the migrate_enable() starts recursing.


  bit = test_recursion_try_acquire() // OK
  if (bit < 0)
	return;
  migrate_disable();
  test_recursion_release(bit);

  [..]

  bit = test_recursion_try_acquire() // OK
  migrate_enable() // traced and recurses...

    bit = test_recursion_try_acquire() // fails
    if (bit < 0)
          return; // returns here
    migrate_disable() // does not get called.

The recursion protection around migrate_disable/enable() is needed because
it's done before the other checks. You can't attach the test_recursion logic
to the __bpf_prog_enter/exit() routines, because those can legitimately recurse.
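
For illustration, a minimal sketch of the enter path under this plan,
assuming the test_recursion_try_acquire()/test_recursion_release() helpers
from patch 2/6 and reusing the names from the patch above. How the bail-out
is communicated to __bpf_prog_exit_recur() (so that migrate_enable() stays
paired) is left open here; this is a sketch of the suggestion, not merged code.

static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog,
					  struct bpf_tramp_run_ctx *run_ctx)
	__acquires(RCU)
{
	int bit;

	rcu_read_lock();

	/* Protect only the traceable migrate_disable() call. */
	bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
	if (bit < 0) {
		preempt_disable_notrace();
		bpf_prog_inc_misses_counter(prog);
		preempt_enable_notrace();
		/* The exit path must learn that migrate_disable() was skipped. */
		return 0;
	}
	migrate_disable();
	/* Drop the protection as soon as the traceable call is done. */
	test_recursion_release(bit);

	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
	return bpf_prog_start_time();
}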

-- Steve


> +
> +	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
>  	return bpf_prog_start_time();
>  }

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-27 13:26   ` Steven Rostedt
@ 2023-04-27 14:22     ` Yafang Shao
  2023-04-27 15:18       ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Yafang Shao @ 2023-04-27 14:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, mhiramat, bpf, linux-trace-kernel,
	linux-kernel

On Thu, Apr 27, 2023 at 9:26 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 17 Apr 2023 15:47:36 +0000
> Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> > index f61d513..3df39a5 100644
> > --- a/kernel/bpf/trampoline.c
> > +++ b/kernel/bpf/trampoline.c
> > @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
> >  static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
> >       __acquires(RCU)
>
> Because __bpf_prog_enter_recur() and __bpf_prog_exit_recur() can
> legitimately nest (as you pointed out later in the thread), I think my
> original plan is the way to go.
>
>
>
> >  {
> > -     rcu_read_lock();
> > -     migrate_disable();
> > -
> > -     run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
> > +     int bit;
> >
> > -     if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> > +     rcu_read_lock();
> > +     bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
> > +     run_ctx->recursion_bit = bit;
> > +     if (bit < 0) {
> > +             preempt_disable_notrace();
> >               bpf_prog_inc_misses_counter(prog);
> > +             preempt_enable_notrace();
> >               return 0;
> >       }
> > +
> > +     migrate_disable();
>
> Just encompass the migrate_disable/enable() with the recursion protection.
>
> That is, here add:
>
>         test_recursion_release(recursion_bit);
>
> No need to save it in the run_ctx, as you can use a local variable.
>
> As I mentioned, if it passes when checking migrate_disable() it will also
> pass when checking around migrate_enable() so the two will still be paired
> properly, even if only the migrate_enable() starts recursing.
>
>
>   bit = test_recursion_try_acquire() // OK
>   if (bit < 0)
>         return;
>   migrate_disable();
>   test_recursion_release(bit);
>
>   [..]
>
>   bit = test_recursion_try_acquire() // OK
>   migrate_enable() // traced and recurses...
>
>     bit = test_recursion_try_acquire() // fails
>     if (bit < 0)
>           return; // returns here
>     migrate_disable() // does not get called.
>
> The recursion protection around migrate_disable/enable() is needed because
> it's done before the other checks. You can't attach the test_recursion logic
> to the __bpf_prog_enter/exit() routines, because those can legitimately recurse.
>

IIUC, the acquire/release pair works as follows,

   test_recursion_try_acquire
     [ protection area ]
   test_recursion_release

After release, there will be no protection, and thus it will fail the
tools/testing/selftests/bpf/progs/recursion.c[1] test case, because
the recursion occurs in the bpf_prog_run() itself,

  __bpf_prog_enter
     test_recursion_try_acquire
     [...]
     test_recursion_release
  // no protection after the release
  bpf_prog_run()
    bpf_prog_run() // the recursion can't be prevented.
        __bpf_prog_enter
            test_recursion_try_acquire
            [...]
            test_recursion_release
       bpf_prog_run()
           bpf_prog_run()
               __bpf_prog_enter
                  test_recursion_try_acquire
                  [...]
                  test_recursion_release
              bpf_prog_run()
              [ And so on ... ]

[1]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/recursion.c#n38

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-27 14:22     ` Yafang Shao
@ 2023-04-27 15:18       ` Steven Rostedt
  2023-04-27 15:23         ` Yafang Shao
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2023-04-27 15:18 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, mhiramat, bpf, linux-trace-kernel,
	linux-kernel

On Thu, 27 Apr 2023 22:22:22 +0800
Yafang Shao <laoar.shao@gmail.com> wrote:

> IIUC, the acquire/release pair works as follows,
> 
>    test_recursion_try_acquire
>      [ protection area ]
>    test_recursion_release
> 
> After release, there will be no protection, and thus it will fail the
> tools/testing/selftests/bpf/progs/recursion.c[1] test case, because
> the recursion occurs in the bpf_prog_run() itself,

But bpf programs are allowed to recurse. Hence, you need separate logic to
detect that. The test_recursion_*() code is for cases that are not allowed
to recurse.
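
For reference, the intended usage pattern of those helpers looks roughly
like the sketch below; the surrounding function is hypothetical, while the
helper names and the _THIS_IP_/_RET_IP_ arguments follow patch 2/6.

static void notrace some_section_that_must_not_recurse(void)
{
	int bit;

	bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
	if (bit < 0)
		return;	/* this context is already inside the section */

	/* ... work that must not recurse in the same context ... */

	test_recursion_release(bit);
}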

> 
>   __bpf_prog_enter
>      test_recursion_try_acquire
>      [...]
>      test_recursion_release
>   // no protection after the release
>   bpf_prog_run()
>     bpf_prog_run() // the recursion can't be prevented.

But I thought you can run a bpf_prog from another bpf_prog. So you don't
want to prevent it. You need other logic to detect if it was not supposed to
recurse.

-- Steve


>         __bpf_prog_enter
>             test_recursion_try_acquire
>             [...]
>             test_recursion_release
>        bpf_prog_run()
>            bpf_prog_run()
>                __bpf_prog_enter
>                   test_recursion_try_acquire
>                   [...]
>                   test_recursion_release
>               bpf_prog_run()
>               [ And so on ... ]
> 
> [1]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/recursion.c#n38

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-27 15:18       ` Steven Rostedt
@ 2023-04-27 15:23         ` Yafang Shao
  2023-04-27 15:36           ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Yafang Shao @ 2023-04-27 15:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, mhiramat, bpf, linux-trace-kernel,
	linux-kernel

On Thu, Apr 27, 2023 at 11:18 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 27 Apr 2023 22:22:22 +0800
> Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > IIUC, the acquire/release pair works as follows,
> >
> >    test_recursion_try_acquire
> >      [ protection area ]
> >    test_recursion_release
> >
> > After release, there will be no protection, and thus it will fail the
> > tools/testing/selftests/bpf/progs/recursion.c[1] test case, because
> > the recursion occurs in the bpf_prog_run() itself,
>
> But bpf programs are allowed to recurse. Hence, you need separate logic to
> detect that. The test_recursion_*() code is for cases that are not allowed
> to recurse.
>

Agreed.

> >
> >   __bpf_prog_enter
> >      test_recursion_try_acquire
> >      [...]
> >      test_recursion_release
> >   // no protection after the release
> >   bpf_prog_run()
> >     bpf_prog_run() // the recursion can't be prevented.
>
> But I thought you can run a bpf_prog from another bpf_prog. So you don't
> want to prevent it. You need other logic to detect if it was not supposed to
> recurse.
>

If so, we have to keep the prog->active to prevent it, then I'm not
sure if it is worth adding test_recursion_*().

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-27 15:23         ` Yafang Shao
@ 2023-04-27 15:36           ` Steven Rostedt
  2023-04-27 15:39             ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Steven Rostedt @ 2023-04-27 15:36 UTC (permalink / raw)
  To: Yafang Shao
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, mhiramat, bpf, linux-trace-kernel,
	linux-kernel

On Thu, 27 Apr 2023 23:23:31 +0800
Yafang Shao <laoar.shao@gmail.com> wrote:

> > But I thought you can run a bpf_prog from another bpf_prog. So you don't
> > want to prevent it. You need other logic to detect if it was not supposed to
> > recurse.
> >  
> 
> If so, we have to keep the prog->active to prevent it, then I'm not
> sure if it is worth adding test_recursion_*().

I thought that the whole point of this exercise was because the
migrate_disable() itself could be traced (or call something that can), and
that's outside of prog->active protection. Which the test_recursion_*()
code was created for.

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-27 15:36           ` Steven Rostedt
@ 2023-04-27 15:39             ` Alexei Starovoitov
  2023-04-27 15:43               ` Yafang Shao
  0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2023-04-27 15:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Yafang Shao, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Masami Hiramatsu, bpf, linux-trace-kernel, LKML

On Thu, Apr 27, 2023 at 8:36 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 27 Apr 2023 23:23:31 +0800
> Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > > But I thought you can run a bpf_prog from another bpf_prog. So you don't
> > > want to prevent it. You need other logic to detect if it was not supposed to
> > > recurse.
> > >
> >
> > If so, we have to keep the prog->active to prevent it, then I'm not
> > sure if it is worth adding test_recursion_*().
>
> I thought that the whole point of this exercise was because the
> migrate_disable() itself could be traced (or call something that can), and
> that's outside of prog->active protection. Which the test_recursion_*()
> code was created for.

Not sure where this came from.
migrate_enable/disable were added to the deny list back in 2021.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-27 15:39             ` Alexei Starovoitov
@ 2023-04-27 15:43               ` Yafang Shao
  2023-04-27 15:46                 ` Steven Rostedt
  0 siblings, 1 reply; 25+ messages in thread
From: Yafang Shao @ 2023-04-27 15:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Masami Hiramatsu, bpf, linux-trace-kernel, LKML

On Thu, Apr 27, 2023 at 11:39 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Apr 27, 2023 at 8:36 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Thu, 27 Apr 2023 23:23:31 +0800
> > Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > > > But I thought you can run a bpf_prog from another bpf_prog. So you don't
> > > > want to prevent it. You need other logic to detect if it was not supposed to
> > > > recurse.
> > > >
> > >
> > > If so, we have to keep the prog->active to prevent it, then I'm not
> > > sure if it is worth adding test_recursion_*().
> >
> > I thought that the whole point of this exercise was because the
> > migrate_disable() itself could be traced (or call something that can), and
> > that's outside of prog->active protection. Which the test_recursion_*()
> > code was created for.
>
> Not sure where this came from.
> migrate_enable/disable were added to the deny list back in 2021.

Hi Alexei,

Don't be uneasy. It is not good to play word games.
What Steven really meant is preempt_count_{sub, add}.
Anyway, thanks Steven for the help with this exercise.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
  2023-04-27 15:43               ` Yafang Shao
@ 2023-04-27 15:46                 ` Steven Rostedt
  0 siblings, 0 replies; 25+ messages in thread
From: Steven Rostedt @ 2023-04-27 15:46 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Masami Hiramatsu, bpf, linux-trace-kernel, LKML

On Thu, 27 Apr 2023 23:43:35 +0800
Yafang Shao <laoar.shao@gmail.com> wrote:

> > > I thought that the whole point of this exercise was because the
> > > migrate_disable() itself could be traced (or call something that can), and
> > > that's outside of prog->active protection. Which the test_recursion_*()
> > > code was created for.  
> >
> > Not sure where this came from.
> > migrate_enable/disable were added to the deny list back in 2021.
> 
> Hi Alexei,
> 
> Don't be uneasy. It is not good to play word games.
> What Steven really meant is preempt_count_{sub, add}.
> Anyway, thanks Steven for the help with this exercise.

Right, this came from the "(or call something that can)" part.
As Yafang said, migrate_disable() calls preempt_count_add() (on some
configs), which is traced by ftrace, and thus traced by bpf. Or was that
added to the deny list? I think that was one of the solutions as well.
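
For completeness, the deny-list alternative mentioned here would be a small
extension of the existing btf_id_deny set in kernel/bpf/verifier.c, roughly
as sketched below. Whether preempt_count_{add,sub} exist as out-of-line,
attachable functions depends on the configuration, so the config guard and
the new entries are assumptions, not part of this series.

BTF_SET_START(btf_id_deny)
BTF_ID_UNUSED
#ifdef CONFIG_SMP
BTF_ID(func, migrate_disable)
BTF_ID(func, migrate_enable)
#endif
/* other existing entries omitted here */
/* Sketch: also deny attaching to the preempt_count helpers themselves. */
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
BTF_ID(func, preempt_count_add)
BTF_ID(func, preempt_count_sub)
#endif
BTF_SET_END(btf_id_deny)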

-- Steve

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-04-27 15:46 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-17 15:47 [PATCH bpf-next 0/6] bpf: Tracing recursion prevention mechanism improvement Yafang Shao
2023-04-17 15:47 ` [PATCH bpf-next 1/6] bpf: Add __rcu_read_{lock,unlock} into btf id deny list Yafang Shao
2023-04-17 15:47 ` [PATCH bpf-next 2/6] tracing: Add generic test_recursion_try_acquire() Yafang Shao
2023-04-20  6:51   ` Masami Hiramatsu
2023-04-17 15:47 ` [PATCH bpf-next 3/6] tracing: Add the comment for allowing one single recursion in process context Yafang Shao
2023-04-17 15:47 ` [PATCH bpf-next 4/6] selftests/bpf: Allow one single recursion in fentry recursion test Yafang Shao
2023-04-17 15:47 ` [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism Yafang Shao
2023-04-17 20:14   ` Alexei Starovoitov
2023-04-18  1:49     ` Yafang Shao
2023-04-18 15:38       ` Alexei Starovoitov
2023-04-19 11:46         ` Yafang Shao
     [not found]           ` <CAADnVQ+FO-+1OALTtgVkcpH3Adc6xS9qjzORyq2vwVtwY2UoxQ@mail.gmail.com>
2023-04-24 21:40             ` Steven Rostedt
2023-04-27  9:57               ` Yafang Shao
2023-04-27 12:15                 ` Yafang Shao
2023-04-27 12:35                   ` Yafang Shao
2023-04-17 23:29   ` kernel test robot
2023-04-27 13:26   ` Steven Rostedt
2023-04-27 14:22     ` Yafang Shao
2023-04-27 15:18       ` Steven Rostedt
2023-04-27 15:23         ` Yafang Shao
2023-04-27 15:36           ` Steven Rostedt
2023-04-27 15:39             ` Alexei Starovoitov
2023-04-27 15:43               ` Yafang Shao
2023-04-27 15:46                 ` Steven Rostedt
2023-04-17 15:47 ` [PATCH bpf-next 6/6] bpf: Remove some denied functions from the btf id deny list Yafang Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).