* [PATCH RFC bpf-next v1 0/2] Fix deadlock with bpf_timer in fentry/fexit progs
@ 2022-11-06  1:51 Kumar Kartikeya Dwivedi
  2022-11-06  1:51 ` [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock Kumar Kartikeya Dwivedi
  2022-11-06  1:51 ` [PATCH RFC bpf-next v1 2/2] [EXAMPLE] selftests/bpf: Add timer deadlock example Kumar Kartikeya Dwivedi
  0 siblings, 2 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-06  1:51 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau

Hi, this is a fix for a deadlock condition I noticed (and verified by
constructing a test case, see patch #2) while looking at the code. There
might be alternative ways to solve this, like having a percpu map ops
recursion counter and detecting reentrancy based on it. I went for the
simpler option for now.

Previous discussion of a patch doing something similar for a different
issue is linked below; it never added BPF_PROG_TYPE_TRACING to this list:

https://lore.kernel.org/bpf/CALOAHbBK1=ujbZp5n4t9zQf+DPZ3-a2ZDL0a3sfhVU6meJGqYw@mail.gmail.com
https://lore.kernel.org/bpf/CALOAHbAQ7AKNPrkawGKHarkB5CGkWXTC8P=+EJP3DAPNEf=Ayw@mail.gmail.com

Note that the timer and timer_crash selftests are broken by this change,
so it has potential for wider breakage.

Kumar Kartikeya Dwivedi (2):
  bpf: Fix deadlock for bpf_timer's spinlock
  [EXAMPLE] selftests/bpf: Add timer deadlock example

 kernel/bpf/verifier.c                         | 10 ++--
 .../testing/selftests/bpf/prog_tests/timer.c  | 21 +++++---
 .../selftests/bpf/prog_tests/timer_crash.c    |  9 +++-
 .../selftests/bpf/prog_tests/timer_deadlock.c | 29 +++++++++++
 tools/testing/selftests/bpf/progs/timer.c     |  8 +--
 .../testing/selftests/bpf/progs/timer_crash.c |  4 +-
 .../selftests/bpf/progs/timer_deadlock.c      | 50 +++++++++++++++++++
 7 files changed, 113 insertions(+), 18 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/timer_deadlock.c
 create mode 100644 tools/testing/selftests/bpf/progs/timer_deadlock.c


base-commit: 25906092edb4bcf94cb669bd1ed03a0ef2f4120c
--
2.38.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock
  2022-11-06  1:51 [PATCH RFC bpf-next v1 0/2] Fix deadlock with bpf_timer in fentry/fexit progs Kumar Kartikeya Dwivedi
@ 2022-11-06  1:51 ` Kumar Kartikeya Dwivedi
  2022-11-06 21:20   ` Alexei Starovoitov
  2022-11-06  1:51 ` [PATCH RFC bpf-next v1 2/2] [EXAMPLE] selftests/bpf: Add timer deadlock example Kumar Kartikeya Dwivedi
  1 sibling, 1 reply; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-06  1:51 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau

Currently, unlike other tracing program types, BPF_PROG_TYPE_TRACING is
excluded from is_tracing_prog_type checks. This means that such programs
can use maps containing bpf_spin_lock, bpf_timer, etc. without
verification failure.

However, allowing fentry/fexit programs to use maps that have a
bpf_timer in the map value can lead to deadlock.

Suppose that an fentry program is attached to bpf_prog_put, and a TC
program executes and does bpf_map_update_elem on an array map that both
progs share. If the fentry program calls bpf_map_update_elem for the
same key, it will try to acquire the same lock from within the critical
section protecting the timer.

The call chain is:

bpf_prog_test_run_opts() // TC
  bpf_prog_TC
    bpf_map_update_elem(array_map, key=0)
      bpf_obj_free_fields
        bpf_timer_cancel_and_free
          __bpf_spin_lock_irqsave
            drop_prog_refcnt
              bpf_prog_put
                bpf_prog_FENTRY // FENTRY
                  bpf_map_update_elem(array_map, key=0)
                    bpf_obj_free_fields
                      bpf_timer_cancel_and_free
                        __bpf_spin_lock_irqsave // DEADLOCK

BPF_TRACE_ITER attach type can be excluded because it always executes in
process context.

Update selftests that use bpf_timer in fentry programs to use TC
instead, as they would be broken by this change.

Fixes: 5e0bc3082e2e ("bpf: Forbid bpf_ktime_get_coarse_ns and bpf_timer_* in tracing progs")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c                         | 10 ++++++---
 .../testing/selftests/bpf/prog_tests/timer.c  | 21 ++++++++++++-------
 .../selftests/bpf/prog_tests/timer_crash.c    |  9 +++++++-
 tools/testing/selftests/bpf/progs/timer.c     |  8 +++----
 .../testing/selftests/bpf/progs/timer_crash.c |  4 ++--
 5 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d3b75aa0c54d..6e948c695e84 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12784,7 +12784,7 @@ static int check_pseudo_btf_id(struct bpf_verifier_env *env,
 	return err;
 }
 
-static bool is_tracing_prog_type(enum bpf_prog_type type)
+static bool is_tracing_prog_type(enum bpf_prog_type type, enum bpf_attach_type eatype)
 {
 	switch (type) {
 	case BPF_PROG_TYPE_KPROBE:
@@ -12792,6 +12792,9 @@ static bool is_tracing_prog_type(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_PERF_EVENT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
+	case BPF_PROG_TYPE_TRACING:
+		if (eatype == BPF_TRACE_ITER)
+			return false;
 		return true;
 	default:
 		return false;
@@ -12803,6 +12806,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 					struct bpf_prog *prog)
 
 {
+	enum bpf_attach_type eatype = env->prog->expected_attach_type;
 	enum bpf_prog_type prog_type = resolve_prog_type(prog);
 
 	if (btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
@@ -12811,7 +12815,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 			return -EINVAL;
 		}
 
-		if (is_tracing_prog_type(prog_type)) {
+		if (is_tracing_prog_type(prog_type, eatype)) {
 			verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
 			return -EINVAL;
 		}
@@ -12823,7 +12827,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 	}
 
 	if (btf_record_has_field(map->record, BPF_TIMER)) {
-		if (is_tracing_prog_type(prog_type)) {
+		if (is_tracing_prog_type(prog_type, eatype)) {
 			verbose(env, "tracing progs cannot use bpf_timer yet\n");
 			return -EINVAL;
 		}
diff --git a/tools/testing/selftests/bpf/prog_tests/timer.c b/tools/testing/selftests/bpf/prog_tests/timer.c
index 7eb049214859..c0c39da12250 100644
--- a/tools/testing/selftests/bpf/prog_tests/timer.c
+++ b/tools/testing/selftests/bpf/prog_tests/timer.c
@@ -1,25 +1,30 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2021 Facebook */
 #include <test_progs.h>
+#include <network_helpers.h>
 #include "timer.skel.h"
 
 static int timer(struct timer *timer_skel)
 {
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
 	int err, prog_fd;
-	LIBBPF_OPTS(bpf_test_run_opts, topts);
-
-	err = timer__attach(timer_skel);
-	if (!ASSERT_OK(err, "timer_attach"))
-		return err;
 
 	ASSERT_EQ(timer_skel->data->callback_check, 52, "callback_check1");
 	ASSERT_EQ(timer_skel->data->callback2_check, 52, "callback2_check1");
 
 	prog_fd = bpf_program__fd(timer_skel->progs.test1);
 	err = bpf_prog_test_run_opts(prog_fd, &topts);
-	ASSERT_OK(err, "test_run");
-	ASSERT_EQ(topts.retval, 0, "test_run");
-	timer__detach(timer_skel);
+	ASSERT_OK(err, "test_run test1");
+	ASSERT_EQ(topts.retval, 0, "test_run retval test1");
+
+	prog_fd = bpf_program__fd(timer_skel->progs.test2);
+	err = bpf_prog_test_run_opts(prog_fd, &topts);
+	ASSERT_OK(err, "test_run test2");
+	ASSERT_EQ(topts.retval, 0, "test_run retval test2");
 
 	usleep(50); /* 10 usecs should be enough, but give it extra */
 	/* check that timer_cb1() was executed 10+10 times */
diff --git a/tools/testing/selftests/bpf/prog_tests/timer_crash.c b/tools/testing/selftests/bpf/prog_tests/timer_crash.c
index f74b82305da8..91f2333b92aa 100644
--- a/tools/testing/selftests/bpf/prog_tests/timer_crash.c
+++ b/tools/testing/selftests/bpf/prog_tests/timer_crash.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
+#include <network_helpers.h>
 #include "timer_crash.skel.h"
 
 enum {
@@ -9,6 +10,11 @@ enum {
 
 static void test_timer_crash_mode(int mode)
 {
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
 	struct timer_crash *skel;
 
 	skel = timer_crash__open_and_load();
@@ -18,7 +24,8 @@ static void test_timer_crash_mode(int mode)
 	skel->bss->crash_map = mode;
 	if (!ASSERT_OK(timer_crash__attach(skel), "timer_crash__attach"))
 		goto end;
-	usleep(1);
+	ASSERT_OK(bpf_prog_test_run_opts(bpf_program__fd(skel->progs.timer), &topts), "test_run");
+	ASSERT_EQ(topts.retval, 0, "test_run retval");
 end:
 	timer_crash__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c
index acda5c9cea93..492f62917898 100644
--- a/tools/testing/selftests/bpf/progs/timer.c
+++ b/tools/testing/selftests/bpf/progs/timer.c
@@ -119,8 +119,8 @@ static int timer_cb1(void *map, int *key, struct bpf_timer *timer)
 	return 0;
 }
 
-SEC("fentry/bpf_fentry_test1")
-int BPF_PROG2(test1, int, a)
+SEC("tc")
+int test1(void *ctx)
 {
 	struct bpf_timer *arr_timer, *lru_timer;
 	struct elem init = {};
@@ -235,8 +235,8 @@ int bpf_timer_test(void)
 	return 0;
 }
 
-SEC("fentry/bpf_fentry_test2")
-int BPF_PROG2(test2, int, a, int, b)
+SEC("tc")
+int test2(void *ctx)
 {
 	struct hmap_elem init = {}, *val;
 	int key = HTAB, key_malloc = HTAB_MALLOC;
diff --git a/tools/testing/selftests/bpf/progs/timer_crash.c b/tools/testing/selftests/bpf/progs/timer_crash.c
index f8f7944e70da..971eb93f485c 100644
--- a/tools/testing/selftests/bpf/progs/timer_crash.c
+++ b/tools/testing/selftests/bpf/progs/timer_crash.c
@@ -26,8 +26,8 @@ struct {
 int pid = 0;
 int crash_map = 0; /* 0 for amap, 1 for hmap */
 
-SEC("fentry/do_nanosleep")
-int sys_enter(void *ctx)
+SEC("tc")
+int timer(void *ctx)
 {
 	struct map_elem *e, value = {};
 	void *map = crash_map ? (void *)&hmap : (void *)&amap;
-- 
2.38.1



* [PATCH RFC bpf-next v1 2/2] [EXAMPLE] selftests/bpf: Add timer deadlock example
  2022-11-06  1:51 [PATCH RFC bpf-next v1 0/2] Fix deadlock with bpf_timer in fentry/fexit progs Kumar Kartikeya Dwivedi
  2022-11-06  1:51 ` [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock Kumar Kartikeya Dwivedi
@ 2022-11-06  1:51 ` Kumar Kartikeya Dwivedi
  1 sibling, 0 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-06  1:51 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau

This is just an example to showcase that the deadlock can occur in
practice. Run it on an unfixed kernel by removing the test__skip() call
in prog_tests/timer_deadlock.c.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/timer_deadlock.c | 29 +++++++++++
 .../selftests/bpf/progs/timer_deadlock.c      | 50 +++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/timer_deadlock.c
 create mode 100644 tools/testing/selftests/bpf/progs/timer_deadlock.c

diff --git a/tools/testing/selftests/bpf/prog_tests/timer_deadlock.c b/tools/testing/selftests/bpf/prog_tests/timer_deadlock.c
new file mode 100644
index 000000000000..83657577d137
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/timer_deadlock.c
@@ -0,0 +1,29 @@
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "timer_deadlock.skel.h"
+
+void test_timer_deadlock(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	struct timer_deadlock *skel;
+
+	/* Remove to observe deadlock */
+	test__skip();
+	return;
+
+	skel = timer_deadlock__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "timer_deadlock__open_and_load"))
+		return;
+	if (!ASSERT_OK(timer_deadlock__attach(skel), "timer_deadlock__attach"))
+		goto end;
+	ASSERT_OK(bpf_prog_test_run_opts(bpf_program__fd(skel->progs.tc_prog), &topts), "test_run");
+	ASSERT_EQ(topts.retval, 0, "test_run retval");
+end:
+	timer_deadlock__destroy(skel);
+}
+
diff --git a/tools/testing/selftests/bpf/progs/timer_deadlock.c b/tools/testing/selftests/bpf/progs/timer_deadlock.c
new file mode 100644
index 000000000000..ac05c7320144
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/timer_deadlock.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+int tid = 0;
+
+struct map_value {
+	struct bpf_timer timer;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} array_map SEC(".maps");
+
+static int cb(struct bpf_map *map, int *key, struct map_value *val)
+{
+	return 0;
+}
+
+SEC("tc")
+int tc_prog(void *ctx)
+{
+	struct task_struct *current = bpf_get_current_task_btf();
+	struct map_value *v, val = {};
+
+	v = bpf_map_lookup_elem(&array_map, &(int){0});
+	if (!v)
+		return 0;
+	bpf_timer_init(&v->timer, &array_map, 0);
+	bpf_timer_set_callback(&v->timer, &cb);
+
+	tid = current->pid;
+	return bpf_map_update_elem(&array_map, &(int){0}, &val, 0);
+}
+
+SEC("fentry/bpf_prog_put")
+int fentry_prog(void *ctx)
+{
+	struct map_value val = {};
+
+	if (tid == bpf_get_current_task_btf()->pid)
+		bpf_map_update_elem(&array_map, &(int){0}, &val, 0);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.38.1



* Re: [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock
  2022-11-06  1:51 ` [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock Kumar Kartikeya Dwivedi
@ 2022-11-06 21:20   ` Alexei Starovoitov
  2022-11-06 21:44     ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2022-11-06 21:20 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau

On Sat, Nov 5, 2022 at 6:52 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> Currently, unlike other tracing program types, BPF_PROG_TYPE_TRACING is
> excluded is_tracing_prog_type checks. This means that they can use maps
> containing bpf_spin_lock, bpf_timer, etc. without verification failure.
>
> However, allowing fentry/fexit programs to use maps that have such
> bpf_timer in the map value can lead to deadlock.
>
> Suppose that an fentry program is attached to bpf_prog_put, and a TC
> program executes and does bpf_map_update_elem on an array map that both
> progs share. If the fentry programs calls bpf_map_update_elem for the
> same key, it will lead to acquiring of the same lock from within the
> critical section protecting the timer.
>
> The call chain is:
>
> bpf_prog_test_run_opts() // TC
>   bpf_prog_TC
>     bpf_map_update_elem(array_map, key=0)
>       bpf_obj_free_fields
>         bpf_timer_cancel_and_free
>           __bpf_spin_lock_irqsave
>             drop_prog_refcnt
>               bpf_prog_put
>                 bpf_prog_FENTRY // FENTRY
>                   bpf_map_update_elem(array_map, key=0)
>                     bpf_obj_free_fields
>                       bpf_timer_cancel_and_free
>                         __bpf_spin_lock_irqsave // DEADLOCK
>
> BPF_TRACE_ITER attach type can be excluded because it always executes in
> process context.
>
> Update selftests using bpf_timer in fentry to TC as they will be broken
> by this change.

which is an obvious red flag and the reason why we cannot do
this change.
This specific issue could be addressed with addition of
notrace in drop_prog_refcnt, bpf_prog_put, __bpf_prog_put.
Other calls from __bpf_prog_put can stay as-is,
since they won't be called in this convoluted case.
I frankly don't get why you're spending time digging up such
odd corner cases that no one can hit in real use.
There are probably other equally weird corner cases and sooner
or later we will just declare them 'wont-fix'. Not kidding.
Please channel your energy into something that helps.
Positive patches are more pleasant to review.


* Re: [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock
  2022-11-06 21:20   ` Alexei Starovoitov
@ 2022-11-06 21:44     ` Kumar Kartikeya Dwivedi
  2022-11-07  0:31       ` Alexei Starovoitov
  0 siblings, 1 reply; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-06 21:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau

On Mon, Nov 07, 2022 at 02:50:08AM IST, Alexei Starovoitov wrote:
> On Sat, Nov 5, 2022 at 6:52 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > Currently, unlike other tracing program types, BPF_PROG_TYPE_TRACING is
> > excluded is_tracing_prog_type checks. This means that they can use maps
> > containing bpf_spin_lock, bpf_timer, etc. without verification failure.
> >
> > However, allowing fentry/fexit programs to use maps that have such
> > bpf_timer in the map value can lead to deadlock.
> >
> > Suppose that an fentry program is attached to bpf_prog_put, and a TC
> > program executes and does bpf_map_update_elem on an array map that both
> > progs share. If the fentry programs calls bpf_map_update_elem for the
> > same key, it will lead to acquiring of the same lock from within the
> > critical section protecting the timer.
> >
> > The call chain is:
> >
> > bpf_prog_test_run_opts() // TC
> >   bpf_prog_TC
> >     bpf_map_update_elem(array_map, key=0)
> >       bpf_obj_free_fields
> >         bpf_timer_cancel_and_free
> >           __bpf_spin_lock_irqsave
> >             drop_prog_refcnt
> >               bpf_prog_put
> >                 bpf_prog_FENTRY // FENTRY
> >                   bpf_map_update_elem(array_map, key=0)
> >                     bpf_obj_free_fields
> >                       bpf_timer_cancel_and_free
> >                         __bpf_spin_lock_irqsave // DEADLOCK
> >
> > BPF_TRACE_ITER attach type can be excluded because it always executes in
> > process context.
> >
> > Update selftests using bpf_timer in fentry to TC as they will be broken
> > by this change.
>
> which is an obvious red flag and the reason why we cannot do
> this change.
> This specific issue could be addressed with addition of
> notrace in drop_prog_refcnt, bpf_prog_put, __bpf_prog_put.
> Other calls from __bpf_prog_put can stay as-is,
> since they won't be called in this convoluted case.
> I frankly don't get why you're spending time digging such
> odd corner cases that no one can hit in real use.

I was trying to figure out whether bpf_list_head_free would be safe to call
unconditionally from map updates via bpf_obj_free_fields, since it takes the
very same spin lock that a BPF program can also take to update the list.

Map update ops are not allowed in the critical section, so this particular kind
of recursive map update call should not be possible. perf event is already
prevented using is_tracing_prog_type, so an NMI prog cannot interrupt and
update the same map.

But then I went looking whether it was a problem elsewhere...

FWIW I have updated my patch to do:

  if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
	if (is_tracing_prog_type(prog_type) ||
	    (prog_type == BPF_PROG_TYPE_TRACING &&
	     env->prog->expected_attach_type != BPF_TRACE_ITER)) {
		verbose(env, "tracing progs cannot use bpf_list_head yet\n");
		return -EINVAL;
	}
  }

v5 coming soon.

> There are probably other equally weird corner cases and sooner
> or later will just declare them as 'wont-fix'. Not kidding.

Understood.

> Please channel your energy to something that helps.
> Positive patches are more pleasant to review.

Understood.


* Re: [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock
  2022-11-06 21:44     ` Kumar Kartikeya Dwivedi
@ 2022-11-07  0:31       ` Alexei Starovoitov
  2022-11-07  1:48         ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2022-11-07  0:31 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau

On Sun, Nov 6, 2022 at 1:44 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Mon, Nov 07, 2022 at 02:50:08AM IST, Alexei Starovoitov wrote:
> > On Sat, Nov 5, 2022 at 6:52 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >
> > > Currently, unlike other tracing program types, BPF_PROG_TYPE_TRACING is
> > > excluded is_tracing_prog_type checks. This means that they can use maps
> > > containing bpf_spin_lock, bpf_timer, etc. without verification failure.
> > >
> > > However, allowing fentry/fexit programs to use maps that have such
> > > bpf_timer in the map value can lead to deadlock.
> > >
> > > Suppose that an fentry program is attached to bpf_prog_put, and a TC
> > > program executes and does bpf_map_update_elem on an array map that both
> > > progs share. If the fentry programs calls bpf_map_update_elem for the
> > > same key, it will lead to acquiring of the same lock from within the
> > > critical section protecting the timer.
> > >
> > > The call chain is:
> > >
> > > bpf_prog_test_run_opts() // TC
> > >   bpf_prog_TC
> > >     bpf_map_update_elem(array_map, key=0)
> > >       bpf_obj_free_fields
> > >         bpf_timer_cancel_and_free
> > >           __bpf_spin_lock_irqsave
> > >             drop_prog_refcnt
> > >               bpf_prog_put
> > >                 bpf_prog_FENTRY // FENTRY
> > >                   bpf_map_update_elem(array_map, key=0)
> > >                     bpf_obj_free_fields
> > >                       bpf_timer_cancel_and_free
> > >                         __bpf_spin_lock_irqsave // DEADLOCK
> > >
> > > BPF_TRACE_ITER attach type can be excluded because it always executes in
> > > process context.
> > >
> > > Update selftests using bpf_timer in fentry to TC as they will be broken
> > > by this change.
> >
> > which is an obvious red flag and the reason why we cannot do
> > this change.
> > This specific issue could be addressed with addition of
> > notrace in drop_prog_refcnt, bpf_prog_put, __bpf_prog_put.
> > Other calls from __bpf_prog_put can stay as-is,
> > since they won't be called in this convoluted case.
> > I frankly don't get why you're spending time digging such
> > odd corner cases that no one can hit in real use.
>
> I was trying to figure out whether bpf_list_head_free would be safe to call all
> the time in map updates from bpf_obj_free_fields, since it takes the very same
> spin lock that BPF program can also take to update the list.
>
> Map update ops are not allowed in the critical section, so this particular kind
> of recurisve map update call should not be possible. perf event is already
> prevented using is_tracing_prog_type, so NMI prog cannot interrupt and update
> the same map.
>
> But then I went looking whether it was a problem elsewhere...
>
> FWIW I have updated my patch to do:
>
>   if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
>         if (is_tracing_prog_type(prog_type) ||
>             (prog_type == BPF_PROG_TYPE_TRACING &&
>              env->prog->expected_attach_type != BPF_TRACE_ITER)) {
>                 verbose(env, "tracing progs cannot use bpf_list_head yet\n");
>                 return -EINVAL;
>         }
>   }

That is a severe limitation.
Why cannot you use notrace approach?


* Re: [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock
  2022-11-07  0:31       ` Alexei Starovoitov
@ 2022-11-07  1:48         ` Kumar Kartikeya Dwivedi
  2022-11-07  2:34           ` Alexei Starovoitov
  0 siblings, 1 reply; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-07  1:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau

On Mon, Nov 07, 2022 at 06:01:44AM IST, Alexei Starovoitov wrote:
> On Sun, Nov 6, 2022 at 1:44 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > On Mon, Nov 07, 2022 at 02:50:08AM IST, Alexei Starovoitov wrote:
> > > On Sat, Nov 5, 2022 at 6:52 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > > >
> > > > Currently, unlike other tracing program types, BPF_PROG_TYPE_TRACING is
> > > > excluded is_tracing_prog_type checks. This means that they can use maps
> > > > containing bpf_spin_lock, bpf_timer, etc. without verification failure.
> > > >
> > > > However, allowing fentry/fexit programs to use maps that have such
> > > > bpf_timer in the map value can lead to deadlock.
> > > >
> > > > Suppose that an fentry program is attached to bpf_prog_put, and a TC
> > > > program executes and does bpf_map_update_elem on an array map that both
> > > > progs share. If the fentry programs calls bpf_map_update_elem for the
> > > > same key, it will lead to acquiring of the same lock from within the
> > > > critical section protecting the timer.
> > > >
> > > > The call chain is:
> > > >
> > > > bpf_prog_test_run_opts() // TC
> > > >   bpf_prog_TC
> > > >     bpf_map_update_elem(array_map, key=0)
> > > >       bpf_obj_free_fields
> > > >         bpf_timer_cancel_and_free
> > > >           __bpf_spin_lock_irqsave
> > > >             drop_prog_refcnt
> > > >               bpf_prog_put
> > > >                 bpf_prog_FENTRY // FENTRY
> > > >                   bpf_map_update_elem(array_map, key=0)
> > > >                     bpf_obj_free_fields
> > > >                       bpf_timer_cancel_and_free
> > > >                         __bpf_spin_lock_irqsave // DEADLOCK
> > > >
> > > > BPF_TRACE_ITER attach type can be excluded because it always executes in
> > > > process context.
> > > >
> > > > Update selftests using bpf_timer in fentry to TC as they will be broken
> > > > by this change.
> > >
> > > which is an obvious red flag and the reason why we cannot do
> > > this change.
> > > This specific issue could be addressed with addition of
> > > notrace in drop_prog_refcnt, bpf_prog_put, __bpf_prog_put.
> > > Other calls from __bpf_prog_put can stay as-is,
> > > since they won't be called in this convoluted case.
> > > I frankly don't get why you're spending time digging such
> > > odd corner cases that no one can hit in real use.
> >
> > I was trying to figure out whether bpf_list_head_free would be safe to call all
> > the time in map updates from bpf_obj_free_fields, since it takes the very same
> > spin lock that BPF program can also take to update the list.
> >
> > Map update ops are not allowed in the critical section, so this particular kind
> > of recurisve map update call should not be possible. perf event is already
> > prevented using is_tracing_prog_type, so NMI prog cannot interrupt and update
> > the same map.
> >
> > But then I went looking whether it was a problem elsewhere...
> >
> > FWIW I have updated my patch to do:
> >
> >   if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
> >         if (is_tracing_prog_type(prog_type) ||
> >             (prog_type == BPF_PROG_TYPE_TRACING &&
> >              env->prog->expected_attach_type != BPF_TRACE_ITER)) {
> >                 verbose(env, "tracing progs cannot use bpf_list_head yet\n");
> >                 return -EINVAL;
> >         }
> >   }
>
> That is a severe limitation.
> Why cannot you use notrace approach?

Yes, notrace is indeed an option, but the problem is that everything within that
critical section needs to be notrace. bpf_list_head_free also ends up calling
bpf_obj_free_fields again (the level of recursion, however, won't exceed 3,
since we clamp list_head -> list_head nesting to 2 levels).

So the notrace needs to be applied to everything within it, which is not a
problem now. It can be done. BPF_TIMER and BPF_KPTR_REF (the indirect call to
dtor) won't be triggered, so it probably just needs to be bpf_list_head_free
and bpf_obj_free_fields.

But it can break silently in the future, if e.g. kptr is allowed. Same for
drop_prog_refcnt if something changes. Every change to anything these functions
call (and to functions called by those) needs to keep the restriction in mind.

I was wondering if in both cases of bpf_timer and bpf_list_head, we can simply
swap out the object locally while holding the lock, and then do everything
outside the spin lock.

For bpf_timer, it would mean moving drop_prog_refcnt outside spin lock critical
section. hrtimer_cancel is already done after the unlock. For bpf_list_head, it
would mean swapping out the list_head and then draining it outside the lock.

Then we hopefully don't need to use notrace, and it wouldn't be possible for any
tracing prog to execute while we hold the bpf_spin_lock (unless I missed
something).


* Re: [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock
  2022-11-07  1:48         ` Kumar Kartikeya Dwivedi
@ 2022-11-07  2:34           ` Alexei Starovoitov
  2022-11-07  3:45             ` Alexei Starovoitov
  0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2022-11-07  2:34 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau

On Sun, Nov 6, 2022 at 5:48 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Mon, Nov 07, 2022 at 06:01:44AM IST, Alexei Starovoitov wrote:
> > On Sun, Nov 6, 2022 at 1:44 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >
> > > On Mon, Nov 07, 2022 at 02:50:08AM IST, Alexei Starovoitov wrote:
> > > > On Sat, Nov 5, 2022 at 6:52 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > > > >
> > > > > Currently, unlike other tracing program types, BPF_PROG_TYPE_TRACING is
> > > > > excluded is_tracing_prog_type checks. This means that they can use maps
> > > > > containing bpf_spin_lock, bpf_timer, etc. without verification failure.
> > > > >
> > > > > However, allowing fentry/fexit programs to use maps that have such
> > > > > bpf_timer in the map value can lead to deadlock.
> > > > >
> > > > > Suppose that an fentry program is attached to bpf_prog_put, and a TC
> > > > > program executes and does bpf_map_update_elem on an array map that both
> > > > > progs share. If the fentry programs calls bpf_map_update_elem for the
> > > > > same key, it will lead to acquiring of the same lock from within the
> > > > > critical section protecting the timer.
> > > > >
> > > > > The call chain is:
> > > > >
> > > > > bpf_prog_test_run_opts() // TC
> > > > >   bpf_prog_TC
> > > > >     bpf_map_update_elem(array_map, key=0)
> > > > >       bpf_obj_free_fields
> > > > >         bpf_timer_cancel_and_free
> > > > >           __bpf_spin_lock_irqsave
> > > > >             drop_prog_refcnt
> > > > >               bpf_prog_put
> > > > >                 bpf_prog_FENTRY // FENTRY
> > > > >                   bpf_map_update_elem(array_map, key=0)
> > > > >                     bpf_obj_free_fields
> > > > >                       bpf_timer_cancel_and_free
> > > > >                         __bpf_spin_lock_irqsave // DEADLOCK
> > > > >
> > > > > BPF_TRACE_ITER attach type can be excluded because it always executes in
> > > > > process context.
> > > > >
> > > > > Update selftests using bpf_timer in fentry to TC as they will be broken
> > > > > by this change.
> > > >
> > > > which is an obvious red flag and the reason why we cannot do
> > > > this change.
> > > > This specific issue could be addressed with addition of
> > > > notrace in drop_prog_refcnt, bpf_prog_put, __bpf_prog_put.
> > > > Other calls from __bpf_prog_put can stay as-is,
> > > > since they won't be called in this convoluted case.
> > > > I frankly don't get why you're spending time digging such
> > > > odd corner cases that no one can hit in real use.
> > >
> > > I was trying to figure out whether bpf_list_head_free would be safe to call all
> > > the time in map updates from bpf_obj_free_fields, since it takes the very same
> > > spin lock that BPF program can also take to update the list.
> > >
> > > Map update ops are not allowed in the critical section, so this particular
> > > kind of recursive map update call should not be possible. perf event progs
> > > are already prevented using is_tracing_prog_type, so an NMI prog cannot
> > > interrupt and update the same map.
> > >
> > > But then I went looking whether it was a problem elsewhere...
> > >
> > > FWIW I have updated my patch to do:
> > >
> > >   if (btf_record_has_field(map->record, BPF_LIST_HEAD)) {
> > >         if (is_tracing_prog_type(prog_type) ||
> > >             (prog_type == BPF_PROG_TYPE_TRACING &&
> > >              env->prog->expected_attach_type != BPF_TRACE_ITER)) {
> > >                 verbose(env, "tracing progs cannot use bpf_list_head yet\n");
> > >                 return -EINVAL;
> > >         }
> > >   }
> >
> > That is a severe limitation.
> > Why cannot you use notrace approach?
>
> Yes, notrace is indeed an option, but the problem is that everything within
> that critical section needs to be notrace. bpf_list_head_free also ends up
> calling bpf_obj_free_fields again (the level of recursion however won't
> exceed 3, since we clamp list_head -> list_head nesting at 2 levels).
>
> So the notrace needs to be applied to everything within it, which is not a
> problem now. It can be done.

let's do it then.

> BPF_TIMER and BPF_KPTR_REF (the indirect call to
> dtor) won't be triggered, so it probably just needs to be bpf_list_head_free
> and bpf_obj_free_fields.
>
> But it can break silently in the future, if e.g. kptr is allowed. Same for
> drop_prog_refcnt if something changes. Every change to anything they call (and
> called by functions they call) needs to keep the restriction in mind.

Only funcs that can realistically be called.
Not every function that is in there.

> I was wondering if in both cases of bpf_timer and bpf_list_head, we can simply
> swap out the object locally while holding the lock, and then do everything
> outside the spin lock.

Maybe, but I wouldn't bother optimizing for convoluted cases.
Use notrace when you can.
Everything that bpf_mem_alloc is doing is notrace for the same reason.

>
> For bpf_timer, it would mean moving drop_prog_refcnt outside spin lock critical
> section. hrtimer_cancel is already done after the unlock. For bpf_list_head, it
> would mean swapping out the list_head and then draining it outside the lock.

That also works.
drop_prog_refcnt() can be moved after unlock.
Don't see any race.

> Then we hopefully don't need to use notrace, and it wouldn't be possible for any
> tracing prog to execute while we hold the bpf_spin_lock (unless I missed
> something).

yep. spin_lock, link list, obj_new won't be allowed in
is_tracing_prog_type().
Only talking about prog_type_tracing, since those are
a lot more than tracing. Everything new falls into this category.
We might even create an alias like prog_type_generic to highlight that.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock
  2022-11-07  2:34           ` Alexei Starovoitov
@ 2022-11-07  3:45             ` Alexei Starovoitov
  2022-11-07 23:13               ` Kumar Kartikeya Dwivedi
  0 siblings, 1 reply; 10+ messages in thread
From: Alexei Starovoitov @ 2022-11-07  3:45 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau

On Sun, Nov 6, 2022 at 6:34 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> >
> > For bpf_timer, it would mean moving drop_prog_refcnt outside spin lock critical
> > section. hrtimer_cancel is already done after the unlock. For bpf_list_head, it

> > would mean swapping out the list_head and then draining it outside the lock.
>
> That also works.
> drop_prog_refcnt() can be moved after unlock.
> Don't see any race.

I mean not the whole function obviously.
Instead of
static void drop_prog_refcnt(struct bpf_hrtimer *t)
it can become
static struct bpf_prog *drop_prog_refcnt(struct bpf_hrtimer *t)
t->prog and callback_fn should only be manipulated
under lock.
bpf_prog_put itself can happen after unlock.


* Re: [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock
  2022-11-07  3:45             ` Alexei Starovoitov
@ 2022-11-07 23:13               ` Kumar Kartikeya Dwivedi
  0 siblings, 0 replies; 10+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-11-07 23:13 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau

On Mon, Nov 07, 2022 at 09:15:26AM IST, Alexei Starovoitov wrote:
> On Sun, Nov 6, 2022 at 6:34 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > >
> > > For bpf_timer, it would mean moving drop_prog_refcnt outside spin lock critical
> > > section. hrtimer_cancel is already done after the unlock. For bpf_list_head, it
>
> > > would mean swapping out the list_head and then draining it outside the lock.
> >
> > That also works.
> > drop_prog_refcnt() can be moved after unlock.
> > Don't see any race.
>
> I mean not the whole function obviously.
> Instead of
> static void drop_prog_refcnt(struct bpf_hrtimer *t)
> it can become
> static struct bpf_prog *drop_prog_refcnt(struct bpf_hrtimer *t)
> t->prog and callback_fn should only be manipulated
> under lock.
> bpf_prog_put itself can happen after unlock.

Right, both t->prog and t->callback_fn need to be set to NULL under the lock.

I will send out the bpf_timer change separately. For now, I moved list draining
out of the lock in my series and removed the check on BPF_PROG_TYPE_TRACING, and
posted it.


end of thread, other threads:[~2022-11-07 23:14 UTC | newest]

Thread overview: 10+ messages
-- links below jump to the message on this page --
2022-11-06  1:51 [PATCH RFC bpf-next v1 0/2] Fix deadlock with bpf_timer in fentry/fexit progs Kumar Kartikeya Dwivedi
2022-11-06  1:51 ` [PATCH RFC bpf-next v1 1/2] bpf: Fix deadlock for bpf_timer's spinlock Kumar Kartikeya Dwivedi
2022-11-06 21:20   ` Alexei Starovoitov
2022-11-06 21:44     ` Kumar Kartikeya Dwivedi
2022-11-07  0:31       ` Alexei Starovoitov
2022-11-07  1:48         ` Kumar Kartikeya Dwivedi
2022-11-07  2:34           ` Alexei Starovoitov
2022-11-07  3:45             ` Alexei Starovoitov
2022-11-07 23:13               ` Kumar Kartikeya Dwivedi
2022-11-06  1:51 ` [PATCH RFC bpf-next v1 2/2] [EXAMPLE] selftests/bpf: Add timer deadlock example Kumar Kartikeya Dwivedi
