* [PATCH bpf v2 0/2] bpf: fix a couple of issues with syscall program @ 2021-08-09 6:03 Yonghong Song 2021-08-09 6:03 ` [PATCH bpf v2 1/2] bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs Yonghong Song 2021-08-09 6:03 ` [PATCH bpf v2 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program Yonghong Song 0 siblings, 2 replies; 9+ messages in thread From: Yonghong Song @ 2021-08-09 6:03 UTC (permalink / raw) To: bpf; +Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team syzbot ([1]) reported a rcu warning for syscall program when bpf_get_current_cgroup_id() is called in the program. The bpf_get_current_cgroup_id() helper expects cgroup related mutex or rcu_read_lock(). The sleepable program cannot be protected with rcu read_lock, hence the warning. Previous version ([2]) tried to add rcu_read_lock() surrounding bpf_prog_run_pin_on_cpu() for syscall program. This doesn't work and will trigger other warning since syscall program is a sleepable program. In this version, patch #1 disallows to call bpf_get_current_cgroup_id() and bpf_get_current_ancestor_cgroup_id() for sleepable programs to silence the warning. Further, syscall program should be protected with bpf_read_lock_trace() which is implemented in Patch #2. [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/ [2] https://lore.kernel.org/bpf/20210728172307.1030271-1-yhs@fb.com/ Changelog: v1 -> v2: - use bpf_read_lock_trace() instead of bpf_read_lock() for bpf_prog_run_pin_on_cpu(). - disallow bpf_get_current_[ancestor_]cgroup_id() helper. Yonghong Song (2): bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs bpf: add missing bpf_read_[un]lock_trace() for syscall program kernel/trace/bpf_trace.c | 6 ++++-- net/bpf/test_run.c | 4 ++++ 2 files changed, 8 insertions(+), 2 deletions(-) -- 2.30.2 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf v2 1/2] bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs 2021-08-09 6:03 [PATCH bpf v2 0/2] bpf: fix a couple of issues with syscall program Yonghong Song @ 2021-08-09 6:03 ` Yonghong Song 2021-08-09 17:18 ` Andrii Nakryiko 2021-08-09 6:03 ` [PATCH bpf v2 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program Yonghong Song 1 sibling, 1 reply; 9+ messages in thread From: Yonghong Song @ 2021-08-09 6:03 UTC (permalink / raw) To: bpf Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team, syzbot+7ee5c2c09c284495371f Currently, if bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id() helper is called with sleepable programs e.g., sleepable fentry/fmod_ret/fexit/lsm programs, a rcu warning may appear. For example, if I added the following hack to test_progs/test_lsm sleepable fentry program test_sys_setdomainname: --- a/tools/testing/selftests/bpf/progs/lsm.c +++ b/tools/testing/selftests/bpf/progs/lsm.c @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs) int buf = 0; long ret; + __u64 cg_id = bpf_get_current_cgroup_id(); + if (cg_id == 1000) + copy_test++; + ret = bpf_copy_from_user(&buf, sizeof(buf), ptr); if (len == -2 && ret == 0 && buf == 1234) copy_test++; I will hit the following rcu warning: include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by test_progs/260: #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0 stack backtrace: CPU: 1 PID: 260 Comm: test_progs Tainted: G O 5.14.0-rc2+ #176 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x56/0x7b bpf_get_current_cgroup_id+0x9c/0xb1 bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c bpf_trampoline_6442469132_0+0x2d/0x1000 __x64_sys_setdomainname+0x5/0x110 do_syscall_64+0x3a/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper. syzbot reported a similar issue in [1] for syscall program. Helper bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id() has the following callchain: task_dfl_cgroup task_css_set task_css_set_check and we have #define task_css_set_check(task, __c) \ rcu_dereference_check((task)->cgroups, \ lockdep_is_held(&cgroup_mutex) || \ lockdep_is_held(&css_set_lock) || \ ((task)->flags & PF_EXITING) || (__c)) Since cgroup_mutex/css_set_lock is not held and the task is not existing and rcu read_lock is not held, a warning will be issued. Note that bpf sleepable program is protected by rcu_read_lock_trace(). To fix the issue, let us make these two helpers not available to sleepable program. I marked the patch fixing 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") which added bpf_get_current_ancestor_cgroup_id() to 5.14. I think backporting 5.14 is probably good enough as sleepable progrems are not widely used. This patch should fix [1] as well since syscall program is a sleepable program and bpf_get_current_cgroup_id() is not available to syscall program any more. [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/ Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") Signed-off-by: Yonghong Song <yhs@fb.com> --- kernel/trace/bpf_trace.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index b4916ef388ad..eaa8a8ffbe46 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1016,9 +1016,11 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) #endif #ifdef CONFIG_CGROUPS case BPF_FUNC_get_current_cgroup_id: - return &bpf_get_current_cgroup_id_proto; + return prog->aux->sleepable ? + NULL : &bpf_get_current_cgroup_id_proto; case BPF_FUNC_get_current_ancestor_cgroup_id: - return &bpf_get_current_ancestor_cgroup_id_proto; + return prog->aux->sleepable ? + NULL : &bpf_get_current_ancestor_cgroup_id_proto; #endif case BPF_FUNC_send_signal: return &bpf_send_signal_proto; -- 2.30.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs 2021-08-09 6:03 ` [PATCH bpf v2 1/2] bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs Yonghong Song @ 2021-08-09 17:18 ` Andrii Nakryiko 2021-08-09 17:41 ` Yonghong Song 0 siblings, 1 reply; 9+ messages in thread From: Andrii Nakryiko @ 2021-08-09 17:18 UTC (permalink / raw) To: Yonghong Song Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Kernel Team, syzbot+7ee5c2c09c284495371f On Sun, Aug 8, 2021 at 11:03 PM Yonghong Song <yhs@fb.com> wrote: > > Currently, if bpf_get_current_cgroup_id() or > bpf_get_current_ancestor_cgroup_id() helper is > called with sleepable programs e.g., sleepable > fentry/fmod_ret/fexit/lsm programs, a rcu warning > may appear. For example, if I added the following > hack to test_progs/test_lsm sleepable fentry program > test_sys_setdomainname: > > --- a/tools/testing/selftests/bpf/progs/lsm.c > +++ b/tools/testing/selftests/bpf/progs/lsm.c > @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs) > int buf = 0; > long ret; > > + __u64 cg_id = bpf_get_current_cgroup_id(); > + if (cg_id == 1000) > + copy_test++; > + > ret = bpf_copy_from_user(&buf, sizeof(buf), ptr); > if (len == -2 && ret == 0 && buf == 1234) > copy_test++; > > I will hit the following rcu warning: > > include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > other info that might help us debug this: > rcu_scheduler_active = 2, debug_locks = 1 > 1 lock held by test_progs/260: > #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0 > stack backtrace: > CPU: 1 PID: 260 Comm: test_progs Tainted: G O 5.14.0-rc2+ #176 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 > Call Trace: > dump_stack_lvl+0x56/0x7b > bpf_get_current_cgroup_id+0x9c/0xb1 > bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c > bpf_trampoline_6442469132_0+0x2d/0x1000 > __x64_sys_setdomainname+0x5/0x110 > do_syscall_64+0x3a/0x80 > entry_SYSCALL_64_after_hwframe+0x44/0xae > > I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper. > syzbot reported a similar issue in [1] for syscall program. Helper > bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id() > has the following callchain: > task_dfl_cgroup > task_css_set > task_css_set_check > and we have > #define task_css_set_check(task, __c) \ > rcu_dereference_check((task)->cgroups, \ > lockdep_is_held(&cgroup_mutex) || \ > lockdep_is_held(&css_set_lock) || \ > ((task)->flags & PF_EXITING) || (__c)) > Since cgroup_mutex/css_set_lock is not held and the task > is not existing and rcu read_lock is not held, a warning > will be issued. Note that bpf sleepable program is protected by > rcu_read_lock_trace(). > > To fix the issue, let us make these two helpers not available > to sleepable program. I marked the patch fixing 95b861a7935b > ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") > which added bpf_get_current_ancestor_cgroup_id() to > 5.14. I think backporting 5.14 is probably good enough as sleepable > progrems are not widely used. > > This patch should fix [1] as well since syscall program is a sleepable > program and bpf_get_current_cgroup_id() is not available to > syscall program any more. > > [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/ > > Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com > Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") > Signed-off-by: Yonghong Song <yhs@fb.com> > --- > kernel/trace/bpf_trace.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index b4916ef388ad..eaa8a8ffbe46 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -1016,9 +1016,11 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) > #endif > #ifdef CONFIG_CGROUPS > case BPF_FUNC_get_current_cgroup_id: > - return &bpf_get_current_cgroup_id_proto; > + return prog->aux->sleepable ? > + NULL : &bpf_get_current_cgroup_id_proto; > case BPF_FUNC_get_current_ancestor_cgroup_id: > - return &bpf_get_current_ancestor_cgroup_id_proto; > + return prog->aux->sleepable ? > + NULL : &bpf_get_current_ancestor_cgroup_id_proto; This feels too extreme. I bet these helpers are as useful in sleepable BPF progs as they are in non-sleepable ones. Why don't we just implement a variant of get_current_cgroup_id (and the ancestor variant as well) which takes that cgroup_mutex lock, and just pick the appropriate implementation. Wouldn't that work? > #endif > case BPF_FUNC_send_signal: > return &bpf_send_signal_proto; > -- > 2.30.2 > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs 2021-08-09 17:18 ` Andrii Nakryiko @ 2021-08-09 17:41 ` Yonghong Song 2021-08-09 17:58 ` Andrii Nakryiko 0 siblings, 1 reply; 9+ messages in thread From: Yonghong Song @ 2021-08-09 17:41 UTC (permalink / raw) To: Andrii Nakryiko Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Kernel Team, syzbot+7ee5c2c09c284495371f On 8/9/21 10:18 AM, Andrii Nakryiko wrote: > On Sun, Aug 8, 2021 at 11:03 PM Yonghong Song <yhs@fb.com> wrote: >> >> Currently, if bpf_get_current_cgroup_id() or >> bpf_get_current_ancestor_cgroup_id() helper is >> called with sleepable programs e.g., sleepable >> fentry/fmod_ret/fexit/lsm programs, a rcu warning >> may appear. For example, if I added the following >> hack to test_progs/test_lsm sleepable fentry program >> test_sys_setdomainname: >> >> --- a/tools/testing/selftests/bpf/progs/lsm.c >> +++ b/tools/testing/selftests/bpf/progs/lsm.c >> @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs) >> int buf = 0; >> long ret; >> >> + __u64 cg_id = bpf_get_current_cgroup_id(); >> + if (cg_id == 1000) >> + copy_test++; >> + >> ret = bpf_copy_from_user(&buf, sizeof(buf), ptr); >> if (len == -2 && ret == 0 && buf == 1234) >> copy_test++; >> >> I will hit the following rcu warning: >> >> include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! >> other info that might help us debug this: >> rcu_scheduler_active = 2, debug_locks = 1 >> 1 lock held by test_progs/260: >> #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0 >> stack backtrace: >> CPU: 1 PID: 260 Comm: test_progs Tainted: G O 5.14.0-rc2+ #176 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 >> Call Trace: >> dump_stack_lvl+0x56/0x7b >> bpf_get_current_cgroup_id+0x9c/0xb1 >> bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c >> bpf_trampoline_6442469132_0+0x2d/0x1000 >> __x64_sys_setdomainname+0x5/0x110 >> do_syscall_64+0x3a/0x80 >> entry_SYSCALL_64_after_hwframe+0x44/0xae >> >> I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper. >> syzbot reported a similar issue in [1] for syscall program. Helper >> bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id() >> has the following callchain: >> task_dfl_cgroup >> task_css_set >> task_css_set_check >> and we have >> #define task_css_set_check(task, __c) \ >> rcu_dereference_check((task)->cgroups, \ >> lockdep_is_held(&cgroup_mutex) || \ >> lockdep_is_held(&css_set_lock) || \ >> ((task)->flags & PF_EXITING) || (__c)) >> Since cgroup_mutex/css_set_lock is not held and the task >> is not existing and rcu read_lock is not held, a warning >> will be issued. Note that bpf sleepable program is protected by >> rcu_read_lock_trace(). >> >> To fix the issue, let us make these two helpers not available >> to sleepable program. I marked the patch fixing 95b861a7935b >> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") >> which added bpf_get_current_ancestor_cgroup_id() to >> 5.14. I think backporting 5.14 is probably good enough as sleepable >> progrems are not widely used. >> >> This patch should fix [1] as well since syscall program is a sleepable >> program and bpf_get_current_cgroup_id() is not available to >> syscall program any more. >> >> [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/ >> >> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com >> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") >> Signed-off-by: Yonghong Song <yhs@fb.com> >> --- >> kernel/trace/bpf_trace.c | 6 ++++-- >> 1 file changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c >> index b4916ef388ad..eaa8a8ffbe46 100644 >> --- a/kernel/trace/bpf_trace.c >> +++ b/kernel/trace/bpf_trace.c >> @@ -1016,9 +1016,11 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) >> #endif >> #ifdef CONFIG_CGROUPS >> case BPF_FUNC_get_current_cgroup_id: >> - return &bpf_get_current_cgroup_id_proto; >> + return prog->aux->sleepable ? >> + NULL : &bpf_get_current_cgroup_id_proto; >> case BPF_FUNC_get_current_ancestor_cgroup_id: >> - return &bpf_get_current_ancestor_cgroup_id_proto; >> + return prog->aux->sleepable ? >> + NULL : &bpf_get_current_ancestor_cgroup_id_proto; > > This feels too extreme. I bet these helpers are as useful in sleepable > BPF progs as they are in non-sleepable ones. > > Why don't we just implement a variant of get_current_cgroup_id (and > the ancestor variant as well) which takes that cgroup_mutex lock, and > just pick the appropriate implementation. Wouldn't that work? This may not work. e.g., for sleepable fentry program, if the to-be-traced function is inside in cgroup_mutex, we will have a deadlock. Currently, affected program types are tracing/fentry.s, tracing/fexit.s, tracing/fmod_ret.s, lsm.s and syscall. For fmod_ret.s, lsm.s, they all have some kind of predefined attachment/context, we might be able to check all potential attachment points and allow these two helpers when attachment point is not surrounded by cgroup_mutex. For syscall program, we should be okay as it is called with bpf_prog_test_run interface but I am not sure why user wants a cgroup_id for that. > >> #endif >> case BPF_FUNC_send_signal: >> return &bpf_send_signal_proto; >> -- >> 2.30.2 >> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs 2021-08-09 17:41 ` Yonghong Song @ 2021-08-09 17:58 ` Andrii Nakryiko 2021-08-09 20:29 ` Daniel Borkmann 0 siblings, 1 reply; 9+ messages in thread From: Andrii Nakryiko @ 2021-08-09 17:58 UTC (permalink / raw) To: Yonghong Song Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Kernel Team, syzbot+7ee5c2c09c284495371f On Mon, Aug 9, 2021 at 10:41 AM Yonghong Song <yhs@fb.com> wrote: > > > > On 8/9/21 10:18 AM, Andrii Nakryiko wrote: > > On Sun, Aug 8, 2021 at 11:03 PM Yonghong Song <yhs@fb.com> wrote: > >> > >> Currently, if bpf_get_current_cgroup_id() or > >> bpf_get_current_ancestor_cgroup_id() helper is > >> called with sleepable programs e.g., sleepable > >> fentry/fmod_ret/fexit/lsm programs, a rcu warning > >> may appear. For example, if I added the following > >> hack to test_progs/test_lsm sleepable fentry program > >> test_sys_setdomainname: > >> > >> --- a/tools/testing/selftests/bpf/progs/lsm.c > >> +++ b/tools/testing/selftests/bpf/progs/lsm.c > >> @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs) > >> int buf = 0; > >> long ret; > >> > >> + __u64 cg_id = bpf_get_current_cgroup_id(); > >> + if (cg_id == 1000) > >> + copy_test++; > >> + > >> ret = bpf_copy_from_user(&buf, sizeof(buf), ptr); > >> if (len == -2 && ret == 0 && buf == 1234) > >> copy_test++; > >> > >> I will hit the following rcu warning: > >> > >> include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > >> other info that might help us debug this: > >> rcu_scheduler_active = 2, debug_locks = 1 > >> 1 lock held by test_progs/260: > >> #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0 > >> stack backtrace: > >> CPU: 1 PID: 260 Comm: test_progs Tainted: G O 5.14.0-rc2+ #176 > >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 > >> Call Trace: > >> dump_stack_lvl+0x56/0x7b > >> bpf_get_current_cgroup_id+0x9c/0xb1 > >> bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c > >> bpf_trampoline_6442469132_0+0x2d/0x1000 > >> __x64_sys_setdomainname+0x5/0x110 > >> do_syscall_64+0x3a/0x80 > >> entry_SYSCALL_64_after_hwframe+0x44/0xae > >> > >> I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper. > >> syzbot reported a similar issue in [1] for syscall program. Helper > >> bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id() > >> has the following callchain: > >> task_dfl_cgroup > >> task_css_set > >> task_css_set_check > >> and we have > >> #define task_css_set_check(task, __c) \ > >> rcu_dereference_check((task)->cgroups, \ > >> lockdep_is_held(&cgroup_mutex) || \ > >> lockdep_is_held(&css_set_lock) || \ > >> ((task)->flags & PF_EXITING) || (__c)) > >> Since cgroup_mutex/css_set_lock is not held and the task > >> is not existing and rcu read_lock is not held, a warning > >> will be issued. Note that bpf sleepable program is protected by > >> rcu_read_lock_trace(). > >> > >> To fix the issue, let us make these two helpers not available > >> to sleepable program. I marked the patch fixing 95b861a7935b > >> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") > >> which added bpf_get_current_ancestor_cgroup_id() to > >> 5.14. I think backporting 5.14 is probably good enough as sleepable > >> progrems are not widely used. > >> > >> This patch should fix [1] as well since syscall program is a sleepable > >> program and bpf_get_current_cgroup_id() is not available to > >> syscall program any more. > >> > >> [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/ > >> > >> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com > >> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") > >> Signed-off-by: Yonghong Song <yhs@fb.com> > >> --- > >> kernel/trace/bpf_trace.c | 6 ++++-- > >> 1 file changed, 4 insertions(+), 2 deletions(-) > >> > >> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > >> index b4916ef388ad..eaa8a8ffbe46 100644 > >> --- a/kernel/trace/bpf_trace.c > >> +++ b/kernel/trace/bpf_trace.c > >> @@ -1016,9 +1016,11 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) > >> #endif > >> #ifdef CONFIG_CGROUPS > >> case BPF_FUNC_get_current_cgroup_id: > >> - return &bpf_get_current_cgroup_id_proto; > >> + return prog->aux->sleepable ? > >> + NULL : &bpf_get_current_cgroup_id_proto; > >> case BPF_FUNC_get_current_ancestor_cgroup_id: > >> - return &bpf_get_current_ancestor_cgroup_id_proto; > >> + return prog->aux->sleepable ? > >> + NULL : &bpf_get_current_ancestor_cgroup_id_proto; > > > > This feels too extreme. I bet these helpers are as useful in sleepable > > BPF progs as they are in non-sleepable ones. > > > > Why don't we just implement a variant of get_current_cgroup_id (and > > the ancestor variant as well) which takes that cgroup_mutex lock, and > > just pick the appropriate implementation. Wouldn't that work? > > This may not work. e.g., for sleepable fentry program, > if the to-be-traced function is inside in cgroup_mutex, we will > have a deadlock. We can also do preempty_disable() + rcu_read_lock() inside the helper itself, no? I mean in the new "sleepable" variant. > > Currently, affected program types are tracing/fentry.s, > tracing/fexit.s, tracing/fmod_ret.s, lsm.s and syscall. > For fmod_ret.s, lsm.s, they all have > some kind of predefined attachment/context, we might > be able to check all potential attachment points and > allow these two helpers when attachment point is not > surrounded by cgroup_mutex. I don't think it's feasible to know if any given attached kernel function can be called with cgroup_mutex taken. Static analysis will be too complicated and too restrictive. Runtime checks might be too expensive and/or not generic enough. But see above, we can do rcu_read_lock() inside the helper while preventing preemption, and it will behave the same way as if it was called from non-sleepable BPF prog. > For syscall program, we should be okay as it is > called with bpf_prog_test_run interface but I am > not sure why user wants a cgroup_id for that. > > > > >> #endif > >> case BPF_FUNC_send_signal: > >> return &bpf_send_signal_proto; > >> -- > >> 2.30.2 > >> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs 2021-08-09 17:58 ` Andrii Nakryiko @ 2021-08-09 20:29 ` Daniel Borkmann 2021-08-09 22:34 ` Yonghong Song 0 siblings, 1 reply; 9+ messages in thread From: Daniel Borkmann @ 2021-08-09 20:29 UTC (permalink / raw) To: Andrii Nakryiko, Yonghong Song Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Kernel Team, syzbot+7ee5c2c09c284495371f On 8/9/21 7:58 PM, Andrii Nakryiko wrote: > On Mon, Aug 9, 2021 at 10:41 AM Yonghong Song <yhs@fb.com> wrote: >> On 8/9/21 10:18 AM, Andrii Nakryiko wrote: >>> On Sun, Aug 8, 2021 at 11:03 PM Yonghong Song <yhs@fb.com> wrote: >>>> >>>> Currently, if bpf_get_current_cgroup_id() or >>>> bpf_get_current_ancestor_cgroup_id() helper is >>>> called with sleepable programs e.g., sleepable >>>> fentry/fmod_ret/fexit/lsm programs, a rcu warning >>>> may appear. For example, if I added the following >>>> hack to test_progs/test_lsm sleepable fentry program >>>> test_sys_setdomainname: >>>> >>>> --- a/tools/testing/selftests/bpf/progs/lsm.c >>>> +++ b/tools/testing/selftests/bpf/progs/lsm.c >>>> @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs) >>>> int buf = 0; >>>> long ret; >>>> >>>> + __u64 cg_id = bpf_get_current_cgroup_id(); >>>> + if (cg_id == 1000) >>>> + copy_test++; >>>> + >>>> ret = bpf_copy_from_user(&buf, sizeof(buf), ptr); >>>> if (len == -2 && ret == 0 && buf == 1234) >>>> copy_test++; >>>> >>>> I will hit the following rcu warning: >>>> >>>> include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! >>>> other info that might help us debug this: >>>> rcu_scheduler_active = 2, debug_locks = 1 >>>> 1 lock held by test_progs/260: >>>> #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0 >>>> stack backtrace: >>>> CPU: 1 PID: 260 Comm: test_progs Tainted: G O 5.14.0-rc2+ #176 >>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 >>>> Call Trace: >>>> dump_stack_lvl+0x56/0x7b >>>> bpf_get_current_cgroup_id+0x9c/0xb1 >>>> bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c >>>> bpf_trampoline_6442469132_0+0x2d/0x1000 >>>> __x64_sys_setdomainname+0x5/0x110 >>>> do_syscall_64+0x3a/0x80 >>>> entry_SYSCALL_64_after_hwframe+0x44/0xae >>>> >>>> I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper. >>>> syzbot reported a similar issue in [1] for syscall program. Helper >>>> bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id() >>>> has the following callchain: >>>> task_dfl_cgroup >>>> task_css_set >>>> task_css_set_check >>>> and we have >>>> #define task_css_set_check(task, __c) \ >>>> rcu_dereference_check((task)->cgroups, \ >>>> lockdep_is_held(&cgroup_mutex) || \ >>>> lockdep_is_held(&css_set_lock) || \ >>>> ((task)->flags & PF_EXITING) || (__c)) >>>> Since cgroup_mutex/css_set_lock is not held and the task >>>> is not existing and rcu read_lock is not held, a warning >>>> will be issued. Note that bpf sleepable program is protected by >>>> rcu_read_lock_trace(). >>>> >>>> To fix the issue, let us make these two helpers not available >>>> to sleepable program. I marked the patch fixing 95b861a7935b >>>> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") >>>> which added bpf_get_current_ancestor_cgroup_id() to >>>> 5.14. I think backporting 5.14 is probably good enough as sleepable >>>> progrems are not widely used. >>>> >>>> This patch should fix [1] as well since syscall program is a sleepable >>>> program and bpf_get_current_cgroup_id() is not available to >>>> syscall program any more. >>>> >>>> [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/ >>>> >>>> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com >>>> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") >>>> Signed-off-by: Yonghong Song <yhs@fb.com> >>>> --- >>>> kernel/trace/bpf_trace.c | 6 ++++-- >>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c >>>> index b4916ef388ad..eaa8a8ffbe46 100644 >>>> --- a/kernel/trace/bpf_trace.c >>>> +++ b/kernel/trace/bpf_trace.c >>>> @@ -1016,9 +1016,11 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) >>>> #endif >>>> #ifdef CONFIG_CGROUPS >>>> case BPF_FUNC_get_current_cgroup_id: >>>> - return &bpf_get_current_cgroup_id_proto; >>>> + return prog->aux->sleepable ? >>>> + NULL : &bpf_get_current_cgroup_id_proto; >>>> case BPF_FUNC_get_current_ancestor_cgroup_id: >>>> - return &bpf_get_current_ancestor_cgroup_id_proto; >>>> + return prog->aux->sleepable ? >>>> + NULL : &bpf_get_current_ancestor_cgroup_id_proto; >>> >>> This feels too extreme. I bet these helpers are as useful in sleepable >>> BPF progs as they are in non-sleepable ones. >>> >>> Why don't we just implement a variant of get_current_cgroup_id (and >>> the ancestor variant as well) which takes that cgroup_mutex lock, and >>> just pick the appropriate implementation. Wouldn't that work? >> >> This may not work. e.g., for sleepable fentry program, >> if the to-be-traced function is inside in cgroup_mutex, we will >> have a deadlock. > > We can also do preempty_disable() + rcu_read_lock() inside the helper > itself, no? I mean in the new "sleepable" variant. Yep, we do that for example in c5dbb89fc2ac ("bpf: Expose bpf_get_socket_cookie to tracing programs") as well (the sock_gen_cookie() disables preemption). Thanks, Daniel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs 2021-08-09 20:29 ` Daniel Borkmann @ 2021-08-09 22:34 ` Yonghong Song 0 siblings, 0 replies; 9+ messages in thread From: Yonghong Song @ 2021-08-09 22:34 UTC (permalink / raw) To: Daniel Borkmann, Andrii Nakryiko Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Kernel Team, syzbot+7ee5c2c09c284495371f On 8/9/21 1:29 PM, Daniel Borkmann wrote: > On 8/9/21 7:58 PM, Andrii Nakryiko wrote: >> On Mon, Aug 9, 2021 at 10:41 AM Yonghong Song <yhs@fb.com> wrote: >>> On 8/9/21 10:18 AM, Andrii Nakryiko wrote: >>>> On Sun, Aug 8, 2021 at 11:03 PM Yonghong Song <yhs@fb.com> wrote: >>>>> >>>>> Currently, if bpf_get_current_cgroup_id() or >>>>> bpf_get_current_ancestor_cgroup_id() helper is >>>>> called with sleepable programs e.g., sleepable >>>>> fentry/fmod_ret/fexit/lsm programs, a rcu warning >>>>> may appear. For example, if I added the following >>>>> hack to test_progs/test_lsm sleepable fentry program >>>>> test_sys_setdomainname: >>>>> >>>>> --- a/tools/testing/selftests/bpf/progs/lsm.c >>>>> +++ b/tools/testing/selftests/bpf/progs/lsm.c >>>>> @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, >>>>> struct pt_regs *regs) >>>>> int buf = 0; >>>>> long ret; >>>>> >>>>> + __u64 cg_id = bpf_get_current_cgroup_id(); >>>>> + if (cg_id == 1000) >>>>> + copy_test++; >>>>> + >>>>> ret = bpf_copy_from_user(&buf, sizeof(buf), ptr); >>>>> if (len == -2 && ret == 0 && buf == 1234) >>>>> copy_test++; >>>>> >>>>> I will hit the following rcu warning: >>>>> >>>>> include/linux/cgroup.h:481 suspicious rcu_dereference_check() >>>>> usage! >>>>> other info that might help us debug this: >>>>> rcu_scheduler_active = 2, debug_locks = 1 >>>>> 1 lock held by test_progs/260: >>>>> #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: >>>>> __bpf_prog_enter_sleepable+0x0/0xa0 >>>>> stack backtrace: >>>>> CPU: 1 PID: 260 Comm: test_progs Tainted: G O >>>>> 5.14.0-rc2+ #176 >>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS >>>>> rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 >>>>> Call Trace: >>>>> dump_stack_lvl+0x56/0x7b >>>>> bpf_get_current_cgroup_id+0x9c/0xb1 >>>>> bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c >>>>> bpf_trampoline_6442469132_0+0x2d/0x1000 >>>>> __x64_sys_setdomainname+0x5/0x110 >>>>> do_syscall_64+0x3a/0x80 >>>>> entry_SYSCALL_64_after_hwframe+0x44/0xae >>>>> >>>>> I can get similar warning using >>>>> bpf_get_current_ancestor_cgroup_id() helper. >>>>> syzbot reported a similar issue in [1] for syscall program. Helper >>>>> bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id() >>>>> has the following callchain: >>>>> task_dfl_cgroup >>>>> task_css_set >>>>> task_css_set_check >>>>> and we have >>>>> #define task_css_set_check(task, >>>>> __c) \ >>>>> >>>>> rcu_dereference_check((task)->cgroups, \ >>>>> lockdep_is_held(&cgroup_mutex) >>>>> || \ >>>>> lockdep_is_held(&css_set_lock) >>>>> || \ >>>>> ((task)->flags & PF_EXITING) || (__c)) >>>>> Since cgroup_mutex/css_set_lock is not held and the task >>>>> is not existing and rcu read_lock is not held, a warning >>>>> will be issued. Note that bpf sleepable program is protected by >>>>> rcu_read_lock_trace(). >>>>> >>>>> To fix the issue, let us make these two helpers not available >>>>> to sleepable program. I marked the patch fixing 95b861a7935b >>>>> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing") >>>>> which added bpf_get_current_ancestor_cgroup_id() to >>>>> 5.14. I think backporting 5.14 is probably good enough as sleepable >>>>> progrems are not widely used. >>>>> >>>>> This patch should fix [1] as well since syscall program is a sleepable >>>>> program and bpf_get_current_cgroup_id() is not available to >>>>> syscall program any more. >>>>> >>>>> [1] >>>>> https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/ >>>>> >>>>> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com >>>>> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id >>>>> for tracing") >>>>> Signed-off-by: Yonghong Song <yhs@fb.com> >>>>> --- >>>>> kernel/trace/bpf_trace.c | 6 ++++-- >>>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c >>>>> index b4916ef388ad..eaa8a8ffbe46 100644 >>>>> --- a/kernel/trace/bpf_trace.c >>>>> +++ b/kernel/trace/bpf_trace.c >>>>> @@ -1016,9 +1016,11 @@ bpf_tracing_func_proto(enum bpf_func_id >>>>> func_id, const struct bpf_prog *prog) >>>>> #endif >>>>> #ifdef CONFIG_CGROUPS >>>>> case BPF_FUNC_get_current_cgroup_id: >>>>> - return &bpf_get_current_cgroup_id_proto; >>>>> + return prog->aux->sleepable ? >>>>> + NULL : &bpf_get_current_cgroup_id_proto; >>>>> case BPF_FUNC_get_current_ancestor_cgroup_id: >>>>> - return &bpf_get_current_ancestor_cgroup_id_proto; >>>>> + return prog->aux->sleepable ? >>>>> + NULL : >>>>> &bpf_get_current_ancestor_cgroup_id_proto; >>>> >>>> This feels too extreme. I bet these helpers are as useful in sleepable >>>> BPF progs as they are in non-sleepable ones. >>>> >>>> Why don't we just implement a variant of get_current_cgroup_id (and >>>> the ancestor variant as well) which takes that cgroup_mutex lock, and >>>> just pick the appropriate implementation. Wouldn't that work? >>> >>> This may not work. e.g., for sleepable fentry program, >>> if the to-be-traced function is inside in cgroup_mutex, we will >>> have a deadlock. >> >> We can also do preempty_disable() + rcu_read_lock() inside the helper >> itself, no? I mean in the new "sleepable" variant. > > Yep, we do that for example in c5dbb89fc2ac ("bpf: Expose > bpf_get_socket_cookie > to tracing programs") as well (the sock_gen_cookie() disables preemption). I think rcu_read_lock() is enough. For all the cases I mentioned in the above, we already have migrate_disable(). Together with rcu_read_lock(), we should be okay. For non-sleepable programs (tracing,lsm), the helpers are protected with migrate_disable() and rcu_read_lock(). > > Thanks, > Daniel ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf v2 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program 2021-08-09 6:03 [PATCH bpf v2 0/2] bpf: fix a couple of issues with syscall program Yonghong Song 2021-08-09 6:03 ` [PATCH bpf v2 1/2] bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs Yonghong Song @ 2021-08-09 6:03 ` Yonghong Song 2021-08-09 17:16 ` Andrii Nakryiko 1 sibling, 1 reply; 9+ messages in thread From: Yonghong Song @ 2021-08-09 6:03 UTC (permalink / raw) To: bpf; +Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team Commit 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.") added support for syscall program, which is a sleepable program. But the program run missed bpf_read_lock_trace()/bpf_read_unlock_trace(), which is needed to ensure proper rcu callback invocations. This patch added bpf_read_[un]lock_trace() properly. Fixes: 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.") Signed-off-by: Yonghong Song <yhs@fb.com> --- net/bpf/test_run.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 1cc75c811e24..caa16bf30fb5 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -7,6 +7,7 @@ #include <linux/vmalloc.h> #include <linux/etherdevice.h> #include <linux/filter.h> +#include <linux/rcupdate_trace.h> #include <linux/sched/signal.h> #include <net/bpf_sk_storage.h> #include <net/sock.h> @@ -951,7 +952,10 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog, goto out; } } + + rcu_read_lock_trace(); retval = bpf_prog_run_pin_on_cpu(prog, ctx); + rcu_read_unlock_trace(); if (copy_to_user(&uattr->test.retval, &retval, sizeof(u32))) { err = -EFAULT; -- 2.30.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program 2021-08-09 6:03 ` [PATCH bpf v2 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program Yonghong Song @ 2021-08-09 17:16 ` Andrii Nakryiko 0 siblings, 0 replies; 9+ messages in thread From: Andrii Nakryiko @ 2021-08-09 17:16 UTC (permalink / raw) To: Yonghong Song Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Kernel Team On Sun, Aug 8, 2021 at 11:03 PM Yonghong Song <yhs@fb.com> wrote: > > Commit 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.") > added support for syscall program, which is a sleepable program. > But the program run missed bpf_read_lock_trace()/bpf_read_unlock_trace(), > which is needed to ensure proper rcu callback invocations. > This patch added bpf_read_[un]lock_trace() properly. > > Fixes: 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.") > Signed-off-by: Yonghong Song <yhs@fb.com> > --- LGTM. Acked-by: Andrii Nakryiko <andrii@kernel.org> > net/bpf/test_run.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c > index 1cc75c811e24..caa16bf30fb5 100644 > --- a/net/bpf/test_run.c > +++ b/net/bpf/test_run.c > @@ -7,6 +7,7 @@ > #include <linux/vmalloc.h> > #include <linux/etherdevice.h> > #include <linux/filter.h> > +#include <linux/rcupdate_trace.h> > #include <linux/sched/signal.h> > #include <net/bpf_sk_storage.h> > #include <net/sock.h> > @@ -951,7 +952,10 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog, > goto out; > } > } > + > + rcu_read_lock_trace(); > retval = bpf_prog_run_pin_on_cpu(prog, ctx); > + rcu_read_unlock_trace(); > > if (copy_to_user(&uattr->test.retval, &retval, sizeof(u32))) { > err = -EFAULT; > -- > 2.30.2 > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-08-09 22:34 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-08-09 6:03 [PATCH bpf v2 0/2] bpf: fix a couple of issues with syscall program Yonghong Song 2021-08-09 6:03 ` [PATCH bpf v2 1/2] bpf: don't call bpf_get_current_[ancestor_]cgroup_id() in sleepable progs Yonghong Song 2021-08-09 17:18 ` Andrii Nakryiko 2021-08-09 17:41 ` Yonghong Song 2021-08-09 17:58 ` Andrii Nakryiko 2021-08-09 20:29 ` Daniel Borkmann 2021-08-09 22:34 ` Yonghong Song 2021-08-09 6:03 ` [PATCH bpf v2 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program Yonghong Song 2021-08-09 17:16 ` Andrii Nakryiko
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.