* [PATCH bpf v3 0/2] bpf: fix a couple of issues with syscall program
@ 2021-08-09 23:51 Yonghong Song
  2021-08-09 23:51 ` [PATCH bpf v3 1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers Yonghong Song
  2021-08-09 23:51 ` [PATCH bpf v3 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program Yonghong Song
  0 siblings, 2 replies; 8+ messages in thread
From: Yonghong Song @ 2021-08-09 23:51 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team

syzbot ([1]) reported an RCU warning for a syscall program when
bpf_get_current_cgroup_id() is called in the program. The
bpf_get_current_cgroup_id() helper expects a cgroup-related
mutex or rcu_read_lock() to be held. A sleepable program is not
run under rcu_read_lock(), hence the warning.

To fix the problem, Patch 1 added rcu_read_lock() in the
affected helpers, and Patch 2 added the missing
rcu_read_[un]lock_trace() for the syscall type of bpf programs.

 [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/

Changelog:
  v2 -> v3:
    - use rcu_read_lock() protection for
      bpf_get_current_[ancestor_]cgroup_id() helper.
  v1 -> v2:
    - use rcu_read_lock_trace() instead of rcu_read_lock() for
      bpf_prog_run_pin_on_cpu().
    - disallow bpf_get_current_[ancestor_]cgroup_id() helper.

Yonghong Song (2):
  bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id()
    helpers
  bpf: add missing bpf_read_[un]lock_trace() for syscall program

 kernel/bpf/helpers.c | 12 ++++++++++--
 net/bpf/test_run.c   |  4 ++++
 2 files changed, 14 insertions(+), 2 deletions(-)

-- 
2.30.2



* [PATCH bpf v3 1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers
  2021-08-09 23:51 [PATCH bpf v3 0/2] bpf: fix a couple of issues with syscall program Yonghong Song
@ 2021-08-09 23:51 ` Yonghong Song
  2021-08-10  0:25   ` Andrii Nakryiko
  2021-08-10  8:08   ` Daniel Borkmann
  2021-08-09 23:51 ` [PATCH bpf v3 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program Yonghong Song
  1 sibling, 2 replies; 8+ messages in thread
From: Yonghong Song @ 2021-08-09 23:51 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	kernel-team, syzbot+7ee5c2c09c284495371f

Currently, if the bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is
called from a sleepable program, e.g., a sleepable
fentry/fmod_ret/fexit/lsm program, an RCU warning
may appear. For example, if I add the following
hack to the test_progs/test_lsm sleepable fentry program
test_sys_setdomainname:

  --- a/tools/testing/selftests/bpf/progs/lsm.c
  +++ b/tools/testing/selftests/bpf/progs/lsm.c
  @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
          int buf = 0;
          long ret;

  +       __u64 cg_id = bpf_get_current_cgroup_id();
  +       if (cg_id == 1000)
  +               copy_test++;
  +
          ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
          if (len == -2 && ret == 0 && buf == 1234)
                  copy_test++;

I will hit the following rcu warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by test_progs/260:
      #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
    stack backtrace:
    CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    Call Trace:
      dump_stack_lvl+0x56/0x7b
      bpf_get_current_cgroup_id+0x9c/0xb1
      bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
      bpf_trampoline_6442469132_0+0x2d/0x1000
      __x64_sys_setdomainname+0x5/0x110
      do_syscall_64+0x3a/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get a similar warning using the bpf_get_current_ancestor_cgroup_id() helper.
syzbot reported a similar issue in [1] for a syscall program. The
bpf_get_current_cgroup_id() and bpf_get_current_ancestor_cgroup_id() helpers
have the following call chain:
   task_dfl_cgroup
     task_css_set
       task_css_set_check
and we have
   #define task_css_set_check(task, __c)                                   \
           rcu_dereference_check((task)->cgroups,                          \
                   lockdep_is_held(&cgroup_mutex) ||                       \
                   lockdep_is_held(&css_set_lock) ||                       \
                   ((task)->flags & PF_EXITING) || (__c))
Since cgroup_mutex/css_set_lock is not held, the task
is not exiting (PF_EXITING is not set), and rcu_read_lock() is not
held, a warning is issued. Note that a sleepable bpf program is
protected by rcu_read_lock_trace() rather than rcu_read_lock().

The above sleepable bpf programs are already protected
by migrate_disable(). Adding rcu_read_lock() in these
two helpers will silence the above warning.
I marked the patch as fixing 95b861a7935b
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing"),
which added bpf_get_current_ancestor_cgroup_id() to tracing programs
in 5.14. I think backporting to 5.14 is probably good enough as sleepable
programs are not widely used.

This patch should fix [1] as well, since a syscall program is a sleepable
program protected with migrate_disable().

 [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/

Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/helpers.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 62cf00383910..4567d2841133 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
 #ifdef CONFIG_CGROUPS
 BPF_CALL_0(bpf_get_current_cgroup_id)
 {
-	struct cgroup *cgrp = task_dfl_cgroup(current);
+	struct cgroup *cgrp;
+
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	rcu_read_unlock();
 
 	return cgroup_id(cgrp);
 }
@@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
 
 BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
 {
-	struct cgroup *cgrp = task_dfl_cgroup(current);
+	struct cgroup *cgrp;
 	struct cgroup *ancestor;
 
+	rcu_read_lock();
+	cgrp = task_dfl_cgroup(current);
+	rcu_read_unlock();
+
 	ancestor = cgroup_ancestor(cgrp, ancestor_level);
 	if (!ancestor)
 		return 0;
-- 
2.30.2



* [PATCH bpf v3 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program
  2021-08-09 23:51 [PATCH bpf v3 0/2] bpf: fix a couple of issues with syscall program Yonghong Song
  2021-08-09 23:51 ` [PATCH bpf v3 1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers Yonghong Song
@ 2021-08-09 23:51 ` Yonghong Song
  2021-08-10  8:14   ` Daniel Borkmann
  1 sibling, 1 reply; 8+ messages in thread
From: Yonghong Song @ 2021-08-09 23:51 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, kernel-team

Commit 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.")
added support for the syscall program type, which is a sleepable program.
But the program run missed rcu_read_lock_trace()/rcu_read_unlock_trace(),
which are needed to ensure proper rcu callback invocations.
This patch adds the missing rcu_read_[un]lock_trace() around the program run.

Fixes: 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.")
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 net/bpf/test_run.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 1cc75c811e24..caa16bf30fb5 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -7,6 +7,7 @@
 #include <linux/vmalloc.h>
 #include <linux/etherdevice.h>
 #include <linux/filter.h>
+#include <linux/rcupdate_trace.h>
 #include <linux/sched/signal.h>
 #include <net/bpf_sk_storage.h>
 #include <net/sock.h>
@@ -951,7 +952,10 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog,
 			goto out;
 		}
 	}
+
+	rcu_read_lock_trace();
 	retval = bpf_prog_run_pin_on_cpu(prog, ctx);
+	rcu_read_unlock_trace();
 
 	if (copy_to_user(&uattr->test.retval, &retval, sizeof(u32))) {
 		err = -EFAULT;
-- 
2.30.2



* Re: [PATCH bpf v3 1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers
  2021-08-09 23:51 ` [PATCH bpf v3 1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers Yonghong Song
@ 2021-08-10  0:25   ` Andrii Nakryiko
  2021-08-10  8:08   ` Daniel Borkmann
  1 sibling, 0 replies; 8+ messages in thread
From: Andrii Nakryiko @ 2021-08-10  0:25 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Kernel Team, syzbot+7ee5c2c09c284495371f

On Mon, Aug 9, 2021 at 4:51 PM Yonghong Song <yhs@fb.com> wrote:
>
> Currently, if bpf_get_current_cgroup_id() or
> bpf_get_current_ancestor_cgroup_id() helper is
> called with sleepable programs e.g., sleepable
> fentry/fmod_ret/fexit/lsm programs, a rcu warning
> may appear. For example, if I added the following
> hack to test_progs/test_lsm sleepable fentry program
> test_sys_setdomainname:
>
>   --- a/tools/testing/selftests/bpf/progs/lsm.c
>   +++ b/tools/testing/selftests/bpf/progs/lsm.c
>   @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
>           int buf = 0;
>           long ret;
>
>   +       __u64 cg_id = bpf_get_current_cgroup_id();
>   +       if (cg_id == 1000)
>   +               copy_test++;
>   +
>           ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
>           if (len == -2 && ret == 0 && buf == 1234)
>                   copy_test++;
>
> I will hit the following rcu warning:
>
>   include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>   other info that might help us debug this:
>     rcu_scheduler_active = 2, debug_locks = 1
>     1 lock held by test_progs/260:
>       #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
>     stack backtrace:
>     CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
>     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>     Call Trace:
>       dump_stack_lvl+0x56/0x7b
>       bpf_get_current_cgroup_id+0x9c/0xb1
>       bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
>       bpf_trampoline_6442469132_0+0x2d/0x1000
>       __x64_sys_setdomainname+0x5/0x110
>       do_syscall_64+0x3a/0x80
>       entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
> syzbot reported a similar issue in [1] for syscall program. Helper
> bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
> has the following callchain:
>    task_dfl_cgroup
>      task_css_set
>        task_css_set_check
> and we have
>    #define task_css_set_check(task, __c)                                   \
>            rcu_dereference_check((task)->cgroups,                          \
>                    lockdep_is_held(&cgroup_mutex) ||                       \
>                    lockdep_is_held(&css_set_lock) ||                       \
>                    ((task)->flags & PF_EXITING) || (__c))
> Since cgroup_mutex/css_set_lock is not held, the task
> is not exiting (PF_EXITING is not set), and rcu_read_lock() is not
> held, a warning is issued. Note that a sleepable bpf program is
> protected by rcu_read_lock_trace() rather than rcu_read_lock().
>
> The above sleepable bpf programs are already protected
> by migrate_disable(). Adding rcu_read_lock() in these
> two helpers will silence the above warning.
> I marked the patch fixing 95b861a7935b
> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> which added bpf_get_current_ancestor_cgroup_id() to tracing programs
> in 5.14. I think backporting to 5.14 is probably good enough as sleepable
> programs are not widely used.
>
> This patch should fix [1] as well since syscall program is a sleepable
> program protected with migrate_disable().
>
>  [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
>
> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---

LGTM, thanks!

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  kernel/bpf/helpers.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 62cf00383910..4567d2841133 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
>  #ifdef CONFIG_CGROUPS
>  BPF_CALL_0(bpf_get_current_cgroup_id)
>  {
> -       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       struct cgroup *cgrp;
> +
> +       rcu_read_lock();
> +       cgrp = task_dfl_cgroup(current);
> +       rcu_read_unlock();
>
>         return cgroup_id(cgrp);
>  }
> @@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>
>  BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
>  {
> -       struct cgroup *cgrp = task_dfl_cgroup(current);
> +       struct cgroup *cgrp;
>         struct cgroup *ancestor;
>
> +       rcu_read_lock();
> +       cgrp = task_dfl_cgroup(current);
> +       rcu_read_unlock();
> +
>         ancestor = cgroup_ancestor(cgrp, ancestor_level);
>         if (!ancestor)
>                 return 0;
> --
> 2.30.2
>


* Re: [PATCH bpf v3 1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers
  2021-08-09 23:51 ` [PATCH bpf v3 1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers Yonghong Song
  2021-08-10  0:25   ` Andrii Nakryiko
@ 2021-08-10  8:08   ` Daniel Borkmann
  2021-08-10 16:32     ` Paul E. McKenney
  1 sibling, 1 reply; 8+ messages in thread
From: Daniel Borkmann @ 2021-08-10  8:08 UTC (permalink / raw)
  To: Yonghong Song, bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, kernel-team,
	syzbot+7ee5c2c09c284495371f, paulmck

[ +Paul ]

On 8/10/21 1:51 AM, Yonghong Song wrote:
[...]
> I will hit the following rcu warning:
> 
>    include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>    other info that might help us debug this:
>      rcu_scheduler_active = 2, debug_locks = 1
>      1 lock held by test_progs/260:
>        #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
>      stack backtrace:
>      CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
>      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>      Call Trace:
>        dump_stack_lvl+0x56/0x7b
>        bpf_get_current_cgroup_id+0x9c/0xb1
>        bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
>        bpf_trampoline_6442469132_0+0x2d/0x1000
>        __x64_sys_setdomainname+0x5/0x110
>        do_syscall_64+0x3a/0x80
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
> syzbot reported a similar issue in [1] for syscall program. Helper
> bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
> has the following callchain:
>     task_dfl_cgroup
>       task_css_set
>         task_css_set_check
> and we have
>     #define task_css_set_check(task, __c)                                   \
>             rcu_dereference_check((task)->cgroups,                          \
>                     lockdep_is_held(&cgroup_mutex) ||                       \
>                     lockdep_is_held(&css_set_lock) ||                       \
>                     ((task)->flags & PF_EXITING) || (__c))
> Since cgroup_mutex/css_set_lock is not held, the task
> is not exiting (PF_EXITING is not set), and rcu_read_lock() is not
> held, a warning is issued. Note that a sleepable bpf program is
> protected by rcu_read_lock_trace() rather than rcu_read_lock().
> 
> The above sleepable bpf programs are already protected
> by migrate_disable(). Adding rcu_read_lock() in these
> two helpers will silence the above warning.
> I marked the patch fixing 95b861a7935b
> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> which added bpf_get_current_ancestor_cgroup_id() to tracing programs
> in 5.14. I think backporting to 5.14 is probably good enough as sleepable
> programs are not widely used.
> 
> This patch should fix [1] as well since syscall program is a sleepable
> program protected with migrate_disable().
> 
>   [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
> 
> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>   kernel/bpf/helpers.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 62cf00383910..4567d2841133 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
>   #ifdef CONFIG_CGROUPS
>   BPF_CALL_0(bpf_get_current_cgroup_id)
>   {
> -	struct cgroup *cgrp = task_dfl_cgroup(current);
> +	struct cgroup *cgrp;
> +
> +	rcu_read_lock();
> +	cgrp = task_dfl_cgroup(current);
> +	rcu_read_unlock();
>   
>   	return cgroup_id(cgrp);

I'm a bit confused: if the cgroup object relies on rcu_read_lock() and not rcu_read_lock_trace()
context, then the above is racy, given you access the memory via cgroup_id() outside of it;
same below. If rcu_read_lock_trace() is enough and the above is really just to silence the
'suspicious rcu_dereference_check() usage' splat, then the rcu_dereference_check() from
task_css_set_check() should be extended to check for the _trace() flavor instead [meaning, as
a cleaner workaround]. Which one is it?
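
One possible shape for that cleaner workaround, purely as a sketch (it
assumes rcu_read_lock_trace_held() from <linux/rcupdate_trace.h> may be
used from this cgroup header), would be to extend the check with the
_trace() flavor:

    #define task_css_set_check(task, __c)                                   \
            rcu_dereference_check((task)->cgroups,                          \
                    rcu_read_lock_trace_held() ||                           \
                    lockdep_is_held(&cgroup_mutex) ||                       \
                    lockdep_is_held(&css_set_lock) ||                       \
                    ((task)->flags & PF_EXITING) || (__c))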

>   }
> @@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>   
>   BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
>   {
> -	struct cgroup *cgrp = task_dfl_cgroup(current);
> +	struct cgroup *cgrp;
>   	struct cgroup *ancestor;
>   
> +	rcu_read_lock();
> +	cgrp = task_dfl_cgroup(current);
> +	rcu_read_unlock();
> +
>   	ancestor = cgroup_ancestor(cgrp, ancestor_level);
>   	if (!ancestor)
>   		return 0;
> 

Thanks,
Daniel


* Re: [PATCH bpf v3 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program
  2021-08-09 23:51 ` [PATCH bpf v3 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program Yonghong Song
@ 2021-08-10  8:14   ` Daniel Borkmann
  0 siblings, 0 replies; 8+ messages in thread
From: Daniel Borkmann @ 2021-08-10  8:14 UTC (permalink / raw)
  To: Yonghong Song, bpf; +Cc: Alexei Starovoitov, Andrii Nakryiko, kernel-team

On 8/10/21 1:51 AM, Yonghong Song wrote:
> Commit 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.")
> added support for syscall program, which is a sleepable program.
> But the program run missed rcu_read_lock_trace()/rcu_read_unlock_trace(),
> which are needed to ensure proper rcu callback invocations.
> This patch adds the missing rcu_read_[un]lock_trace() around the program run.
> 
> Fixes: 79a7f8bdb159d ("bpf: Introduce bpf_sys_bpf() helper and program type.")
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Yonghong Song <yhs@fb.com>

(Took this one in already while we clarify 1/2, thanks!)


* Re: [PATCH bpf v3 1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers
  2021-08-10  8:08   ` Daniel Borkmann
@ 2021-08-10 16:32     ` Paul E. McKenney
  2021-08-10 16:36       ` Yonghong Song
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E. McKenney @ 2021-08-10 16:32 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Yonghong Song, bpf, Alexei Starovoitov, Andrii Nakryiko,
	kernel-team, syzbot+7ee5c2c09c284495371f

On Tue, Aug 10, 2021 at 10:08:58AM +0200, Daniel Borkmann wrote:
> [ +Paul ]
> 
> On 8/10/21 1:51 AM, Yonghong Song wrote:
> [...]
> > I will hit the following rcu warning:
> > 
> >    include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> >    other info that might help us debug this:
> >      rcu_scheduler_active = 2, debug_locks = 1
> >      1 lock held by test_progs/260:
> >        #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
> >      stack backtrace:
> >      CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
> >      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> >      Call Trace:
> >        dump_stack_lvl+0x56/0x7b
> >        bpf_get_current_cgroup_id+0x9c/0xb1
> >        bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
> >        bpf_trampoline_6442469132_0+0x2d/0x1000
> >        __x64_sys_setdomainname+0x5/0x110
> >        do_syscall_64+0x3a/0x80
> >        entry_SYSCALL_64_after_hwframe+0x44/0xae
> > 
> > I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
> > syzbot reported a similar issue in [1] for syscall program. Helper
> > bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
> > has the following callchain:
> >     task_dfl_cgroup
> >       task_css_set
> >         task_css_set_check
> > and we have
> >     #define task_css_set_check(task, __c)                                   \
> >             rcu_dereference_check((task)->cgroups,                          \
> >                     lockdep_is_held(&cgroup_mutex) ||                       \
> >                     lockdep_is_held(&css_set_lock) ||                       \
> >                     ((task)->flags & PF_EXITING) || (__c))
> > Since cgroup_mutex/css_set_lock is not held, the task
> > is not exiting (PF_EXITING is not set), and rcu_read_lock() is not
> > held, a warning is issued. Note that a sleepable bpf program is
> > protected by rcu_read_lock_trace() rather than rcu_read_lock().
> > 
> > The above sleepable bpf programs are already protected
> > by migrate_disable(). Adding rcu_read_lock() in these
> > two helpers will silence the above warning.
> > I marked the patch fixing 95b861a7935b
> > ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> > which added bpf_get_current_ancestor_cgroup_id() to tracing programs
> > in 5.14. I think backporting to 5.14 is probably good enough as sleepable
> > programs are not widely used.
> > 
> > This patch should fix [1] as well since syscall program is a sleepable
> > program protected with migrate_disable().
> > 
> >   [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
> > 
> > Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
> > Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
> > Signed-off-by: Yonghong Song <yhs@fb.com>
> > ---
> >   kernel/bpf/helpers.c | 12 ++++++++++--
> >   1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 62cf00383910..4567d2841133 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
> >   #ifdef CONFIG_CGROUPS
> >   BPF_CALL_0(bpf_get_current_cgroup_id)
> >   {
> > -	struct cgroup *cgrp = task_dfl_cgroup(current);
> > +	struct cgroup *cgrp;
> > +
> > +	rcu_read_lock();
> > +	cgrp = task_dfl_cgroup(current);
> > +	rcu_read_unlock();
> >   	return cgroup_id(cgrp);
> 
> I'm a bit confused, if cgroup object relies rcu_read_lock() and not rcu_read_lock_trace()
> context, then the above is racy given you access the memory via cgroup_id() outside of it,
> same below. If rcu_read_lock_trace() is enough and the above is really just to silence the
> 'suspicious rcu_dereference_check() usage' splat, then the rcu_dereference_check() from
> task_css_set_check() should be extended to check for _trace() flavor instead [meaning, as
> a cleaner workaround], which one is it?

At first glance, something like this would work from an RCU perspective:

BPF_CALL_0(bpf_get_current_cgroup_id)
{
	struct cgroup *cgrp;
	u64 ret;

	rcu_read_lock();
	cgrp = task_dfl_cgroup(current);
	ret = cgroup_id(cgrp);
	rcu_read_unlock();
	return ret;
}

If I am reading the code correctly, rcu_read_lock() is required to keep
the cgroup from going away, and the rcu_read_lock_trace() is needed in
addition in order to keep the BPF program from going away.

There might well be better ways to do this, but the above obeys the
letter of the law.  Or at least the law as I understand it.  ;-)

Or does the fact that a cgroup contains a given task prevent that cgroup
from going away?  (I -think- that tasks can be removed from cgroups
without the cooperation of those tasks, but then again there is much
about cgroups that I do not know.)

							Thanx, Paul

> >   }
> > @@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
> >   BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
> >   {
> > -	struct cgroup *cgrp = task_dfl_cgroup(current);
> > +	struct cgroup *cgrp;
> >   	struct cgroup *ancestor;
> > +	rcu_read_lock();
> > +	cgrp = task_dfl_cgroup(current);
> > +	rcu_read_unlock();
> > +
> >   	ancestor = cgroup_ancestor(cgrp, ancestor_level);
> >   	if (!ancestor)
> >   		return 0;
> > 
> 
> Thanks,
> Daniel


* Re: [PATCH bpf v3 1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers
  2021-08-10 16:32     ` Paul E. McKenney
@ 2021-08-10 16:36       ` Yonghong Song
  0 siblings, 0 replies; 8+ messages in thread
From: Yonghong Song @ 2021-08-10 16:36 UTC (permalink / raw)
  To: paulmck, Daniel Borkmann
  Cc: bpf, Alexei Starovoitov, Andrii Nakryiko, kernel-team,
	syzbot+7ee5c2c09c284495371f, htejun

Add Tejun to provide more clarification on cgroup side.

On 8/10/21 9:32 AM, Paul E. McKenney wrote:
> On Tue, Aug 10, 2021 at 10:08:58AM +0200, Daniel Borkmann wrote:
>> [ +Paul ]
>>
>> On 8/10/21 1:51 AM, Yonghong Song wrote:
>> [...]
>>> I will hit the following rcu warning:
>>>
>>>     include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>>>     other info that might help us debug this:
>>>       rcu_scheduler_active = 2, debug_locks = 1
>>>       1 lock held by test_progs/260:
>>>         #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
>>>       stack backtrace:
>>>       CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
>>>       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>>       Call Trace:
>>>         dump_stack_lvl+0x56/0x7b
>>>         bpf_get_current_cgroup_id+0x9c/0xb1
>>>         bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
>>>         bpf_trampoline_6442469132_0+0x2d/0x1000
>>>         __x64_sys_setdomainname+0x5/0x110
>>>         do_syscall_64+0x3a/0x80
>>>         entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>
>>> I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
>>> syzbot reported a similar issue in [1] for syscall program. Helper
>>> bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
>>> has the following callchain:
>>>      task_dfl_cgroup
>>>        task_css_set
>>>          task_css_set_check
>>> and we have
>>>      #define task_css_set_check(task, __c)                                   \
>>>              rcu_dereference_check((task)->cgroups,                          \
>>>                      lockdep_is_held(&cgroup_mutex) ||                       \
>>>                      lockdep_is_held(&css_set_lock) ||                       \
>>>                      ((task)->flags & PF_EXITING) || (__c))
>>> Since cgroup_mutex/css_set_lock is not held, the task
>>> is not exiting (PF_EXITING is not set), and rcu_read_lock() is not
>>> held, a warning is issued. Note that a sleepable bpf program is
>>> protected by rcu_read_lock_trace() rather than rcu_read_lock().
>>>
>>> The above sleepable bpf programs are already protected
>>> by migrate_disable(). Adding rcu_read_lock() in these
>>> two helpers will silence the above warning.
>>> I marked the patch fixing 95b861a7935b
>>> ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
>>> which added bpf_get_current_ancestor_cgroup_id() to tracing programs
>>> in 5.14. I think backporting to 5.14 is probably good enough as sleepable
>>> programs are not widely used.
>>>
>>> This patch should fix [1] as well since syscall program is a sleepable
>>> program protected with migrate_disable().
>>>
>>>    [1] https://lore.kernel.org/bpf/0000000000006d5cab05c7d9bb87@google.com/
>>>
>>> Reported-by: syzbot+7ee5c2c09c284495371f@syzkaller.appspotmail.com
>>> Fixes: 95b861a7935b ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>> ---
>>>    kernel/bpf/helpers.c | 12 ++++++++++--
>>>    1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index 62cf00383910..4567d2841133 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -353,7 +353,11 @@ const struct bpf_func_proto bpf_jiffies64_proto = {
>>>    #ifdef CONFIG_CGROUPS
>>>    BPF_CALL_0(bpf_get_current_cgroup_id)
>>>    {
>>> -	struct cgroup *cgrp = task_dfl_cgroup(current);
>>> +	struct cgroup *cgrp;
>>> +
>>> +	rcu_read_lock();
>>> +	cgrp = task_dfl_cgroup(current);
>>> +	rcu_read_unlock();
>>>    	return cgroup_id(cgrp);
>>
>> I'm a bit confused, if cgroup object relies rcu_read_lock() and not rcu_read_lock_trace()
>> context, then the above is racy given you access the memory via cgroup_id() outside of it,
>> same below. If rcu_read_lock_trace() is enough and the above is really just to silence the
>> 'suspicious rcu_dereference_check() usage' splat, then the rcu_dereference_check() from
>> task_css_set_check() should be extended to check for _trace() flavor instead [meaning, as
>> a cleaner workaround], which one is it?

Thanks for pointing this out. Definitely a bug on my side. The
rcu_read_lock() region should include cgroup_id(cgrp) as well.

> 
> At first glance, something like this would work from an RCU perspective:
> 
> BPF_CALL_0(bpf_get_current_cgroup_id)
> {
> 	struct cgroup *cgrp;
> 	u64 ret;
> 
> 	rcu_read_lock();
> 	cgrp = task_dfl_cgroup(current);
>   	ret = cgroup_id(cgrp);
> 	rcu_read_unlock();
>   	return ret;
> }
> 
> If I am reading the code correctly, rcu_read_lock() is required to keep
> the cgroup from going away, and the rcu_read_lock_trace() is needed in
> addition in order to keep the BPF program from going away.
> 
> There might well be better ways to do this, but the above obeys the
> letter of the law.  Or at least the law as I understand it.  ;-)
> 
> Or does the fact that a cgroup contains a given task prevent that cgroup
> from going away?  (I -think- that tasks can be removed from cgroups
> without the cooperation of those tasks, but then again there is much
> about cgroups that I do not know.)
> 
> 							Thanx, Paul
> 
>>>    }
>>> @@ -366,9 +370,13 @@ const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
>>>    BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
>>>    {
>>> -	struct cgroup *cgrp = task_dfl_cgroup(current);
>>> +	struct cgroup *cgrp;
>>>    	struct cgroup *ancestor;
>>> +	rcu_read_lock();
>>> +	cgrp = task_dfl_cgroup(current);
>>> +	rcu_read_unlock();
>>> +
>>>    	ancestor = cgroup_ancestor(cgrp, ancestor_level);
>>>    	if (!ancestor)
>>>    		return 0;
>>>
>>
>> Thanks,
>> Daniel


end of thread, other threads:[~2021-08-10 16:36 UTC | newest]

Thread overview: 8+ messages
2021-08-09 23:51 [PATCH bpf v3 0/2] bpf: fix a couple of issues with syscall program Yonghong Song
2021-08-09 23:51 ` [PATCH bpf v3 1/2] bpf: add rcu read_lock in bpf_get_current_[ancestor_]cgroup_id() helpers Yonghong Song
2021-08-10  0:25   ` Andrii Nakryiko
2021-08-10  8:08   ` Daniel Borkmann
2021-08-10 16:32     ` Paul E. McKenney
2021-08-10 16:36       ` Yonghong Song
2021-08-09 23:51 ` [PATCH bpf v3 2/2] bpf: add missing bpf_read_[un]lock_trace() for syscall program Yonghong Song
2021-08-10  8:14   ` Daniel Borkmann
