* [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Daniel Borkmann @ 2019-02-20 23:01 UTC
  To: ast; +Cc: keescook, netdev, Daniel Borkmann

In 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
a check was added to BPF_PROG_RUN() asserting that preemption is disabled on
every invocation, so as not to break eBPF assumptions (e.g. per-CPU maps).
This does not hold for seccomp, however: only cBPF -> eBPF transitioned
programs are loaded there, and they do not make use of any functionality that
would require this assertion. Fix this false positive by adding and using a
SECCOMP_RUN() variant that does not have the cant_sleep() check.

Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Kees Cook <keescook@chromium.org>
---
 v1 -> v2:
  - More elaborate comment and added SECCOMP_RUN
  - Added Kees' ACK from earlier v1 patch

 include/linux/filter.h | 22 +++++++++++++++++++++-
 kernel/seccomp.c       |  2 +-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index f32b3ec..cd7f957 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -533,7 +533,27 @@ struct sk_filter {
 	struct bpf_prog	*prog;
 };
 
-#define BPF_PROG_RUN(filter, ctx)  ({ cant_sleep(); (*(filter)->bpf_func)(ctx, (filter)->insnsi); })
+#define __bpf_prog_run(prog, ctx)		\
+	(*(prog)->bpf_func)(ctx, (prog)->insnsi)
+#define __bpf_prog_run__may_preempt(prog, ctx)	\
+	({ __bpf_prog_run(prog, ctx); })
+#define __bpf_prog_run__non_preempt(prog, ctx)	\
+	({ cant_sleep(); __bpf_prog_run(prog, ctx); })
+
+/* Preemption must be disabled when native eBPF programs are run in
+ * order to not break per-CPU data structures, for example; make
+ * sure to throw a stack trace under CONFIG_DEBUG_ATOMIC_SLEEP when
+ * we find that preemption is still enabled.
+ *
+ * The only exception today is seccomp, where progs have transitioned
+ * from cBPF to eBPF, and native eBPF is _not_ supported. They can
+ * safely run with preemption enabled.
+ */
+#define BPF_PROG_RUN(prog, ctx)			\
+	__bpf_prog_run__non_preempt(prog, ctx)
+
+#define SECCOMP_RUN(prog, ctx)			\
+	__bpf_prog_run__may_preempt(prog, ctx)
 
 #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
 
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index e815781..701a3cf 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -268,7 +268,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 	 * value always takes priority (ignoring the DATA).
 	 */
 	for (; f; f = f->prev) {
-		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
+		u32 cur_ret = SECCOMP_RUN(f->prog, sd);
 
 		if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
 			ret = cur_ret;
-- 
2.9.5
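
As background, a minimal sketch of the per-CPU hazard that the cant_sleep()
assertion guards against (hypothetical helper for illustration, not part of
this patch):

/*
 * Illustrative only: a per-CPU update is effectively two steps; if the
 * task is preempted and migrated between them, it touches another
 * CPU's slot.
 */
static void pcpu_slot_inc(u64 __percpu *slots)
{
        int cpu = raw_smp_processor_id();  /* step 1: pick current CPU */

        /* a preemption point here would allow migration to another CPU */

        (*per_cpu_ptr(slots, cpu))++;      /* step 2: update its slot */
}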



* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Alexei Starovoitov @ 2019-02-20 23:59 UTC
  To: Daniel Borkmann; +Cc: ast, keescook, netdev

On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote:
> [...]

Applied, Thanks



* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Alexei Starovoitov @ 2019-02-21  4:02 UTC
  To: Daniel Borkmann, Jann Horn, Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, Network Development

On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> [...]
>
> Applied, Thanks

Actually I think it's the wrong approach to take long term.
I'm thinking of reverting it.
I think it's better to disable preemption for the duration of
the seccomp cBPF prog.
It's short, and there is really no reason for it to be preemptible.
When seccomp switches to eBPF we'll have this weird inconsistency.
Let's just disable preemption for seccomp as well.


* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Kees Cook @ 2019-02-21  5:31 UTC
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Jann Horn, Andy Lutomirski, Alexei Starovoitov,
	Network Development

On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> [...]
>
> Actually I think it's the wrong approach to take long term.
> I'm thinking of reverting it.
> I think it's better to disable preemption for the duration of
> the seccomp cBPF prog.
> It's short, and there is really no reason for it to be preemptible.
> When seccomp switches to eBPF we'll have this weird inconsistency.
> Let's just disable preemption for seccomp as well.

A lot of changes will be needed for seccomp ebpf -- not the least of
which is convincing me there is a use-case. ;)

But the main issue is that I'm not a huge fan of dropping two
barriers() across syscall entry. That seems pretty heavy-duty for
something that is literally not needed right now.
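
(For reference, the pair in question expands roughly as follows under
CONFIG_PREEMPT; a simplified sketch of the preempt.h macros, not the
verbatim kernel source:)

#define preempt_disable()                               \
do {                                                    \
        preempt_count_inc();                            \
        barrier();      /* compiler barrier #1 */      \
} while (0)

#define preempt_enable()                                \
do {                                                    \
        barrier();      /* compiler barrier #2 */      \
        if (unlikely(preempt_count_dec_and_test()))     \
                __preempt_schedule();                   \
} while (0)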

-- 
Kees Cook


* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Daniel Borkmann @ 2019-02-21  8:53 UTC
  To: Kees Cook, Alexei Starovoitov
  Cc: Jann Horn, Andy Lutomirski, Alexei Starovoitov, Network Development

On 02/21/2019 06:31 AM, Kees Cook wrote:
> [...]
>
> A lot of changes will be needed for seccomp ebpf -- not the least of
> which is convincing me there is a use-case. ;)
> 
> But the main issue is that I'm not a huge fan of dropping two
> barriers() across syscall entry. That seems pretty heavy-duty for
> something that is literally not needed right now.

Yeah, I think it's okay to add it once it's actually technically needed. Last
time I looked, if I recall correctly, at least Chrome installs some
heavy-duty seccomp programs that go close to the prog limit.

Thanks,
Daniel


* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Jann Horn @ 2019-02-21 12:56 UTC
  To: Daniel Borkmann
  Cc: Kees Cook, Alexei Starovoitov, Andy Lutomirski,
	Alexei Starovoitov, Network Development

On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> [...]
>
> Yeah, I think it's okay to add it once it's actually technically needed. Last
> time I looked, if I recall correctly, at least Chrome installs some
> heavy-duty seccomp programs that go close to the prog limit.

Half of that is probably because that seccomp BPF code is so
inefficient, though.

This snippet shows that those programs constantly recheck the high
halves of arguments:

014e if args[1].high == 0x00000000: [true +3, false +0]
0153   if args[1].low == 0x00000003: [true +135, false +0] -> ret ALLOW (syscalls: fcntl)
0155   if args[1].high == 0x00000000: [true +3, false +0]
015a     if args[1].low == 0x00000001: [true +128, false +0] -> ret ALLOW (syscalls: fcntl)
015c     if args[1].high == 0x00000000: [true +3, false +0]
0161       if args[1].low == 0x00000002: [true +121, false +0] -> ret ALLOW (syscalls: fcntl)
0163       if args[1].high == 0x00000000: [true +3, false +0]
0168         if args[1].low == 0x00000006: [true +114, false +0] -> ret ALLOW (syscalls: fcntl)
016a         if args[1].high == 0x00000000: [true +3, false +0]
016f           if args[1].low == 0x00000007: [true +107, false +0] -> ret ALLOW (syscalls: fcntl)
0171           if args[1].high == 0x00000000: [true +3, false +0]
0176             if args[1].low == 0x00000005: [true +100, false +0] -> ret ALLOW (syscalls: fcntl)
0178             if args[1].high == 0x00000000: [true +3, false +0]
017d               if args[1].low == 0x00000000: [true +93, false +0] -> ret ALLOW (syscalls: fcntl)
017f               if args[1].high == 0x00000000: [true +3, false +0]
0184                 if args[1].low == 0x00000406: [true +86, false +0] -> ret ALLOW (syscalls: fcntl)
0186                 if args[1].high == 0x00000000: [true +3, false +0]
018b                   if args[1].low != 0x00000004: [true +80, false +0] -> ret TRAP
018d                   if args[2].high != 0x00000000: [true +78, false +0] -> ret TRAP
018f                   if args[2].low COMMON-BITS 0xffe363fc: [true +76, false +75] -> ret TRAP
01db                   ret ALLOW (syscalls: fcntl)

Some of the generated code is pointless because all reachable code
from that point on has the same outcome (the last "ret ALLOW" in the
following sample is unreachable because they've already checked that
the high bit of the low half is set, so the low half can't be 3):

00c9           if args[0].low == 0x00000005: [true +16, false +0] -> ret ALLOW (syscalls: clock_gettime, clock_getres)
00cb           if args[0].high == 0x00000000: [true +3, false +0]
00d0             if args[0].low == 0x00000003: [true +9, false +0] -> ret ALLOW (syscalls: clock_gettime, clock_getres)
01dc             ret TRAP
00cc           if args[0].high != 0xffffffff: [true +8, false +0] -> ret TRAP
00ce           if args[0].low COMMON-BITS 0x80000000: [true +0, false +6]
00d0             if args[0].low == 0x00000003: [true +9, false +0] -> ret ALLOW (syscalls: clock_gettime, clock_getres)
01dc             ret TRAP
01d7           ret TRAP
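
To make the redundancy concrete: a hand-written filter only needs to test
the upper 32 bits once per argument. A minimal sketch using the classic BPF
macros (illustrative only; offsets assume the little-endian layout of
struct seccomp_data, and the compared low-half values are hypothetical):

#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>

static struct sock_filter arg1_filter[] = {
        /* A = high 32 bits of args[1], checked exactly once */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, args[1]) + 4),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        /* A = low 32 bits of args[1]; no high-half rechecks needed */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, args[1])),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x3, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
};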


* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Alexei Starovoitov @ 2019-02-21 19:29 UTC
  To: Jann Horn
  Cc: Daniel Borkmann, Kees Cook, Andy Lutomirski, Alexei Starovoitov,
	Network Development

On Thu, Feb 21, 2019 at 01:56:53PM +0100, Jann Horn wrote:
> [...]
>
> Half of that is probably because that seccomp BPF code is so
> inefficient, though.
>
> This snippet shows that those programs constantly recheck the high
> halves of arguments: [...]
>
> Some of the generated code is pointless because all reachable code
> from that point on has the same outcome (the last "ret ALLOW" in the
> following sample is unreachable because they've already checked that
> the high bit of the low half is set, so the low half can't be 3):

And with eBPF these optimizations come for free, because LLVM will
remove unnecessary loads and simplify branches.
There is no technical reason not to use eBPF in seccomp.

When we discussed preemption of classic vs. extended BPF in the socket
filter context, we agreed to make it a requirement that preemption be
disabled, even though it's not strictly necessary. The RX side of socket
filters was already non-preemptible while TX was preemptible.
We must not make an exception to this rule for seccomp.
Hence I've reverted this commit.

Here is the actual fix for seccomp:
From: Alexei Starovoitov <ast@kernel.org>
Date: Thu, 21 Feb 2019 10:40:14 -0800
Subject: [PATCH] seccomp, bpf: disable preemption before calling into bpf prog

All BPF programs must be called with preemption disabled.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/seccomp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index e815781ed751..a43c601ac252 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -267,6 +267,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
         * All filters in the list are evaluated and the lowest BPF return
         * value always takes priority (ignoring the DATA).
         */
+       preempt_disable();
        for (; f; f = f->prev) {
                u32 cur_ret = BPF_PROG_RUN(f->prog, sd);

@@ -275,6 +276,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
                        *match = f;
                }
        }
+       preempt_enable();
        return ret;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
--

Doing a per-CPU increment of cache-hot data is practically free, and it
makes seccomp play by the rules.

I'm working on another set of patches that will introduce bpf program stats as:
#define BPF_PROG_RUN(prog, ctx) ({				\
	u32 ret;						\
	cant_sleep();						\
	if (static_branch_unlikely(&bpf_stats_enabled_key)) {	\
		u64 start = sched_clock();			\
		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
		this_cpu_inc(prog->aux->stats->cnt);		\
		this_cpu_add(prog->aux->stats->nsecs,		\
			     sched_clock() - start);		\
	} else {						\
		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
	}							\
	ret; })

and seccomp progs will get their stats just like socket filters and the rest of
classic and extended bpf.
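
For context, a sketch of the per-CPU stats the macro above presumes (names
and layout are illustrative; the actual patches may differ), plus how the
counters could be summed for reporting:

struct bpf_prog_stats {
	u64 cnt;
	u64 nsecs;
};

/* sum the per-CPU counters, e.g. when dumping program info */
static void bpf_prog_sum_stats(const struct bpf_prog *prog,
			       u64 *cnt, u64 *nsecs)
{
	int cpu;

	*cnt = 0;
	*nsecs = 0;
	for_each_possible_cpu(cpu) {
		const struct bpf_prog_stats *st =
			per_cpu_ptr(prog->aux->stats, cpu);

		*cnt += st->cnt;
		*nsecs += st->nsecs;
	}
}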




* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Kees Cook @ 2019-02-21 19:53 UTC
  To: Alexei Starovoitov
  Cc: Jann Horn, Daniel Borkmann, Andy Lutomirski, Alexei Starovoitov,
	Network Development

On Thu, Feb 21, 2019 at 11:29 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> [...]
>
> Doing a per-CPU increment of cache-hot data is practically free, and it
> makes seccomp play by the rules.

Other accesses should dominate the run time, yes. I'm still not a big
fan of unconditionally adding this, but I won't NAK. :P

-- 
Kees Cook


* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Alexei Starovoitov @ 2019-02-21 20:36 UTC
  To: Kees Cook
  Cc: Jann Horn, Daniel Borkmann, Andy Lutomirski, Alexei Starovoitov,
	Network Development

On Thu, Feb 21, 2019 at 11:53:06AM -0800, Kees Cook wrote:
> [...]
>
> Other accesses should dominate the run time, yes. I'm still not a big
> fan of unconditionally adding this, but I won't NAK. :P

Thank you.

I also would like to touch on your comment:
"A lot of changes will be needed for seccomp ebpf"
There were two attempts to add it in the past, and the patches were
small and straightforward.
If I recall correctly, both times you nacked them because the performance
gains and ease-of-use arguments were not convincing enough, right?
Are you still not convinced?



* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Kees Cook @ 2019-02-21 22:14 UTC
  To: Alexei Starovoitov
  Cc: Jann Horn, Daniel Borkmann, Andy Lutomirski, Alexei Starovoitov,
	Network Development

On Thu, Feb 21, 2019 at 12:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> I also would like to touch on your comment:
> "A lot of changes will be needed for seccomp ebpf"
> There were two attempts to add it in the past, and the patches were
> small and straightforward.

Yeah, agreed: doing it is technically easy. My concerns have mainly
revolved around avoiding increased complexity and attack surface.
There have been, for example, a lot of verifier bugs that were not
reachable through seccomp's BPF usage, given that it enforces using only
a subset of cBPF; i.e. seccomp filters couldn't be used as Spectre
gadgets, etc.

> If I recall correctly, both times you nacked them because the performance
> gains and ease-of-use arguments were not convincing enough, right?

Right. There wasn't, in my opinion, enough of a performance benefit vs.
just having efficient BPF to start with.

> Are you still not convinced?

For now, yeah. I'm sure there will be some future time when a use-case
appears where gaining some special eBPF hook/feature will outweigh the
increased attack surface. I haven't seen it yet, but I'm not crazy
enough to think it'll never happen. (In fact, recently I even had
Tycho see if he could implement the recent seccomp user notification
stuff via eBPF.)

-- 
Kees Cook


* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
From: Andy Lutomirski @ 2019-02-22  0:22 UTC
  To: Kees Cook
  Cc: Alexei Starovoitov, Jann Horn, Daniel Borkmann,
	Alexei Starovoitov, Network Development

On Thu, Feb 21, 2019 at 2:14 PM Kees Cook <keescook@chromium.org> wrote:
>
> [...]
>
> For now, yeah. I'm sure there will be some future time when a use-case
> appears where gaining some special eBPF hook/feature will outweigh the
> increased attack surface. I haven't seen it yet, but I'm not crazy
> enough to think it'll never happen. (In fact, recently I even had
> Tycho see if he could implement the recent seccomp user notification
> stuff via eBPF.)
>

I consider the potential for much improved performance to be a
maybe-good-enough argument. The downside is that there are programs
that load cBPF seccomp filters from inside a sandbox, and being able
to load eBPF from inside a sandbox is potentially undesirable.

