* [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs @ 2019-02-20 23:01 Daniel Borkmann 2019-02-20 23:59 ` Alexei Starovoitov 0 siblings, 1 reply; 11+ messages in thread From: Daniel Borkmann @ 2019-02-20 23:01 UTC (permalink / raw) To: ast; +Cc: keescook, netdev, Daniel Borkmann In 568f196756ad ("bpf: check that BPF programs run with preemption disabled") a check was added for BPF_PROG_RUN() that for every invocation preemption is disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does not count for seccomp because only cBPF -> eBPF is loaded here and it does not make use of any functionality that would require this assertion. Fix this false positive by adding and using SECCOMP_RUN() variant that does not have the cant_sleep(); check. Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled") Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kees Cook <keescook@chromium.org> --- v1 -> v2: - More elaborate comment and added SECCOMP_RUN - Added Kees' ACK from earlier v1 patch include/linux/filter.h | 22 +++++++++++++++++++++- kernel/seccomp.c | 2 +- 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index f32b3ec..cd7f957 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -533,7 +533,27 @@ struct sk_filter { struct bpf_prog *prog; }; -#define BPF_PROG_RUN(filter, ctx) ({ cant_sleep(); (*(filter)->bpf_func)(ctx, (filter)->insnsi); }) +#define __bpf_prog_run(prog, ctx) \ + (*(prog)->bpf_func)(ctx, (prog)->insnsi) +#define __bpf_prog_run__may_preempt(prog, ctx) \ + ({ __bpf_prog_run(prog, ctx); }) +#define __bpf_prog_run__non_preempt(prog, ctx) \ + ({ cant_sleep(); __bpf_prog_run(prog, ctx); }) + +/* Preemption must be disabled when native eBPF programs are run in + * order to not break per CPU data structures, for example; make 
+ * sure to throw a stack trace under CONFIG_DEBUG_ATOMIC_SLEEP when + * we find that preemption is still enabled. + * + * Only exception today is seccomp, where progs have transitioned + * from cBPF to eBPF, and native eBPF is _not_ supported. They can + * safely run with preemption enabled. + */ +#define BPF_PROG_RUN(prog, ctx) \ + __bpf_prog_run__non_preempt(prog, ctx) + +#define SECCOMP_RUN(prog, ctx) \ + __bpf_prog_run__may_preempt(prog, ctx) #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN diff --git a/kernel/seccomp.c b/kernel/seccomp.c index e815781..701a3cf 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -268,7 +268,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd, * value always takes priority (ignoring the DATA). */ for (; f; f = f->prev) { - u32 cur_ret = BPF_PROG_RUN(f->prog, sd); + u32 cur_ret = SECCOMP_RUN(f->prog, sd); if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) { ret = cur_ret; -- 2.9.5 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs 2019-02-20 23:01 [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs Daniel Borkmann @ 2019-02-20 23:59 ` Alexei Starovoitov 2019-02-21 4:02 ` Alexei Starovoitov 0 siblings, 1 reply; 11+ messages in thread From: Alexei Starovoitov @ 2019-02-20 23:59 UTC (permalink / raw) To: Daniel Borkmann; +Cc: ast, keescook, netdev On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote: > In 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > a check was added for BPF_PROG_RUN() that for every invocation preemption is > disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does > not count for seccomp because only cBPF -> eBPF is loaded here and it does > not make use of any functionality that would require this assertion. Fix this > false positive by adding and using SECCOMP_RUN() variant that does not have > the cant_sleep(); check. > > Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > Acked-by: Kees Cook <keescook@chromium.org> Applied, Thanks ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs 2019-02-20 23:59 ` Alexei Starovoitov @ 2019-02-21 4:02 ` Alexei Starovoitov 2019-02-21 5:31 ` Kees Cook 0 siblings, 1 reply; 11+ messages in thread From: Alexei Starovoitov @ 2019-02-21 4:02 UTC (permalink / raw) To: Daniel Borkmann, Jann Horn, Andy Lutomirski Cc: Alexei Starovoitov, Kees Cook, Network Development On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote: > > In 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > > a check was added for BPF_PROG_RUN() that for every invocation preemption is > > disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does > > not count for seccomp because only cBPF -> eBPF is loaded here and it does > > not make use of any functionality that would require this assertion. Fix this > > false positive by adding and using SECCOMP_RUN() variant that does not have > > the cant_sleep(); check. > > > > Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > > Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com > > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > > Acked-by: Kees Cook <keescook@chromium.org> > > Applied, Thanks Actually I think it's a wrong approach to go long term. I'm thinking to revert it. I think it's better to disable preemption for duration of seccomp cbpf prog. It's short and there is really no reason for it to be preemptible. When seccomp switches to ebpf we'll have this weird inconsistency. Let's just disable preemption for seccomp as well. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs 2019-02-21 4:02 ` Alexei Starovoitov @ 2019-02-21 5:31 ` Kees Cook 2019-02-21 8:53 ` Daniel Borkmann 0 siblings, 1 reply; 11+ messages in thread From: Kees Cook @ 2019-02-21 5:31 UTC (permalink / raw) To: Alexei Starovoitov Cc: Daniel Borkmann, Jann Horn, Andy Lutomirski, Alexei Starovoitov, Network Development On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote: > > > In 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > > > a check was added for BPF_PROG_RUN() that for every invocation preemption is > > > disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does > > > not count for seccomp because only cBPF -> eBPF is loaded here and it does > > > not make use of any functionality that would require this assertion. Fix this > > > false positive by adding and using SECCOMP_RUN() variant that does not have > > > the cant_sleep(); check. > > > > > > Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > > > Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com > > > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > > > Acked-by: Kees Cook <keescook@chromium.org> > > > > Applied, Thanks > > Actually I think it's a wrong approach to go long term. > I'm thinking to revert it. > I think it's better to disable preemption for duration of > seccomp cbpf prog. > It's short and there is really no reason for it to be preemptible. > When seccomp switches to ebpf we'll have this weird inconsistency. > Let's just disable preemption for seccomp as well. A lot of changes will be needed for seccomp ebpf -- not the least of which is convincing me there is a use-case. 
;) But the main issue is that I'm not a huge fan of dropping two barriers() across syscall entry. That seems pretty heavy-duty for something that is literally not needed right now. -- Kees Cook ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs 2019-02-21 5:31 ` Kees Cook @ 2019-02-21 8:53 ` Daniel Borkmann 2019-02-21 12:56 ` Jann Horn 0 siblings, 1 reply; 11+ messages in thread From: Daniel Borkmann @ 2019-02-21 8:53 UTC (permalink / raw) To: Kees Cook, Alexei Starovoitov Cc: Jann Horn, Andy Lutomirski, Alexei Starovoitov, Network Development On 02/21/2019 06:31 AM, Kees Cook wrote: > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: >> >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov >> <alexei.starovoitov@gmail.com> wrote: >>> >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote: >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption disabled") >>>> a check was added for BPF_PROG_RUN() that for every invocation preemption is >>>> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does >>>> not count for seccomp because only cBPF -> eBPF is loaded here and it does >>>> not make use of any functionality that would require this assertion. Fix this >>>> false positive by adding and using SECCOMP_RUN() variant that does not have >>>> the cant_sleep(); check. >>>> >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled") >>>> Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com >>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> >>>> Acked-by: Kees Cook <keescook@chromium.org> >>> >>> Applied, Thanks >> >> Actually I think it's a wrong approach to go long term. >> I'm thinking to revert it. >> I think it's better to disable preemption for duration of >> seccomp cbpf prog. >> It's short and there is really no reason for it to be preemptible. >> When seccomp switches to ebpf we'll have this weird inconsistency. >> Let's just disable preemption for seccomp as well. 
> > A lot of changes will be needed for seccomp ebpf -- not the least of > which is convincing me there is a use-case. ;) > > But the main issue is that I'm not a huge fan of dropping two > barriers() across syscall entry. That seems pretty heavy-duty for > something that is literally not needed right now. Yeah, I think it's okay to add once actually technically needed. Last time I looked, if I recall correctly, at least Chrome installs some heavy duty seccomp programs that go close to prog limit. Thanks, Daniel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs 2019-02-21 8:53 ` Daniel Borkmann @ 2019-02-21 12:56 ` Jann Horn 2019-02-21 19:29 ` Alexei Starovoitov 0 siblings, 1 reply; 11+ messages in thread From: Jann Horn @ 2019-02-21 12:56 UTC (permalink / raw) To: Daniel Borkmann Cc: Kees Cook, Alexei Starovoitov, Andy Lutomirski, Alexei Starovoitov, Network Development On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > On 02/21/2019 06:31 AM, Kees Cook wrote: > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > >> > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov > >> <alexei.starovoitov@gmail.com> wrote: > >>> > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote: > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > >>>> a check was added for BPF_PROG_RUN() that for every invocation preemption is > >>>> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does > >>>> not count for seccomp because only cBPF -> eBPF is loaded here and it does > >>>> not make use of any functionality that would require this assertion. Fix this > >>>> false positive by adding and using SECCOMP_RUN() variant that does not have > >>>> the cant_sleep(); check. > >>>> > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > >>>> Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com > >>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > >>>> Acked-by: Kees Cook <keescook@chromium.org> > >>> > >>> Applied, Thanks > >> > >> Actually I think it's a wrong approach to go long term. > >> I'm thinking to revert it. > >> I think it's better to disable preemption for duration of > >> seccomp cbpf prog. > >> It's short and there is really no reason for it to be preemptible. > >> When seccomp switches to ebpf we'll have this weird inconsistency. 
> >> Let's just disable preemption for seccomp as well. > > > > A lot of changes will be needed for seccomp ebpf -- not the least of > > which is convincing me there is a use-case. ;) > > > > But the main issue is that I'm not a huge fan of dropping two > > barriers() across syscall entry. That seems pretty heavy-duty for > > something that is literally not needed right now. > > Yeah, I think it's okay to add once actually technically needed. Last > time I looked, if I recall correctly, at least Chrome installs some > heavy duty seccomp programs that go close to prog limit. Half of that is probably because that seccomp BPF code is so inefficient, though. This snippet shows that those programs constantly recheck the high halves of arguments: 014e if args[1].high == 0x00000000: [true +3, false +0] 0153 if args[1].low == 0x00000003: [true +135, false +0] -> ret ALLOW (syscalls: fcntl) 0155 if args[1].high == 0x00000000: [true +3, false +0] 015a if args[1].low == 0x00000001: [true +128, false +0] -> ret ALLOW (syscalls: fcntl) 015c if args[1].high == 0x00000000: [true +3, false +0] 0161 if args[1].low == 0x00000002: [true +121, false +0] -> ret ALLOW (syscalls: fcntl) 0163 if args[1].high == 0x00000000: [true +3, false +0] 0168 if args[1].low == 0x00000006: [true +114, false +0] -> ret ALLOW (syscalls: fcntl) 016a if args[1].high == 0x00000000: [true +3, false +0] 016f if args[1].low == 0x00000007: [true +107, false +0] -> ret ALLOW (syscalls: fcntl) 0171 if args[1].high == 0x00000000: [true +3, false +0] 0176 if args[1].low == 0x00000005: [true +100, false +0] -> ret ALLOW (syscalls: fcntl) 0178 if args[1].high == 0x00000000: [true +3, false +0] 017d if args[1].low == 0x00000000: [true +93, false +0] -> ret ALLOW (syscalls: fcntl) 017f if args[1].high == 0x00000000: [true +3, false +0] 0184 if args[1].low == 0x00000406: [true +86, false +0] -> ret ALLOW (syscalls: fcntl) 0186 if args[1].high == 0x00000000: [true +3, false +0] 018b if args[1].low != 0x00000004: [true 
+80, false +0] -> ret TRAP 018d if args[2].high != 0x00000000: [true +78, false +0] -> ret TRAP 018f if args[2].low COMMON-BITS 0xffe363fc: [true +76, false +75] -> ret TRAP 01db ret ALLOW (syscalls: fcntl) Some of the generated code is pointless because all reachable code from that point on has the same outcome (the last "ret ALLOW" in the following sample is unreachable because they've already checked that the high bit of the low half is set, so the low half can't be 3): 00c9 if args[0].low == 0x00000005: [true +16, false +0] -> ret ALLOW (syscalls: clock_gettime, clock_getres) 00cb if args[0].high == 0x00000000: [true +3, false +0] 00d0 if args[0].low == 0x00000003: [true +9, false +0] -> ret ALLOW (syscalls: clock_gettime, clock_getres) 01dc ret TRAP 00cc if args[0].high != 0xffffffff: [true +8, false +0] -> ret TRAP 00ce if args[0].low COMMON-BITS 0x80000000: [true +0, false +6] 00d0 if args[0].low == 0x00000003: [true +9, false +0] -> ret ALLOW (syscalls: clock_gettime, clock_getres) 01dc ret TRAP 01d7 ret TRAP ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs 2019-02-21 12:56 ` Jann Horn @ 2019-02-21 19:29 ` Alexei Starovoitov 2019-02-21 19:53 ` Kees Cook 0 siblings, 1 reply; 11+ messages in thread From: Alexei Starovoitov @ 2019-02-21 19:29 UTC (permalink / raw) To: Jann Horn Cc: Daniel Borkmann, Kees Cook, Andy Lutomirski, Alexei Starovoitov, Network Development On Thu, Feb 21, 2019 at 01:56:53PM +0100, Jann Horn wrote: > On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 02/21/2019 06:31 AM, Kees Cook wrote: > > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > >> > > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov > > >> <alexei.starovoitov@gmail.com> wrote: > > >>> > > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote: > > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > > >>>> a check was added for BPF_PROG_RUN() that for every invocation preemption is > > >>>> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does > > >>>> not count for seccomp because only cBPF -> eBPF is loaded here and it does > > >>>> not make use of any functionality that would require this assertion. Fix this > > >>>> false positive by adding and using SECCOMP_RUN() variant that does not have > > >>>> the cant_sleep(); check. > > >>>> > > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > > >>>> Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com > > >>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > > >>>> Acked-by: Kees Cook <keescook@chromium.org> > > >>> > > >>> Applied, Thanks > > >> > > >> Actually I think it's a wrong approach to go long term. > > >> I'm thinking to revert it. > > >> I think it's better to disable preemption for duration of > > >> seccomp cbpf prog. 
> > >> It's short and there is really no reason for it to be preemptible. > > >> When seccomp switches to ebpf we'll have this weird inconsistency. > > >> Let's just disable preemption for seccomp as well. > > > > > > A lot of changes will be needed for seccomp ebpf -- not the least of > > > which is convincing me there is a use-case. ;) > > > > > > But the main issue is that I'm not a huge fan of dropping two > > > barriers() across syscall entry. That seems pretty heavy-duty for > > > something that is literally not needed right now. > > > > Yeah, I think it's okay to add once actually technically needed. Last > > time I looked, if I recall correctly, at least Chrome installs some > > heavy duty seccomp programs that go close to prog limit. > > Half of that is probably because that seccomp BPF code is so > inefficient, though. > > This snippet shows that those programs constantly recheck the high > halves of arguments: > > Some of the generated code is pointless because all reachable code > from that point on has the same outcome (the last "ret ALLOW" in the > following sample is unreachable because they've already checked that > the high bit of the low half is set, so the low half can't be 3): and with ebpf these optimizations will be available for free because llvm will remove unnecessary loads and simplify branches. There is no technical reason not to use ebpf in seccomp. When we discussed preemption of classic vs extended in socket filters context we agreed to make it a requirement that preemption must be disabled though it's not strictly necessary. RX side of socket filters was already non-preempt while TX was preemptible. We must not make an exception of this rule for seccomp. Hence I've reverted this commit. Here is the actual fix for seccomp: From: Alexei Starovoitov <ast@kernel.org> Date: Thu, 21 Feb 2019 10:40:14 -0800 Subject: [PATCH] seccomp, bpf: disable preemption before calling into bpf prog All BPF programs must be called with preemption disabled. 
Signed-off-by: Alexei Starovoitov <ast@kernel.org> --- kernel/seccomp.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/seccomp.c b/kernel/seccomp.c index e815781ed751..a43c601ac252 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -267,6 +267,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd, * All filters in the list are evaluated and the lowest BPF return * value always takes priority (ignoring the DATA). */ + preempt_disable(); for (; f; f = f->prev) { u32 cur_ret = BPF_PROG_RUN(f->prog, sd); @@ -275,6 +276,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd, *match = f; } } + preempt_enable(); return ret; } #endif /* CONFIG_SECCOMP_FILTER */ -- Doing per-cpu increment of cache hot data is practically free and it makes seccomp play by the rules. I'm working on another set of patches that will introduce bpf program stats as: #define BPF_PROG_RUN(prog, ctx) ({ \ + u32 ret; \ + cant_sleep(); \ + if (static_branch_unlikely(&bpf_stats_enabled_key)) { \ + u64 start = sched_clock(); \ + ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \ + this_cpu_inc(prog->aux->stats->cnt); \ + this_cpu_add(prog->aux->stats->nsecs, \ + sched_clock() - start); \ + } else { \ + ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \ + } \ + ret; }) and seccomp progs will get their stats just like socket filters and the rest of classic and extended bpf. ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs 2019-02-21 19:29 ` Alexei Starovoitov @ 2019-02-21 19:53 ` Kees Cook 2019-02-21 20:36 ` Alexei Starovoitov 0 siblings, 1 reply; 11+ messages in thread From: Kees Cook @ 2019-02-21 19:53 UTC (permalink / raw) To: Alexei Starovoitov Cc: Jann Horn, Daniel Borkmann, Andy Lutomirski, Alexei Starovoitov, Network Development On Thu, Feb 21, 2019 at 11:29 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Feb 21, 2019 at 01:56:53PM +0100, Jann Horn wrote: > > On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > > On 02/21/2019 06:31 AM, Kees Cook wrote: > > > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov > > > > <alexei.starovoitov@gmail.com> wrote: > > > >> > > > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov > > > >> <alexei.starovoitov@gmail.com> wrote: > > > >>> > > > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote: > > > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > > > >>>> a check was added for BPF_PROG_RUN() that for every invocation preemption is > > > >>>> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does > > > >>>> not count for seccomp because only cBPF -> eBPF is loaded here and it does > > > >>>> not make use of any functionality that would require this assertion. Fix this > > > >>>> false positive by adding and using SECCOMP_RUN() variant that does not have > > > >>>> the cant_sleep(); check. > > > >>>> > > > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > > > >>>> Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com > > > >>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > > > >>>> Acked-by: Kees Cook <keescook@chromium.org> > > > >>> > > > >>> Applied, Thanks > > > >> > > > >> Actually I think it's a wrong approach to go long term. 
> > > >> I'm thinking to revert it. > > > >> I think it's better to disable preemption for duration of > > > >> seccomp cbpf prog. > > > >> It's short and there is really no reason for it to be preemptible. > > > >> When seccomp switches to ebpf we'll have this weird inconsistency. > > > >> Let's just disable preemption for seccomp as well. > > > > > > > > A lot of changes will be needed for seccomp ebpf -- not the least of > > > > which is convincing me there is a use-case. ;) > > > > > > > > But the main issue is that I'm not a huge fan of dropping two > > > > barriers() across syscall entry. That seems pretty heavy-duty for > > > > something that is literally not needed right now. > > > > > > Yeah, I think it's okay to add once actually technically needed. Last > > > time I looked, if I recall correctly, at least Chrome installs some > > > heavy duty seccomp programs that go close to prog limit. > > > > Half of that is probably because that seccomp BPF code is so > > inefficient, though. > > > > This snippet shows that those programs constantly recheck the high > > halves of arguments: > > > > Some of the generated code is pointless because all reachable code > > from that point on has the same outcome (the last "ret ALLOW" in the > > following sample is unreachable because they've already checked that > > the high bit of the low half is set, so the low half can't be 3): > > and with ebpf these optimizations will be available for free > because llvm will remove unnecessary loads and simplify branches. > There is no technical reason not to use ebpf in seccomp. > > When we discussed preemption of classic vs extended in socket filters > context we agreed to make it a requirement that preemption must be > disabled though it's not strictly necessary. RX side of socket filters > was already non-preempt while TX was preemptible. > We must not make an exception of this rule for seccomp. > Hence I've reverted this commit. 
> > Here is the actual fix for seccomp: > From: Alexei Starovoitov <ast@kernel.org> > Date: Thu, 21 Feb 2019 10:40:14 -0800 > Subject: [PATCH] seccomp, bpf: disable preemption before calling into bpf prog > > All BPF programs must be called with preemption disabled. > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > --- > kernel/seccomp.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index e815781ed751..a43c601ac252 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -267,6 +267,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd, > * All filters in the list are evaluated and the lowest BPF return > * value always takes priority (ignoring the DATA). > */ > + preempt_disable(); > for (; f; f = f->prev) { > u32 cur_ret = BPF_PROG_RUN(f->prog, sd); > > @@ -275,6 +276,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd, > *match = f; > } > } > + preempt_enable(); > return ret; > } > #endif /* CONFIG_SECCOMP_FILTER */ > -- > > Doing per-cpu increment of cache hot data is practically free and it makes seccomp > play by the rules. Other accesses should dominate the run time, yes. I'm still not a big fan of unconditionally adding this, but I won't NAK. :P -- Kees Cook ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs 2019-02-21 19:53 ` Kees Cook @ 2019-02-21 20:36 ` Alexei Starovoitov 2019-02-21 22:14 ` Kees Cook 0 siblings, 1 reply; 11+ messages in thread From: Alexei Starovoitov @ 2019-02-21 20:36 UTC (permalink / raw) To: Kees Cook Cc: Jann Horn, Daniel Borkmann, Andy Lutomirski, Alexei Starovoitov, Network Development On Thu, Feb 21, 2019 at 11:53:06AM -0800, Kees Cook wrote: > On Thu, Feb 21, 2019 at 11:29 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Thu, Feb 21, 2019 at 01:56:53PM +0100, Jann Horn wrote: > > > On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > > > On 02/21/2019 06:31 AM, Kees Cook wrote: > > > > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov > > > > > <alexei.starovoitov@gmail.com> wrote: > > > > >> > > > > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov > > > > >> <alexei.starovoitov@gmail.com> wrote: > > > > >>> > > > > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote: > > > > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > > > > >>>> a check was added for BPF_PROG_RUN() that for every invocation preemption is > > > > >>>> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does > > > > >>>> not count for seccomp because only cBPF -> eBPF is loaded here and it does > > > > >>>> not make use of any functionality that would require this assertion. Fix this > > > > >>>> false positive by adding and using SECCOMP_RUN() variant that does not have > > > > >>>> the cant_sleep(); check. 
> > > > >>>> > > > > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled") > > > > >>>> Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com > > > > >>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > > > > >>>> Acked-by: Kees Cook <keescook@chromium.org> > > > > >>> > > > > >>> Applied, Thanks > > > > >> > > > > >> Actually I think it's a wrong approach to go long term. > > > > >> I'm thinking to revert it. > > > > >> I think it's better to disable preemption for duration of > > > > >> seccomp cbpf prog. > > > > >> It's short and there is really no reason for it to be preemptible. > > > > >> When seccomp switches to ebpf we'll have this weird inconsistency. > > > > >> Let's just disable preemption for seccomp as well. > > > > > > > > > > A lot of changes will be needed for seccomp ebpf -- not the least of > > > > > which is convincing me there is a use-case. ;) > > > > > > > > > > But the main issue is that I'm not a huge fan of dropping two > > > > > barriers() across syscall entry. That seems pretty heavy-duty for > > > > > something that is literally not needed right now. > > > > > > > > Yeah, I think it's okay to add once actually technically needed. Last > > > > time I looked, if I recall correctly, at least Chrome installs some > > > > heavy duty seccomp programs that go close to prog limit. > > > > > > Half of that is probably because that seccomp BPF code is so > > > inefficient, though. 
> > > > > > This snippet shows that those programs constantly recheck the high > > > halves of arguments: > > > > > > Some of the generated code is pointless because all reachable code > > > from that point on has the same outcome (the last "ret ALLOW" in the > > > following sample is unreachable because they've already checked that > > > the high bit of the low half is set, so the low half can't be 3): > > > > and with ebpf these optimizations will be available for free > > because llvm will remove unnecessary loads and simplify branches. > > There is no technical reason not to use ebpf in seccomp. > > > > When we discussed preemption of classic vs extended in socket filters > > context we agreed to make it a requirement that preemption must be > > disabled though it's not strictly necessary. RX side of socket filters > > was already non-preempt while TX was preemptible. > > We must not make an exception of this rule for seccomp. > > Hence I've reverted this commit. > > > > Here is the actual fix for seccomp: > > From: Alexei Starovoitov <ast@kernel.org> > > Date: Thu, 21 Feb 2019 10:40:14 -0800 > > Subject: [PATCH] seccomp, bpf: disable preemption before calling into bpf prog > > > > All BPF programs must be called with preemption disabled. > > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > > --- > > kernel/seccomp.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > > index e815781ed751..a43c601ac252 100644 > > --- a/kernel/seccomp.c > > +++ b/kernel/seccomp.c > > @@ -267,6 +267,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd, > > * All filters in the list are evaluated and the lowest BPF return > > * value always takes priority (ignoring the DATA). 
> > */ > > + preempt_disable(); > > for (; f; f = f->prev) { > > u32 cur_ret = BPF_PROG_RUN(f->prog, sd); > > > > @@ -275,6 +276,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd, > > *match = f; > > } > > } > > + preempt_enable(); > > return ret; > > } > > #endif /* CONFIG_SECCOMP_FILTER */ > > -- > > > > Doing per-cpu increment of cache hot data is practically free and it makes seccomp > > play by the rules. > > Other accesses should dominate the run time, yes. I'm still not a big > fan of unconditionally adding this, but I won't NAK. :P Thank you. I also would like to touch on your comment: "A lot of changes will be needed for seccomp ebpf" There were two attempts to add it in the past and the patches were small and straightforward. If I recall correctly both times you nacked them because performance gains and ease of use arguments were not convincing enough, right? Are you still not convinced ? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
  2019-02-21 20:36             ` Alexei Starovoitov
@ 2019-02-21 22:14               ` Kees Cook
  2019-02-22  0:22                 ` Andy Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: Kees Cook @ 2019-02-21 22:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jann Horn, Daniel Borkmann, Andy Lutomirski, Alexei Starovoitov,
	Network Development

On Thu, Feb 21, 2019 at 12:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> I also would like to touch on your comment:
> "A lot of changes will be needed for seccomp ebpf"
> There were two attempts to add it in the past and the patches were
> small and straightforward.

Yeah, agreed: doing it is technically easy. My concerns have mainly
revolved around avoiding increased complexity and attack surface.
There have been, for example, a lot of verifier bugs that were not
reachable through seccomp's BPF usage, given it enforcing only using a
subset of cBPF. i.e. seccomp filters couldn't be used as Spectre
gadgets, etc.

> If I recall correctly both times you nacked them because performance gains
> and ease of use arguments were not convincing enough, right?

Right. There wasn't, in my opinion enough of a performance benefit vs
just having efficient BPF to start with.

> Are you still not convinced ?

For now, yeah. I'm sure there will be some future time when a use-case
appears where gaining some special eBPF hook/feature will outweigh the
increased attack surface. I haven't seen it yet, but I'm not crazy
enough to think it'll never happen. (In fact, recently I even had
Tycho see if he could implement the recent seccomp user notification
stuff via eBPF.)

-- 
Kees Cook
* Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
  2019-02-21 22:14               ` Kees Cook
@ 2019-02-22  0:22                 ` Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2019-02-22  0:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: Alexei Starovoitov, Jann Horn, Daniel Borkmann,
	Alexei Starovoitov, Network Development

On Thu, Feb 21, 2019 at 2:14 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Feb 21, 2019 at 12:36 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > I also would like to touch on your comment:
> > "A lot of changes will be needed for seccomp ebpf"
> > There were two attempts to add it in the past and the patches were
> > small and straightforward.
>
> Yeah, agreed: doing it is technically easy. My concerns have mainly
> revolved around avoiding increased complexity and attack surface.
> There have been, for example, a lot of verifier bugs that were not
> reachable through seccomp's BPF usage, given it enforcing only using a
> subset of cBPF. i.e. seccomp filters couldn't be used as Spectre
> gadgets, etc.
>
> > If I recall correctly both times you nacked them because performance gains
> > and ease of use arguments were not convincing enough, right?
>
> Right. There wasn't, in my opinion enough of a performance benefit vs
> just having efficient BPF to start with.
>
> > Are you still not convinced ?
>
> For now, yeah. I'm sure there will be some future time when a use-case
> appears where gaining some special eBPF hook/feature will outweigh the
> increased attack surface. I haven't seen it yet, but I'm not crazy
> enough to think it'll never happen. (In fact, recently I even had
> Tycho see if he could implement the recent seccomp user notification
> stuff via eBPF.)
>

I consider the potential for much improved performance to be a
maybe-good-enough argument.
The down side is that there are programs that load cBPF seccomp filters
from inside a sandbox, and being able to load eBPF from inside a sandbox
is potentially undesirable.
end of thread, other threads:[~2019-02-22  0:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-20 23:01 [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs Daniel Borkmann
2019-02-20 23:59 ` Alexei Starovoitov
2019-02-21  4:02   ` Alexei Starovoitov
2019-02-21  5:31     ` Kees Cook
2019-02-21  8:53       ` Daniel Borkmann
2019-02-21 12:56         ` Jann Horn
2019-02-21 19:29           ` Alexei Starovoitov
2019-02-21 19:53             ` Kees Cook
2019-02-21 20:36               ` Alexei Starovoitov
2019-02-21 22:14                 ` Kees Cook
2019-02-22  0:22                   ` Andy Lutomirski