Date: Thu, 21 Feb 2019 11:29:18 -0800
From: Alexei Starovoitov
To: Jann Horn
Cc: Daniel Borkmann, Kees Cook, Andy Lutomirski, Alexei Starovoitov,
	Network Development
Subject: Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
Message-ID: <20190221192916.2mcd4fmxbdj2j2u3@ast-mbp.dhcp.thefacebook.com>
References: <20190220230135.9748-1-daniel@iogearbox.net>
	<20190220235952.uzrsjypoqkha7ya6@ast-mbp.dhcp.thefacebook.com>
X-Mailing-List: netdev@vger.kernel.org

On Thu, Feb 21, 2019 at 01:56:53PM +0100, Jann Horn wrote:
> On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann wrote:
> > On 02/21/2019 06:31 AM, Kees Cook wrote:
> > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov wrote:
> > >>
> > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov wrote:
> > >>>
> > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote:
> > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption
> > >>>> disabled") a check was added for BPF_PROG_RUN() that for every
> > >>>> invocation preemption is disabled to not break eBPF assumptions
> > >>>> (e.g. per-cpu map). Of course this does not count for seccomp because
> > >>>> only cBPF -> eBPF is loaded here and it does not make use of any
> > >>>> functionality that would require this assertion. Fix this false
> > >>>> positive by adding and using SECCOMP_RUN() variant that does not
> > >>>> have the cant_sleep(); check.
> > >>>>
> > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> > >>>> Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com
> > >>>> Signed-off-by: Daniel Borkmann
> > >>>> Acked-by: Kees Cook
> > >>>
> > >>> Applied, Thanks
> > >>
> > >> Actually I think it's a wrong approach to go long term.
> > >> I'm thinking to revert it.
> > >> I think it's better to disable preemption for the duration of the
> > >> seccomp cbpf prog. It's short and there is really no reason for it
> > >> to be preemptible. When seccomp switches to ebpf we'll have this
> > >> weird inconsistency. Let's just disable preemption for seccomp as
> > >> well.
> > >
> > > A lot of changes will be needed for seccomp ebpf -- not the least of
> > > which is convincing me there is a use-case. ;)
> > >
> > > But the main issue is that I'm not a huge fan of dropping two
> > > barriers() across syscall entry. That seems pretty heavy-duty for
> > > something that is literally not needed right now.
> >
> > Yeah, I think it's okay to add once actually technically needed.
> > Last time I looked, if I recall correctly, at least Chrome installs
> > some heavy-duty seccomp programs that go close to the prog limit.
>
> Half of that is probably because that seccomp BPF code is so
> inefficient, though.
>
> This snippet shows that those programs constantly recheck the high
> halves of arguments:
>
> Some of the generated code is pointless because all reachable code
> from that point on has the same outcome (the last "ret ALLOW" in the
> following sample is unreachable because they've already checked that
> the high bit of the low half is set, so the low half can't be 3):

and with ebpf these optimizations will be available for free, because
llvm will remove unnecessary loads and simplify branches. There is no
technical reason not to use ebpf in seccomp.

When we discussed preemption of classic vs extended bpf in the socket
filter context, we agreed to make it a requirement that preemption must
be disabled, even though it's not strictly necessary: the RX side of
socket filters was already non-preemptible, while the TX side was
preemptible. We must not make an exception to this rule for seccomp.
Hence I've reverted this commit. Here is the actual fix for seccomp:

From: Alexei Starovoitov
Date: Thu, 21 Feb 2019 10:40:14 -0800
Subject: [PATCH] seccomp, bpf: disable preemption before calling into bpf prog

All BPF programs must be called with preemption disabled.

Signed-off-by: Alexei Starovoitov
---
 kernel/seccomp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index e815781ed751..a43c601ac252 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -267,6 +267,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 	 * All filters in the list are evaluated and the lowest BPF return
 	 * value always takes priority (ignoring the DATA).
 	 */
+	preempt_disable();
 	for (; f; f = f->prev) {
 		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);

@@ -275,6 +276,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 			*match = f;
 		}
 	}
+	preempt_enable();
 	return ret;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
--

Doing a per-cpu increment of cache-hot data is practically free, and it
makes seccomp play by the rules.

I'm working on another set of patches that will introduce bpf program
stats as:

#define BPF_PROG_RUN(prog, ctx)	({				\
+	u32 ret;						\
+	cant_sleep();						\
+	if (static_branch_unlikely(&bpf_stats_enabled_key)) {	\
+		u64 start = sched_clock();			\
+		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
+		this_cpu_inc(prog->aux->stats->cnt);		\
+		this_cpu_add(prog->aux->stats->nsecs,		\
+			     sched_clock() - start);		\
+	} else {						\
+		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
+	}							\
+	ret; })

and seccomp progs will get their stats just like socket filters and the
rest of classic and extended bpf.