Date: Thu, 21 Feb 2019 11:29:18 -0800
From: Alexei Starovoitov
To: Jann Horn
Cc: Daniel Borkmann, Kees Cook, Andy Lutomirski, Alexei Starovoitov,
	Network Development
Subject: Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs
Message-ID: <20190221192916.2mcd4fmxbdj2j2u3@ast-mbp.dhcp.thefacebook.com>
References: <20190220230135.9748-1-daniel@iogearbox.net>
	<20190220235952.uzrsjypoqkha7ya6@ast-mbp.dhcp.thefacebook.com>
X-Mailing-List: netdev@vger.kernel.org

On Thu, Feb 21, 2019 at 01:56:53PM +0100, Jann Horn wrote:
> On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann wrote:
> > On 02/21/2019 06:31 AM, Kees Cook wrote:
> > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov wrote:
> > >>
> > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov wrote:
> > >>>
> > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote:
> > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption
> > >>>> disabled") a check was added for BPF_PROG_RUN() that for every
> > >>>> invocation preemption is disabled to not break eBPF assumptions
> > >>>> (e.g. per-cpu map). Of course this does not count for seccomp because
> > >>>> only cBPF -> eBPF is loaded here and it does not make use of any
> > >>>> functionality that would require this assertion. Fix this false
> > >>>> positive by adding and using SECCOMP_RUN() variant that does not
> > >>>> have the cant_sleep(); check.
> > >>>>
> > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> > >>>> Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com
> > >>>> Signed-off-by: Daniel Borkmann
> > >>>> Acked-by: Kees Cook
> > >>>
> > >>> Applied, Thanks
> > >>
> > >> Actually I think it's a wrong approach to go long term.
> > >> I'm thinking to revert it.
> > >> I think it's better to disable preemption for the duration of the
> > >> seccomp cbpf prog. It's short and there is really no reason for it
> > >> to be preemptible. When seccomp switches to ebpf we'll have this
> > >> weird inconsistency. Let's just disable preemption for seccomp as
> > >> well.
> > >
> > > A lot of changes will be needed for seccomp ebpf -- not the least of
> > > which is convincing me there is a use-case. ;)
> > >
> > > But the main issue is that I'm not a huge fan of dropping two
> > > barriers() across syscall entry. That seems pretty heavy-duty for
> > > something that is literally not needed right now.
> >
> > Yeah, I think it's okay to add once actually technically needed.
> > Last time I looked, if I recall correctly, at least Chrome installs
> > some heavy-duty seccomp programs that go close to the prog limit.
>
> Half of that is probably because that seccomp BPF code is so
> inefficient, though.
>
> This snippet shows that those programs constantly recheck the high
> halves of arguments:
>
> Some of the generated code is pointless because all reachable code
> from that point on has the same outcome (the last "ret ALLOW" in the
> following sample is unreachable because they've already checked that
> the high bit of the low half is set, so the low half can't be 3):

and with ebpf these optimizations will be available for free, because
llvm will remove unnecessary loads and simplify branches. There is no
technical reason not to use ebpf in seccomp.

When we discussed preemption of classic vs extended bpf in the socket
filter context, we agreed to make it a requirement that preemption must
be disabled, even though it's not strictly necessary: the RX side of
socket filters was already non-preemptible, while the TX side was
preemptible. We must not make an exception to this rule for seccomp.
Hence I've reverted this commit. Here is the actual fix for seccomp:

From: Alexei Starovoitov
Date: Thu, 21 Feb 2019 10:40:14 -0800
Subject: [PATCH] seccomp, bpf: disable preemption before calling into bpf prog

All BPF programs must be called with preemption disabled.

Signed-off-by: Alexei Starovoitov
---
 kernel/seccomp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index e815781ed751..a43c601ac252 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -267,6 +267,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 	 * All filters in the list are evaluated and the lowest BPF return
 	 * value always takes priority (ignoring the DATA).
 	 */
+	preempt_disable();
 	for (; f; f = f->prev) {
 		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);

@@ -275,6 +276,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
 			*match = f;
 		}
 	}
+	preempt_enable();
 	return ret;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
--

Doing a per-cpu increment of cache-hot data is practically free, and it
makes seccomp play by the rules.

I'm working on another set of patches that will introduce bpf program
stats as:

#define BPF_PROG_RUN(prog, ctx)	({				\
+	u32 ret;						\
+	cant_sleep();						\
+	if (static_branch_unlikely(&bpf_stats_enabled_key)) {	\
+		u64 start = sched_clock();			\
+		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
+		this_cpu_inc(prog->aux->stats->cnt);		\
+		this_cpu_add(prog->aux->stats->nsecs,		\
+			     sched_clock() - start);		\
+	} else {						\
+		ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi);	\
+	}							\
+	ret; })

and seccomp progs will get their stats just like socket filters and the
rest of classic and extended bpf.