From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=K4f4=Q4=vger.kernel.org=netdev-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-8.9 required=3.0 tests=DKIM_SIGNED,DKIM_VALID,
	DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,
	HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,
	SPF_PASS,URIBL_BLOCKED,USER_AGENT_NEOMUTT autolearn=ham autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 315E9C43381
	for <netdev@archiver.kernel.org>; Thu, 21 Feb 2019 20:36:21 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id EBDDB20823
	for <netdev@archiver.kernel.org>; Thu, 21 Feb 2019 20:36:20 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="byul+2dz"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726428AbfBUUgT (ORCPT <rfc822;netdev@archiver.kernel.org>);
        Thu, 21 Feb 2019 15:36:19 -0500
Received: from mail-pg1-f195.google.com ([209.85.215.195]:43007 "EHLO
        mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1726075AbfBUUgT (ORCPT
        <rfc822;netdev@vger.kernel.org>); Thu, 21 Feb 2019 15:36:19 -0500
Received: by mail-pg1-f195.google.com with SMTP id b2so8583951pgl.9
        for <netdev@vger.kernel.org>; Thu, 21 Feb 2019 12:36:18 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=date:from:to:cc:subject:message-id:references:mime-version
         :content-disposition:in-reply-to:user-agent;
        bh=Dp7dp8fRoiE3zhyHD29GzxWElXqC1ffRTzwrSEa+O0k=;
        b=byul+2dzPripIav6jCXgPVWA7hHM/r1G/4VWBxokhj+Zk2u3xki6yE2m17X/eR4iru
         2qK9iXwIqkPc4nL77QFr3augWwTrg0GJ1MrRlu19e4Ziz/c016QbuDQNqicmn1SKrp+p
         nXlJUsDAD0OyJjB9q18j/kmGBZt2zd09WGFuuR6DNySCRIt27hbnXwspB0PKea0XtztT
         UPPYUeAKRjR1iZhP4vYwYnzYG095dF80rTTP+pihSrt5hWg2olqW9zzBdnx3r+FDrk7b
         naoR2BSTAZJkPEL8h3Qhz+cQY/5uHfmKBUGSVcsoWJhMGz1djJVZiVDyxkcDPzTS2yVx
         C/Sg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:date:from:to:cc:subject:message-id:references
         :mime-version:content-disposition:in-reply-to:user-agent;
        bh=Dp7dp8fRoiE3zhyHD29GzxWElXqC1ffRTzwrSEa+O0k=;
        b=RhP6819AeZqyfslxh7fWvdaPHIck7IM3FqQvK1aKsynjBVitBMhE1QQK1dqdr9I9ev
         v19PpJhjw50QSYOxCKp+TyHztCB0dmjtmD1p5IyE6J/SVqqooGlms0PLzoq5mWhbnbcy
         QkKkcOnxx4evGBquwjkVPszFljSfMjywalDN4HTXuecBEV5/IG1cuCY2gcazKlgE7jos
         gY/6m5ghZRJyd5KuLcoAfmFL8nXWrVSuuHzDv9wNn60cYcFvWZ6Mg1GlS57/nVguigZw
         n+2ZYcLcuZVKq8jRucJnd/F0eaXb7i4CCobVzSANGMVNN+XTzmOg3+t0L5bz1niuf8Mm
         GRQw==
X-Gm-Message-State: AHQUAubUX9OnVGotzpkxbxrMozb337HKA2B9psDMx4zCd011ahCTeBCa
        DVta1E00aIW5sYOkCpPOei0=
X-Google-Smtp-Source: AHgI3Ians31NbgiahQhsWKSmL6R44sJ8hBGRvwtqaYCdCr12FeeP/rrPmK8e1t0eOThRdm+KNCH6Pg==
X-Received: by 2002:a62:c302:: with SMTP id v2mr381771pfg.155.1550781378076;
        Thu, 21 Feb 2019 12:36:18 -0800 (PST)
Received: from ast-mbp.dhcp.thefacebook.com ([2620:10d:c090:200::4:eceb])
        by smtp.gmail.com with ESMTPSA id d16sm1516126pfo.112.2019.02.21.12.36.16
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Thu, 21 Feb 2019 12:36:16 -0800 (PST)
Date:   Thu, 21 Feb 2019 12:36:15 -0800
From:   Alexei Starovoitov <alexei.starovoitov@gmail.com>
To:     Kees Cook <keescook@chromium.org>
Cc:     Jann Horn <jannh@google.com>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Andy Lutomirski <luto@amacapital.net>,
        Alexei Starovoitov <ast@kernel.org>,
        Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH bpf-next v2] bpf, seccomp: fix false positive preemption
 splat for cbpf->ebpf progs
Message-ID: <20190221203613.q6k757fi3wxtoj5y@ast-mbp.dhcp.thefacebook.com>
References: <20190220230135.9748-1-daniel@iogearbox.net>
 <20190220235952.uzrsjypoqkha7ya6@ast-mbp.dhcp.thefacebook.com>
 <CAADnVQJYjXKe7NKwjiCDt-tsgejZ1S0ApA4aJUw6se5XsWY5KQ@mail.gmail.com>
 <CAGXu5jLHV78-d+_yTKVsgawDAVn7FXZVCcvhtDesXcxhZf_NKg@mail.gmail.com>
 <f5231b6f-cc97-f3ea-3262-edea973f21e2@iogearbox.net>
 <CAG48ez1Edngv8LLb_svz8Kq3vQKf7-a_Q1whyHfXiTDP8FO+rw@mail.gmail.com>
 <20190221192916.2mcd4fmxbdj2j2u3@ast-mbp.dhcp.thefacebook.com>
 <CAGXu5jKE3PzxNDEMagWFG+37_6FLyQgkLRShHQHB7Ufq-8egDA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAGXu5jKE3PzxNDEMagWFG+37_6FLyQgkLRShHQHB7Ufq-8egDA@mail.gmail.com>
User-Agent: NeoMutt/20180223
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

On Thu, Feb 21, 2019 at 11:53:06AM -0800, Kees Cook wrote:
> On Thu, Feb 21, 2019 at 11:29 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Feb 21, 2019 at 01:56:53PM +0100, Jann Horn wrote:
> > > On Thu, Feb 21, 2019 at 9:53 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > > On 02/21/2019 06:31 AM, Kees Cook wrote:
> > > > > On Wed, Feb 20, 2019 at 8:03 PM Alexei Starovoitov
> > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >>
> > > > >> On Wed, Feb 20, 2019 at 3:59 PM Alexei Starovoitov
> > > > >> <alexei.starovoitov@gmail.com> wrote:
> > > > >>>
> > > > >>> On Thu, Feb 21, 2019 at 12:01:35AM +0100, Daniel Borkmann wrote:
> > > > >>>> In 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> > > > >>>> a check was added for BPF_PROG_RUN() that for every invocation preemption is
> > > > >>>> disabled to not break eBPF assumptions (e.g. per-cpu map). Of course this does
> > > > >>>> not count for seccomp because only cBPF -> eBPF is loaded here and it does
> > > > >>>> not make use of any functionality that would require this assertion. Fix this
> > > > >>>> false positive by adding and using SECCOMP_RUN() variant that does not have
> > > > >>>> the cant_sleep(); check.
> > > > >>>>
> > > > >>>> Fixes: 568f196756ad ("bpf: check that BPF programs run with preemption disabled")
> > > > >>>> Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com
> > > > >>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> > > > >>>> Acked-by: Kees Cook <keescook@chromium.org>
> > > > >>>
> > > > >>> Applied, Thanks
> > > > >>
> > > > >> Actually I think it's a wrong approach to go long term.
> > > > >> I'm thinking to revert it.
> > > > >> I think it's better to disable preemption for duration of
> > > > >> seccomp cbpf prog.
> > > > >> It's short and there is really no reason for it to be preemptible.
> > > > >> When seccomp switches to ebpf we'll have this weird inconsistency.
> > > > >> Let's just disable preemption for seccomp as well.
> > > > >
> > > > > A lot of changes will be needed for seccomp ebpf -- not the least of
> > > > > which is convincing me there is a use-case. ;)
> > > > >
> > > > > But the main issue is that I'm not a huge fan of dropping two
> > > > > barriers() across syscall entry. That seems pretty heavy-duty for
> > > > > something that is literally not needed right now.
> > > >
> > > > Yeah, I think it's okay to add once actually technically needed. Last
> > > > time I looked, if I recall correctly, at least Chrome installs some
> > > > heavy duty seccomp programs that go close to prog limit.
> > >
> > > Half of that is probably because that seccomp BPF code is so
> > > inefficient, though.
> > >
> > > This snippet shows that those programs constantly recheck the high
> > > halves of arguments:
> > >
> > > Some of the generated code is pointless because all reachable code
> > > from that point on has the same outcome (the last "ret ALLOW" in the
> > > following sample is unreachable because they've already checked that
> > > the high bit of the low half is set, so the low half can't be 3):
> >
> > and with ebpf these optimizations will be available for free
> > because llvm will remove unnecessary loads and simplify branches.
> > There is no technical reason not to use ebpf in seccomp.
> >
> > When we discussed preemption of classic vs extended in socket filters
> > context we agreed to make it a requirement that preemption must be
> > disabled though it's not strictly necessary. RX side of socket filters
> > was already non-preempt while TX was preemptible.
> > We must not make an exception of this rule for seccomp.
> > Hence I've reverted this commit.
> >
> > Here is the actual fix for seccomp:
> > From: Alexei Starovoitov <ast@kernel.org>
> > Date: Thu, 21 Feb 2019 10:40:14 -0800
> > Subject: [PATCH] seccomp, bpf: disable preemption before calling into bpf prog
> >
> > All BPF programs must be called with preemption disabled.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  kernel/seccomp.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index e815781ed751..a43c601ac252 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -267,6 +267,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> >          * All filters in the list are evaluated and the lowest BPF return
> >          * value always takes priority (ignoring the DATA).
> >          */
> > +       preempt_disable();
> >         for (; f; f = f->prev) {
> >                 u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
> >
> > @@ -275,6 +276,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
> >                         *match = f;
> >                 }
> >         }
> > +       preempt_enable();
> >         return ret;
> >  }
> >  #endif /* CONFIG_SECCOMP_FILTER */
> > --
> >
> > Doing per-cpu increment of cache hot data is practically free and it makes seccomp
> > play by the rules.
> 
> Other accesses should dominate the run time, yes. I'm still not a big
> fan of unconditionally adding this, but I won't NAK. :P

Thank you.
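
For anyone following the thread without a kernel tree handy, here is a
rough user-space sketch of what the patch above does. It is a model
under stated assumptions, not kernel code: preempt_disable() and
preempt_enable() are stand-in counters, prog_run() stands in for
BPF_PROG_RUN() with an assert() in place of the cant_sleep() check,
and the RET_* names, struct filter, and demo_verdict() are
hypothetical helpers that only mimic the seccomp return-value layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel primitives (assumptions, not kernel code). */
static int preempt_count;
static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }

/* Return values laid out like the seccomp UAPI: action in the high 16 bits. */
#define RET_ACTION_FULL 0xffff0000U
#define RET_ERRNO       0x00050000U
#define RET_ALLOW       0x7fff0000U
#define ACTION_ONLY(r)  ((int32_t)((r) & RET_ACTION_FULL))

struct seccomp_data { int nr; };	/* trimmed: syscall number only */

struct filter {
	struct filter *prev;
	uint32_t (*prog)(const struct seccomp_data *sd);
};

/* Models BPF_PROG_RUN() plus a check that preemption is disabled. */
static uint32_t prog_run(const struct filter *f, const struct seccomp_data *sd)
{
	assert(preempt_count > 0);	/* aborts here; the kernel check only warns */
	return f->prog(sd);
}

/* Same shape as seccomp_run_filters() after the patch. */
static uint32_t run_filters(const struct seccomp_data *sd, struct filter *f,
			    struct filter **match)
{
	uint32_t ret = RET_ALLOW;

	preempt_disable();
	for (; f; f = f->prev) {
		uint32_t cur_ret = prog_run(f, sd);

		/* The lowest action wins; the DATA bits are ignored. */
		if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
			ret = cur_ret;
			*match = f;
		}
	}
	preempt_enable();
	return ret;
}

/* Two toy "programs" written as plain C functions. */
static uint32_t allow_all(const struct seccomp_data *sd)
{
	(void)sd;
	return RET_ALLOW;
}

static uint32_t deny_nr_1(const struct seccomp_data *sd)
{
	return sd->nr == 1 ? (RET_ERRNO | 1) : RET_ALLOW;
}

/* Builds a two-filter chain (newest first) and returns the verdict for nr. */
static uint32_t demo_verdict(int nr)
{
	struct filter older = { NULL, deny_nr_1 };
	struct filter newer = { &older, allow_all };
	struct filter *match = NULL;
	struct seccomp_data sd = { nr };

	return run_filters(&sd, &newer, &match);
}
```

Calling demo_verdict() from a small main() walks both filters with the
counter raised and leaves it balanced on return. Removing the
preempt_disable()/preempt_enable() pair makes the assert() in
prog_run() fire, which is the user-space analogue of the splat that
started this thread.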

I also would like to touch on your comment:
"A lot of changes will be needed for seccomp ebpf"
There were two attempts to add it in the past and the patches were
small and straightforward.
If I recall correctly, both times you NACKed them because the performance-gain
and ease-of-use arguments were not convincing enough, right?
Are you still not convinced?