From: Sargun Dhillon
Date: Mon, 17 May 2021 10:07:58 -0700
Subject: Re: [RFC PATCH bpf-next seccomp 00/12] eBPF seccomp filters
To: Tianyin Xu
Cc: Andy Lutomirski, YiFei Zhu, "containers@lists.linux.dev", bpf,
    "Zhu, YiFei", LSM List, Alexei Starovoitov, Andrea Arcangeli,
    "Kuo, Hsuan-Chi", Claudio Canella, Daniel Borkmann, Daniel Gruss,
    Dimitrios Skarlatos, Giuseppe Scrivano, Hubertus Franke, Jann Horn,
    "Jia, Jinghao", "Torrellas, Josep", Kees Cook,
    Tobin Feldman-Fitzthum, Tom Hromatka, Will Drewry

On Sun, May 16, 2021 at 1:39 AM Tianyin Xu wrote:
>
> On Sat, May 15, 2021 at 10:49 AM Andy Lutomirski wrote:
> >
> > On 5/10/21 10:21 PM, YiFei Zhu wrote:
> > > On Mon, May 10, 2021 at 12:47 PM Andy Lutomirski wrote:
> > >> On Mon, May 10, 2021 at 10:22 AM YiFei Zhu wrote:
> > >>>
> > >>> From: YiFei Zhu
> > >>>
> > >>> Based on: https://urldefense.com/v3/__https://lists.linux-foundation.org/pipermail/containers/2018-February/038571.html__;!!DZ3fjg!thbAoRgmCeWjlv0qPDndNZW1j6Y2Kl_huVyUffr4wVbISf-aUiULaWHwkKJrNJyo$
> > >>>
> > >>> This patchset enables seccomp filters to be written in eBPF.
> > >>> Supporting eBPF filters has been proposed a few times in the past.
> > >>> The main concerns were (1) use cases and (2) security. We have
> > >>> identified many use cases that can benefit from advanced eBPF
> > >>> filters, such as:
> > >>
> > >> I haven't reviewed this carefully, but I think we need to distinguish
> > >> a few things:
> > >>
> > >> 1. Using the eBPF *language*.
> > >>
> > >> 2. Allowing the use of stateful / non-pure eBPF features.
> > >>
> > >> 3. Allowing the eBPF programs to read the target process' memory.
> > >>
> > >> I'm generally in favor of (1). I'm not at all sure about (2), and I'm
> > >> even less convinced by (3).
> > >>
> > >>>
> > >>> * exec-only-once filter / apply filter after exec
> > >>
> > >> This is (2). I'm not sure it's a good idea.
> > >
> > > The basic idea is that for a container runtime it may want to execute
> > > a program in a container without that program being able to execve
> > > another program, stopping any attack that involves loading another
> > > binary. The container runtime can block any syscall but execve in the
> > > exec-ed process by using only cBPF.
> > >
> > > The use case is suggested by Andrea Arcangeli and Giuseppe Scrivano.
> > > @Andrea and @Giuseppe, could you clarify more in case I missed
> > > something?
> >
> > We've discussed having a notifier-using filter be able to replace its
> > filter. This would allow this and other use cases without any
> > additional eBPF or cBPF code.
> >
>
> A notifier is not always a solution (even ignoring its perf overhead).
>
> One problem, pointed out by Andrea Arcangeli, is that notifiers need
> userspace daemons. So, it can hardly be used by daemonless container
> engines like Podman.
>
> And, /* sorry for repeating.. */ the performance overhead of notifiers
> is not close to ebpf, which prevents use cases that require native
> performance.

While I agree with you that this is the case right now, there's no
reason it has to be the case. There are a variety of mechanisms that
could be employed to significantly speed up the notifier. For example,
right now the notifier is behind one large per-filter lock. That lock
could be removed, allowing for better concurrency. There are also a
large number of mechanisms that scale O(n) with the number of
outstanding notifications -- again, something that could be improved.

The other big improvement that could be made is being able to use
something like io_uring with the notifier interface, but it would
require a fairly significant user API change -- and a move away from
ioctl. I'm not sure if people are excited about that idea at the
moment.

> >
> > >> eBPF doesn't really have a privilege model yet. There was a long and
> > >> disappointing thread about this awhile back.
> > >
> > > The idea is that “seccomp-eBPF does not make life easier for an
> > > adversary”. For any attack in which an adversary could potentially
> > > utilize seccomp-eBPF, they can do the same with other eBPF features,
> > > i.e. it would be an issue with eBPF in general rather than
> > > specifically seccomp’s use of eBPF.
> > >
> > > Here it is referring to the helpers going to the base
> > > bpf_base_func_proto if the caller is unprivileged (!bpf_capable ||
> > > !perfmon_capable). In this case, if the adversary would utilize eBPF
> > > helpers to perform an attack, they could do it via another
> > > unprivileged prog type.
> > >
> > > That said, there are a few additional helpers this patchset is adding:
> > > * get_current_uid_gid
> > > * get_current_pid_tgid
> > > These two provide public information (are namespaces a concern?). I
> > > have no idea what kind of exploit it could add unless the adversary
> > > somehow side-channels the task_struct? But in that case, how is the
> > > reading of task_struct different from how the rest of the kernel is
> > > reading task_struct?
> >
> > Yes, namespaces are a concern. This idea got mostly shot down for kdbus
> > (whatever happened to that?), and it likely has the same problems for
> > seccomp.
> >
> > >>
> > >> What is this for?
> > >
> > > Memory reading opens up lots of use cases. For example, logging what
> > > files are being opened without imposing too much performance penalty
> > > from strace. Or as an accelerator for user notify emulation, where
> > > syscalls can be rejected on a fast path if we know the memory contents
> > > do not satisfy certain conditions that user notify will check.
> > >
> >
> > This has all kinds of race conditions.
> >
> >
> > I hate to be a party pooper, but this patchset is going to face a very
> > high bar to acceptance. Right now, seccomp has a couple of excellent
> > properties:
> >
> > First, while it has limited expressiveness, it is simple enough that the
> > implementation can be easily understood and the scope for
> > vulnerabilities that fall through the cracks of the seccomp sandbox
> > model is low. Compare this to Windows' low-integrity/high-integrity
> > sandbox system: there is a never-ending string of sandbox escapes due to
> > token misuse, unexpected things at various integrity levels, etc.
> > Seccomp doesn't have tokens or integrity levels, and these bugs don't
> > happen.
> >
> > Second, seccomp works, almost unchanged, in a completely unprivileged
> > context. The last time making eBPF work sensibly in a less-privileged
> > or unprivileged context came up, the maintainers mostly rejected the
> > idea of developing/debugging a permission model for maps, cleaning up
> > the bpf object id system, etc. You are going to have a very hard time
> > convincing the seccomp maintainers to let any of these mechanisms
> > interact with seccomp until the underlying permission model is in place.
> >
> > --Andy
>
> Thanks for pointing out the tradeoff between expressiveness vs. simplicity.
>
> Note that we are _not_ proposing to replace cbpf, but propose to also
> support ebpf filters. There certainly are use cases where cbpf is
> sufficient, but there are also important use cases where ebpf could
> make life much easier.
>
> Most importantly, we strongly believe that ebpf filters can be
> supported without reducing security.
>
> No worries about “party pooping” and we appreciate the feedback. We’d
> love to hear concerns and collect feedback so we can address them to
> hit that very high bar.
>
>
> ~t
>
> --
> Tianyin Xu
> University of Illinois at Urbana-Champaign
> https://tianyin.github.io/
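
For readers following the thread, the "exec-only-once" case that Andy
files under (2) can be pictured with a toy userspace model. This is
only a sketch in plain Python, not kernel code: ExecOnceFilter,
stateless_filter and the ALLOW/DENY strings are hypothetical names made
up for illustration, and the counter merely stands in for whatever
state an eBPF filter might keep (for instance in a map). The point is
that the verdict for the same syscall changes over time, which a pure
per-syscall filter cannot express.

```python
# Toy model of the "exec-only-once" idea discussed in the thread.
# NOT kernel code: every name here is hypothetical, for illustration only.
from dataclasses import dataclass

ALLOW = "ALLOW"
DENY = "DENY"


@dataclass
class ExecOnceFilter:
    """Allows the first execve and denies every later one.

    execs_seen stands in for state kept between filter invocations;
    a stateless filter has nowhere to keep such a counter.
    """

    execs_seen: int = 0

    def check(self, syscall: str) -> str:
        if syscall != "execve":
            return ALLOW
        self.execs_seen += 1
        return ALLOW if self.execs_seen == 1 else DENY


def stateless_filter(syscall: str) -> str:
    """Pure-function filter: the same input always gets the same verdict."""
    return DENY if syscall == "execve" else ALLOW


if __name__ == "__main__":
    f = ExecOnceFilter()
    trace = ["execve", "openat", "execve", "read", "execve"]
    for name in trace:
        print(name, f.check(name), stateless_filter(name))
```

The stateless variant has to pick one answer for execve up front, so it
either blocks the initial exec of the container payload or permits all
later ones; the stateful variant is what raises the questions about
(2) above.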