From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by smtp.subspace.kernel.org (Postfix) with ESMTPS id CA96B70
    for ; Thu, 20 May 2021 09:05:54 +0000 (UTC)
Received: by mail.kernel.org (Postfix) with ESMTPSA id A3C49610CB;
    Thu, 20 May 2021 09:05:47 +0000 (UTC)
Date: Thu, 20 May 2021 11:05:43 +0200
From: Christian Brauner
To: Andy Lutomirski
Cc: YiFei Zhu, containers@lists.linux.dev, bpf, YiFei Zhu, LSM List,
    Alexei Starovoitov, Andrea Arcangeli, Austin Kuo, Claudio Canella,
    Daniel Borkmann, Daniel Gruss, Dimitrios Skarlatos, Giuseppe Scrivano,
    Hubertus Franke, Jann Horn, Jinghao Jia, Josep Torrellas, Kees Cook,
    Sargun Dhillon, Tianyin Xu, Tobin Feldman-Fitzthum, Tom Hromatka,
    Will Drewry
Subject: Re: [RFC PATCH bpf-next seccomp 00/12] eBPF seccomp filters
Message-ID: <20210520090543.vay4guole7hkeaf3@wittgenstein>
References:
X-Mailing-List: containers@lists.linux.dev
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:

On Sat, May 15, 2021 at 08:49:01AM -0700, Andy Lutomirski wrote:
> On 5/10/21 10:21 PM, YiFei Zhu wrote:
> > On Mon, May 10, 2021 at 12:47 PM Andy Lutomirski wrote:
> >> On Mon, May 10, 2021 at 10:22 AM YiFei Zhu wrote:
> >>>
> >>> From: YiFei Zhu
> >>>
> >>> Based on: https://lists.linux-foundation.org/pipermail/containers/2018-February/038571.html
> >>>
> >>> This patchset enables seccomp filters to be written in eBPF.
> >>> Supporting eBPF filters has been proposed a few times in the past.
> >>> The main concerns were (1) use cases and (2) security.
> >>> We have identified many use cases that can benefit from advanced eBPF
> >>> filters, such as:
> >>
> >> I haven't reviewed this carefully, but I think we need to distinguish
> >> a few things:
> >>
> >> 1. Using the eBPF *language*.
> >>
> >> 2. Allowing the use of stateful / non-pure eBPF features.
> >>
> >> 3. Allowing the eBPF programs to read the target process' memory.
> >>
> >> I'm generally in favor of (1). I'm not at all sure about (2), and I'm
> >> even less convinced by (3).
> >>
> >>>
> >>> * exec-only-once filter / apply filter after exec
> >>
> >> This is (2). I'm not sure it's a good idea.
> >
> > The basic idea is that for a container runtime it may wait to execute
> > a program in a container without that program being able to execve
> > another program, stopping any attack that involves loading another
> > binary. The container runtime can block any syscall but execve in the
> > exec-ed process by using only cBPF.
> >
> > The use case is suggested by Andrea Arcangeli and Giuseppe Scrivano.
> > @Andrea and @Giuseppe, could you clarify more in case I missed
> > something?
>
> We've discussed having a notifier-using filter be able to replace its
> filter. This would allow this and other use cases without any
> additional eBPF or cBPF code.

Are you referring to something like what I sketched in
https://lore.kernel.org/containers/20210301110907.2qoxmiy55gpkgwnq@wittgenstein/
?

>
> >> eBPF doesn't really have a privilege model yet. There was a long and
> >> disappointing thread about this awhile back.
> >
> > The idea is that “seccomp-eBPF does not make life easier for an
> > adversary”. Any attack an adversary could potentially utilize
> > seccomp-eBPF, they can do the same with other eBPF features, i.e. it
> > would be an issue with eBPF in general rather than specifically
> > seccomp’s use of eBPF.
> >
> > Here it is referring to the helpers goes to the base
> > bpf_base_func_proto if the caller is unprivileged (!bpf_capable ||
> > !perfmon_capable).
> > In this case, if the adversary would utilize eBPF
> > helpers to perform an attack, they could do it via another
> > unprivileged prog type.
> >
> > That said, there are a few additional helpers this patchset is adding:
> > * get_current_uid_gid
> > * get_current_pid_tgid
> > These two provide public information (are namespaces a concern?). I

If they are seen from userspace in any way then these must be resolved
relative to the caller's userns or caller's pidns. So yes, namespaces need
to be taken into account.

> > have no idea what kind of exploit it could add unless the adversary
> > somehow side-channels the task_struct? But in that case, how is the
> > reading of task_struct different from how the rest of the kernel is
> > reading task_struct?
>
> Yes, namespaces are a concern. This idea got mostly shot down for kdbus
> (what ever happened to that?), and it likely has the same problems for
> seccomp.
>
> >>
> >> What is this for?
> >
> > Memory reading opens up lots of use cases. For example, logging what
> > files are being opened without imposing too much performance penalty
> > from strace. Or as an accelerator for user notify emulation, where
> > syscalls can be rejected on a fast path if we know the memory contents
> > does not satisfy certain conditions that user notify will check.
> >
>
> This has all kinds of race conditions.
>
>
> I hate to be a party pooper, but this patchset is going to very high bar
> to acceptance. Right now, seccomp has a couple of excellent properties:
>
> First, while it has limited expressiveness, it is simple enough that the
> implementation can be easily understood and the scope for
> vulnerabilities that fall through the cracks of the seccomp sandbox
> model is low. Compare this to Windows' low-integrity/high-integrity
> sandbox system: there is a never ending string of sandbox escapes due to
> token misuse, unexpected things at various integrity levels, etc.
> Seccomp doesn't have tokens or integrity levels, and these bugs don't
> happen.
>
> Second, seccomp works, almost unchanged, in a completely unprivileged
> context. The last time making eBPF work sensibly in a less- or

Yeah, which is pretty important.