Date: Thu, 15 Aug 2019 18:43:06 +0000
To: Alexei Starovoitov
From: Jordan Glover
Cc: Andy Lutomirski, Daniel Colascione, Song Liu, Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API, LSM List
Reply-To: Jordan Glover
Subject: Re: [PATCH v2 bpf-next 1/4] bpf:
 unprivileged BPF access via /dev/bpf
In-Reply-To: <20190815172856.yoqvgu2yfrgbkowu@ast-mbp.dhcp.thefacebook.com>
X-Mailing-List: netdev@vger.kernel.org

On Thursday, August 15, 2019 5:28 PM, Alexei Starovoitov wrote:

> On Thu, Aug 15, 2019 at 11:24:54AM +0000, Jordan Glover wrote:
>
> > On Wednesday, August 14, 2019 10:05 PM, Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
> >
> > > On Wed, Aug 14, 2019 at 10:51:23AM -0700, Andy Lutomirski wrote:
> > >
> > > > If eBPF is genuinely not usable by programs that are not fully trusted
> > > > by the admin, then no kernel changes at all are needed. Programs that
> > > > want to reduce their own privileges can easily fork() a privileged
> > > > subprocess or run a little helper to which they delegate BPF
> > > > operations. This is far more flexible than anything that will ever be
> > > > in the kernel because it allows the helper to verify that the rest of
> > > > the program is doing exactly what it's supposed to and restrict eBPF
> > > > operations to exactly the subset that is needed. So a container
> > > > manager or network manager that drops some privilege could have a
> > > > little bpf-helper that manages its BPF XDP, firewalling, etc.
> > > > configuration. The two processes would talk over a socketpair.
> > >
> > > There were three projects that tried to delegate bpf operations.
> > > All of them failed.
> > > bpf operational workflow is much more complex than you're imagining.
> > > fork() also doesn't work for all cases.
> > > I gave this example before: consider multiple systemd-like daemons
> > > that need to do bpf operations that want to pass this 'bpf capability'
> > > to other daemons written by other teams. Some of them will start
> > > non-root, but still need to do bpf. They will be rpm installed
> > > and live upgraded while running.
> > > We considered making systemd such a centralized bpf delegation
> > > authority too. It didn't work. bpf in kernel grows quickly.
> > > libbpf part grows independently. llvm keeps evolving.
> > > All of them are being changed while system overall has to stay
> > > operational. Centralized approach breaks apart.
> > >
> > > > The interesting cases you're talking about really do involve
> > > > unprivileged or less privileged eBPF, though. Let's see:
> > > > systemd --user: systemd --user is not privileged at all. There's no
> > > > issue of reducing privilege, since systemd --user doesn't have any
> > > > privilege to begin with. But systemd supports some eBPF features, and
> > > > presumably it would like to support them in the systemd --user case.
> > > > This is unprivileged eBPF.
> > >
> > > Let's disambiguate the terminology.
> > > This /dev/bpf patch set started as describing the feature as 'unprivileged bpf'.
> > > I think that was a mistake.
> > > Let's call systemd-like daemon usage of bpf 'less privileged bpf'.
> > > This is not unprivileged.
> > > 'unprivileged bpf' is what sysctl kernel.unprivileged_bpf_disabled controls.
> > > There is a huge difference between the two.
> > > I'm against extending 'unprivileged bpf' even a bit more than what it is
> > > today for many reasons mentioned earlier.
> > > The /dev/bpf is about 'less privileged'.
> > > Less privileged than root. We need to split part of full root capability
> > > into bpf capability. So that most of the root can be dropped.
> > > This is very similar to what cap_net_admin does.
> > > cap_net_admin can bring down eth0 which is just as bad as crashing the box.
> > > cap_net_admin is very much privileged. Just 'less privileged' than root.
> > > Same thing for cap_bpf.
> > > Maybe we should do both cap_bpf and /dev/bpf to make it clear that
> > > this is the same thing. Two interfaces to achieve the same result.
> >
> > systemd --user processes aren't "less privileged". They are COMPLETELY unprivileged.
> > Granting them cap_bpf is the same as granting it to every other unprivileged user
> > process. Also, an unprivileged user process can start a systemd --user process with any
> > command it likes.
>
> systemd itself is trusted. It's the same binary whether it runs as pid=1
> or as pid=123. One of the use cases is to make IPAddressDeny= work with --user.
> Subset of that feature already works with AmbientCapabilities=CAP_NET_ADMIN.
> CAP_BPF is a natural step in the same direction.

The point was that systemd will run any arbitrary command you throw at it, and
you want to automatically attach CAP_BPF to it. AmbientCapabilities is not a valid
option for a systemd --user instance (otherwise it would be nuts).

I think we may have a misunderstanding here. Did you mean a systemd "system" service
with the "User=xxx" option instead of a "systemd --user" service? It would make sense then.
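
To make the distinction concrete, here is a rough sketch of the kind of "system"
service the question above has in mind: a unit loaded by the system manager (pid 1)
that runs as a non-root user but keeps one capability. The unit name, file path and
binary are hypothetical, and the directives are combined only to show the ones named
in this thread, not to claim which of them depend on each other.

```ini
# /etc/systemd/system/example-filter.service  (hypothetical name and path)
[Unit]
Description=Example daemon started non-root by the system manager

[Service]
# Placeholder user, as in "User=xxx" above.
User=xxx
# Hypothetical binary.
ExecStart=/usr/bin/example-daemon
# Capability kept by the non-root process; granted by the privileged system manager.
AmbientCapabilities=CAP_NET_ADMIN
# The firewalling directive mentioned in the thread.
IPAddressDeny=any
```

The point of the sketch is who writes the unit: here it is installed by the admin
and started by pid 1, whereas a "systemd --user" unit is written and started by the
unprivileged user.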